Mdcheck

From Eigenvector Research Documentation Wiki

Revision as of 15:25, 3 September 2008

Purpose

Missing Data Checker and infiller.

Synopsis

[flag,missmap,infilled] = mdcheck(data,options)
options = mdcheck('options')

Description

This function checks for missing data and, if desired, infills it using a PCA model. The input data is the data to be checked, given as either a double array or a dataset object. The optional input options is a structure containing options that control how the function runs (see below).

The outputs are flag, the fraction of the data that is missing; missmap, a map of the locations of the missing data stored as a uint8 variable; and infilled, the data with the missing values filled in. Depending on the plots option, a plot of the missing data may also be output.
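The sketch below illustrates what flag and missmap contain, assuming missing values are stored as NaN in a plain double array. It uses only base MATLAB and is not the MDCHECK source. The per-row check at the end is a hypothetical illustration of how the max_missing and toomuch options might be applied; the actual rule MDCHECK uses is not stated here.

```matlab
% Hypothetical example data; NaN is assumed to mark a missing value.
x = [1 2 NaN; 4 NaN 6; 7 8 9; NaN NaN NaN];

missmap = uint8(isnan(x));          % 1 where a value is missing, 0 otherwise
flag = nnz(missmap) / numel(x);     % overall fraction of missing data
                                    % (nnz returns a double, avoiding uint8 arithmetic)

% Hypothetical max_missing check, applied row by row for illustration only
max_missing = 0.4;                  % documented default of options.max_missing
rowfrac = mean(double(missmap), 2); % fraction missing in each row
toomuch_rows = find(rowfrac > max_missing);  % rows that 'exclude' might drop
```

Here 5 of the 12 values are missing, so flag is 5/12 (about 0.42), and only row 4, which is entirely NaN, exceeds the 0.4 limit.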


Options

options = a structure with the following fields:
  • frac_ssq: [{0.95}] desired fraction (between 0 and 1) of the variance to be captured by the PCA model,
  • max_pcs: [{5}] maximum number of PCs in the model; if 0, the mean is used,
  • meancenter: ['no' | {'yes'}] whether to use mean centering in the algorithm,
  • recalcmean: ['no' | {'yes'}] whether to recalculate the mean after each cycle of replacement (may improve results for small matrices),
  • display: [{'off'} | 'on'] governs the level of display,
  • tolerance: [{1e-6 100}] convergence criteria: the first element is the minimum change and the second is the maximum number of iterations,
  • max_missing: [{0.4}] maximum fraction of missing data with which MDCHECK will operate,
  • toomuch: [{'error'} | 'exclude'] action to take if too much missing data is found: 'error' exits with an error message; 'exclude' excludes elements (rows/columns/slabs/etc.) that contain too much missing data before replacement. 'exclude' requires a dataset object as input for data, and
  • algorithm: [{'svd'} | 'nipals'] specifies the missing-data algorithm to use; NIPALS is typically used for large amounts of missing data or large multi-way arrays.
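To make the PCA-based infilling concrete, here is a minimal sketch of iterative SVD replacement in base MATLAB. It is an illustration under stated assumptions, not MDCHECK's implementation. Missing values are NaN, and the number of PCs is fixed; npcs is a hypothetical stand-in for max_pcs. The mean is recalculated each cycle, as with recalcmean = 'yes', and the two elements of tolerance are read as minimum change and maximum iterations. Only the SVD route is shown (no NIPALS), and all variable names are hypothetical.

```matlab
% Hypothetical toy data: after mean centering, every column is a multiple of t,
% so the mean plus one PC describes the data exactly.
t = (1:8)';
Xtrue = [t, 2*t, 3*t + 1, -t + 5];
X = Xtrue;
X(2,3) = NaN;  X(5,1) = NaN;  X(7,4) = NaN;   % knock out three values

npcs = 1;                  % fixed number of PCs (hypothetical stand-in for max_pcs)
tolerance = [1e-10 500];   % [minimum change, maximum iterations]

miss = isnan(X);
[m, n] = size(X);

% Initial guess: fill each missing value with the mean of its column's observed values
Xf = X;
for j = 1:n
  Xf(miss(:,j), j) = mean(X(~miss(:,j), j));
end

for it = 1:tolerance(2)
  mu = mean(Xf, 1);                          % recalculate the mean each cycle
  Xc = Xf - repmat(mu, m, 1);                % mean-center
  [U, S, V] = svd(Xc, 0);                    % economy-size SVD
  Xhat = U(:,1:npcs) * S(1:npcs,1:npcs) * V(:,1:npcs)' + repmat(mu, m, 1);
  change = max(abs(Xhat(miss) - Xf(miss)));  % largest update to a missing value
  Xf(miss) = Xhat(miss);                     % replace only the missing entries
  if change < tolerance(1)
    break
  end
end
infilled = Xf;
```

Observed entries are never modified; only the three NaN positions are updated. For this exactly low-rank example they should converge to the original values (for example, X(2,3) toward 7). Real data are rarely exactly low-rank, which is where frac_ssq and max_pcs come into play.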

Note: MDCHECK captures up to options.frac_ssq of the variance using options.max_pcs or fewer PCA components.
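One plausible reading of this note is sketched below. It assumes that "variance" means the sum of squared singular values of the mean-centered data. Under that assumption, the rule is to take the fewest PCs whose cumulative fraction reaches frac_ssq, then cap that number at max_pcs. The singular values are made up for illustration.

```matlab
% Hypothetical singular values of some mean-centered data
s = [6 3 2 1 0.5];
frac_ssq = 0.95;   % documented default of options.frac_ssq
max_pcs  = 5;      % documented default of options.max_pcs

ssq = s.^2;                        % sum of squares captured by each PC
cumfrac = cumsum(ssq) / sum(ssq);  % cumulative fraction captured
k = find(cumfrac >= frac_ssq, 1);  % fewest PCs reaching frac_ssq
k = min(k, max_pcs);               % never exceed max_pcs
```

Here the first three PCs capture about 97.5% of the variance, so k = 3. With max_pcs = 2 the cap would apply instead, giving k = 2 and only about 89.6% captured, which is why the note says "up to" frac_ssq.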

The default options can be retrieved using: options = mdcheck('options');

See Also

parafac, pca