Analysis Menu

From Eigenvector Research Documentation Wiki
Revision as of 11:43, 25 January 2010 by Jeremy (→ Quantitative Analysis Methods)

The Analysis Menu in the Analysis GUI provides access to various analysis methods within the GUI. Some of these methods are "one-block" methods (meaning they operate on only the X data) and others are "two-block" methods (meaning they require both an X and a Y block to be loaded). Most methods also create a "model" when they are executed.

In all cases, once the analysis method has been selected and appropriate data loaded, a model can be built by clicking the Calibrate button (the gears icon) in the toolbar or by clicking the Model button in the Status Panel.

For many methods, method-specific options can be selected or modified using the Options button.

The following methods are available in most versions of Solo and PLS_Toolbox (some special versions of the software may have fewer or additional methods). The methods are divided into groups based on their typical application:

Exploratory and Cluster Analysis Methods

These methods are one-block methods and require only the X block to operate. The Y block is not used in these methods.

  • PCA Principal Component Analysis: used for exploratory data analysis and Multivariate Statistical Process Control as well as general pattern recognition and fault detection applications. PLS_Toolbox users, see also: pca
  • Purity: Interactive mixture analysis method used to resolve mixtures of unknown responses and provide more physically-interpretable results than PCA. Most useful on data where some samples and/or variables represent "pure" responses or components (non-mixtures). PLS_Toolbox users, see also: purity
  • MCR Multivariate Curve Resolution: Automated mixture analysis method used to resolve mixtures of unknown responses and provide more physically-interpretable results than PCA. Uses an algorithm with successive approximations which can take some time to complete and has some ambiguity, but can operate with complicated mixtures of unknown components. PLS_Toolbox users, see also: mcr
  • PARAFAC PARAllel FACtor analysis: Very similar to MCR, but can be applied to multiway data (data with 3 or more dimensions) as well as typical 2-way data. Results for 2-way data are essentially the same as MCR, but results on multiway data are often far less ambiguous because the trilinear model is frequently unique. PLS_Toolbox users, see also: parafac
  • MPCA Multiway Principal Component Analysis: used for exploratory analysis of 3-way batch data in which the first mode is usually time, the second mode is variables, and the third mode is sample (e.g. batch number or, in the case of semiconductor processing, wafer number). MPCA identifies both trends between variables and changes in variables through time, known as trajectories. Models can be more complicated to interpret than PCA models and may be more sensitive to minor variations, but can provide improved selectivity in some cases. PLS_Toolbox users, see also: mpca
  • Cluster: Performs a variety of unsupervised cluster analysis methods. Used to look for similarities between samples with results displayed as a dendrogram with similar samples grouped together and attached by short "branches". A number of similarity metrics are available with different sensitivities. For details, see: cluster
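The decomposition that underlies PCA (and, with extensions, MPCA) can be sketched in a few lines. The following is an illustrative NumPy sketch with made-up data, not the PLS_Toolbox pca function:

```python
import numpy as np

# Hypothetical X block: 6 samples (rows) x 4 variables (columns)
X = np.array([[2.0, 4.1, 6.2, 8.0],
              [1.0, 2.0, 3.1, 4.0],
              [3.0, 6.1, 9.0, 12.2],
              [2.5, 5.0, 7.4, 10.0],
              [1.5, 3.1, 4.5, 6.1],
              [3.5, 7.0, 10.6, 14.0]])

# Mean-center the data (a common PCA preprocessing choice)
Xc = X - X.mean(axis=0)

# SVD of the centered data gives scores and loadings: Xc = T @ P.T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2                     # number of principal components to keep
T = U[:, :k] * s[:k]      # scores: sample coordinates in PC space
P = Vt[:k].T              # loadings: variable contributions to each PC

# Fraction of total variance captured by each component
explained = s**2 / np.sum(s**2)
```

Scores plots (rows of T) are the basis of the pattern-recognition views in the GUI; samples whose residual Xc - T @ P.T is unusually large are candidates for fault detection.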

Quantitative Analysis Methods

These methods are used in quantitative problems where one needs to determine the amount of a component, property, or other value based on the measured X-block responses. They are two-block methods; most require both an X and a Y block to operate.

  • PLS Partial Least Squares: Factor-based regression method using an inverse regression equation. PLS identifies latent variables (factors or patterns) in the X block which can be used to predict the column(s) of the Y block. Inverse methods are often used when not all underlying sources of variation are known and quantified. PLS_Toolbox users, see also: pls
  • PCR Principal Component Regression: Inverse regression method closely related to PLS with similar goals. PCR may be less sensitive to random and systematic error in the Y block but more sensitive to systematic error in the X block. PLS_Toolbox users, see also: pcr
  • MLR Multiple Linear Regression: Non-factor based inverse regression method. MLR uses raw variable responses in X to predict Y. This method requires that all columns of X be unique (not highly correlated) and may be highly unstable or unusable with many variables. Models do not provide quality of fit statistics. PLS_Toolbox users, see also: mlr
  • LWR Locally Weighted Regression: Inverse regression method often used with non-linear systems. LWR uses PCA to automatically select the calibration samples most similar to a new (unknown) sample and calculates a PLS or PCR model from only those samples (this is called the "local" model). The user must select the number of samples to include in the local model and whether the model should be a PLS, PCR, or Global PCR model. Global PCR uses the scores from the original PCA; the other two are standard PLS or PCR models which can use the same number of latent variables as the global model or a user-selected number.
  • SVM Support Vector Machine [Regression]: Sometimes called SVR (Support Vector Regression), this is a non-linear regression method which can be considered a hybrid of MLR and LWR. The calibration step selects calibration samples (called "support vectors") which are deemed the most critical to defining the regression relationship. These samples are used with [usually] non-linear weighting terms to define a locally-weighted prediction for new samples. Unlike the other regression methods, SVM is intrinsically non-linear and can better approximate non-linear responses; however, it can also be more chaotic as a result.
  • CLS Classical Least Squares: Factor-based classical regression method based on a simple linearly additive model. CLS works well when all responses in a system are known or can be determined experimentally. Often works well when several underlying sources of variance exist and their quantity needs to be determined. PLS_Toolbox users, see also: cls
Unlike the other methods, CLS can operate on an X block alone. If no Y block is loaded, CLS assumes that the samples in the X block are "pure component responses" (i.e. each row of the X block represents what an individual component of the system looks like when measured on its own.)
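The contrast between the classical (CLS) and inverse (MLR/PCR/PLS-style) formulations can be sketched with made-up data. This NumPy sketch is illustrative only, not the PLS_Toolbox implementation; the pure responses S and concentrations C below are invented for the example:

```python
import numpy as np

# Two hypothetical pure-component responses measured at 5 wavelengths
S = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
              [0.5, 0.5, 1.0, 2.0, 3.0]])

# Known concentrations of the two components in 4 calibration mixtures
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [2.0, 1.0]])

# Classical model: each mixture is a linear combination of pure responses
X = C @ S

# A new mixture with (unknown, to be estimated) concentrations 1.5 and 0.5
x_new = np.array([1.5, 0.5]) @ S

# CLS prediction: solve x_new = c @ S for c by least squares
c_hat, *_ = np.linalg.lstsq(S.T, x_new, rcond=None)

# Inverse (MLR-style) model: regress y (first component's amount) on X
y = C[:, 0]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = x_new @ b
```

Note the direction of each model: CLS models X from concentrations (and so needs all pure responses S, or an X block of pure samples to estimate them), while the inverse model predicts y directly from X without requiring every source of variation to be known.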

Classification Methods

These methods are used to identify a sample as belonging to one or more groups of previously-classified samples. The samples in the calibration set must be assigned to class(es). These class assignments are then used to help identify the class for an unknown sample. They are mostly one-block methods which operate on a single X block with the calibration samples' "class" field assigned to indicate the class membership.

  • KNN K-Nearest Neighbors: A classification method which assigns an unknown sample to a class by identifying the "k" closest samples in the calibration set and tallying a "vote" of the classes of these samples. The class which receives the highest vote count is determined to be the class of the unknown. PLS_Toolbox users, see also: knn
  • PLSDA Partial Least Squares Discriminant Analysis: A classification method which identifies the differences between two or more classes by identifying what is different between the classes. PLSDA is a factor-based method very similar to Linear Discriminant Analysis (LDA) but does not suffer from problems with collinear (highly related) variables. PLS_Toolbox users, see also: plsda
  • SVMDA Support Vector Machine Discriminant Analysis: Sometimes called SVC (Support Vector Classification), this is a non-linear classification method which can be considered a variant of KNN. The calibration step selects calibration samples (called "support vectors") which are deemed the most critical to defining the class boundaries. These samples are used with [usually] non-linear weighting terms to define a locally-weighted prediction for new samples. As opposed to PLSDA and KNN, SVM is intrinsically non-linear and can better approximate non-linear responses, but can also be more chaotic as a result.
  • SIMCA Soft Independent Modeling of/by Class Analogy: A classification method in which a PCA model is created for each class in the calibration data. Unknown samples are then projected into each PCA model and classified as in or not in each class based on whether the sample falls "inside" each PCA model. PLS_Toolbox users, see also: simca
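As a concrete illustration of the KNN voting scheme described above, here is a minimal Python sketch (made-up data and a hypothetical helper function, not the PLS_Toolbox knn function):

```python
import numpy as np
from collections import Counter

def knn_classify(X_cal, classes, x_new, k=3):
    """Assign x_new to the majority class among its k nearest calibration samples."""
    d = np.linalg.norm(X_cal - x_new, axis=1)   # Euclidean distance to each sample
    nearest = np.argsort(d)[:k]                 # indices of the k closest samples
    votes = Counter(classes[i] for i in nearest)
    return votes.most_common(1)[0][0]           # class with the highest vote count

# Hypothetical calibration set: two classes measured on 2 variables
X_cal = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
classes = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(X_cal, classes, np.array([0.15, 0.1])))  # prints: A
```

Choosing k odd helps avoid tied votes in two-class problems; the distance metric and k are the main settings exposed for KNN.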