Plsda

From Eigenvector Research Documentation Wiki
imported>Chuck

Revision as of 17:33, 8 October 2008

Purpose

Partial least squares discriminant analysis.

Synopsis

model = plsda(x,y,ncomp,options)
model = plsda(x,ncomp,options)
pred = plsda(x,model,options)
valid = plsda(x,y,model,options)

Description

PLSDA is a multivariate inverse least squares discrimination method used to classify samples. The y-block in a PLSDA model indicates which samples are in the class(es) of interest through either:

  • (A) a column vector of class numbers indicating class assignments:
   y = [1 1 3 2]';

NOTE: If classes are assigned to the rows of the input x (a dataset), y can be omitted. Option (A) is then assumed, using the first class set of the x-block rows.

  • (B) a matrix of one or more columns containing a logical zero (= not in class) or one (= in class) for each sample (row):
   y = [1 0 0;
        1 0 0;
        0 0 1;
        0 1 0]
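Forms (A) and (B) carry the same class information. The short sketch below illustrates the correspondence. It is written in Python purely for illustration: it is not the class2logical implementation, and the helper name classes_to_logical is hypothetical. It builds one column per distinct nonzero class number, in increasing order, and reproduces the matrix above from y = [1 1 3 2]'. Giving class 0 no column is an assumption of this sketch.

```python
def classes_to_logical(y):
    """Convert a class-number vector (form A) into a 0/1 membership matrix (form B).

    Hypothetical helper for illustration only. Columns are ordered by
    increasing class number; class 0 ("unknown") gets no column in this
    sketch, so rows of class 0 come out as all zeros.
    """
    classes = sorted({c for c in y if c != 0})
    matrix = [[1 if c == k else 0 for k in classes] for c in y]
    return classes, matrix


classes, Y = classes_to_logical([1, 1, 3, 2])
print(classes)  # [1, 2, 3]
for row in Y:
    print(row)
# [1, 0, 0]
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
```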

NOTE: When a vector of class numbers is used (case A, above), class zero (0) is reserved for "unknown" samples. Samples of class zero are therefore never used when calibrating a PLSDA model, although the model will still include predictions for them.
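This rule can be pictured as a split of the sample rows. The minimal Python sketch below assumes 0-based row indices and uses a hypothetical helper name. Rows of class 0 are set aside from calibration, but they remain in the data so that the fitted model can still predict them.

```python
def split_calibration(y):
    """Return (calibration_rows, unknown_rows) for a class-number vector.

    Hypothetical helper for illustration. Rows with class 0 are "unknown":
    they are excluded from calibration but would still be predicted.
    """
    calibration_rows = [i for i, c in enumerate(y) if c != 0]
    unknown_rows = [i for i, c in enumerate(y) if c == 0]
    return calibration_rows, unknown_rows


cal, unk = split_calibration([1, 0, 3, 2, 0])
print(cal)  # [0, 2, 3]
print(unk)  # [1, 4]
```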

The prediction from a PLSDA model is nominally a value of zero or one. A value closer to zero indicates that the new sample is NOT in the modeled class; a value closer to one indicates that it is. In practice, a threshold between zero and one is determined, above which a sample is assigned to the class and below which it is not (see, for example, plsdthres). Similarly, the probability that a sample is inside or outside the class can be calculated using discrimprob. The predicted probability of each class is included in the output model structure in the field:

model.details.predprobability
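Thresholding works independently on each modeled class column. The Python sketch below illustrates only that step. The helper name assign_classes, the threshold values, and the prediction numbers are all hypothetical illustrations, not plsdthres output or results from a real model. In practice the thresholds would come from a procedure such as plsdthres.

```python
def assign_classes(predictions, thresholds):
    """Turn PLSDA-style predictions into 0/1 class assignments.

    predictions: list of rows, one predicted value per modeled class (column).
    thresholds: one threshold per class column (hypothetical values here).
    A sample is assigned to a class when its prediction exceeds that
    class's threshold.
    """
    return [[1 if p > t else 0 for p, t in zip(row, thresholds)]
            for row in predictions]


# Illustrative numbers only, not output from a fitted model.
pred = [[0.92, 0.10],
        [0.35, 0.71],
        [0.48, 0.52]]
print(assign_classes(pred, [0.5, 0.6]))  # [[1, 0], [0, 1], [0, 0]]
```

Because each column is thresholded on its own, a sample can end up assigned to no class (the third row) or, in principle, to more than one.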

Inputs

  • x = X-block (predictor block), class "double" or "dataset".
  • y = Y-block
    • OPTIONAL if x is a dataset containing classes for sample mode (mode 1)
    • otherwise, y is one of the following:
      • (A) column vector of sample classes for each sample in x
      • (B) a logical array with '1' indicating class membership for each sample (rows) in one or more classes (columns), or
      • (C) a cell array grouping the classes of the x-block data. For example, {[1 2] [3]} would model classes 1 and 2 as a single group against class 3.
  • ncomp = the number of latent variables to be calculated (positive integer scalar).
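Form (C) can be read as building one membership column per group instead of one per class. The Python sketch below illustrates this, with the cell array {[1 2] [3]} written as the list [[1, 2], [3]]. The helper name groups_to_logical is hypothetical. Giving all-zero rows to classes that appear in no group is an assumption of this sketch, not documented plsda behavior.

```python
def groups_to_logical(y, groups):
    """Build one 0/1 column per group of class numbers.

    Hypothetical helper for illustration. groups mirrors the cell array
    form, e.g. [[1, 2], [3]] for {[1 2] [3]}. A sample is marked 1 in a
    group's column when its class number belongs to that group.
    """
    return [[1 if c in group else 0 for group in groups] for c in y]


print(groups_to_logical([1, 1, 3, 2], [[1, 2], [3]]))
# [[1, 0], [1, 0], [0, 1], [1, 0]]
```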

Optional Inputs

  • options = an optional input options structure (see below)

Outputs

  • model = standard model structure containing the PLSDA model (see MODELSTRUCT).
  • pred = structure array with predictions
  • valid = structure array with predictions that also includes the known class information (y-block data) of the test samples

Note: Calling plsda with no inputs starts the graphical user interface (GUI) for this analysis method.

Options

  • options = a structure that can contain the following fields:
    • display: [ 'off' | {'on'} ] governs the level of display to the command window.
    • plots: [ 'none' | {'final'} ] governs the level of plotting.
    • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
    • prior: [ ] Vector of prior probabilities of observing each class. If any class prior is "Inf", the frequency of observation of that class in the calibration set is used as its prior probability. If all priors are Inf, this has the effect of providing the fewest incorrect predictions, assuming that the probability of observing a given class in future samples is similar to the frequency of that class in the calibration set. The default [] uses all ones, i.e., equal priors.
    • algorithm: [ 'nip' | {'sim'} ] PLS algorithm to use: NIPALS or SIMPLS
    • blockdetails: [ 'compact' | {'standard'} | 'all' ] Extent of detail included in the model. 'standard' keeps only the y-block; 'all' keeps both the x- and y-blocks.
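The prior option mixes explicit values with Inf placeholders. The Python sketch below shows one way to resolve such a vector into one number per class. The helper name resolve_priors is hypothetical. Two further points are assumptions: that "frequency" means the fraction of calibration samples in the class, and that the priors are left unnormalized. The sketch does not show how plsda itself normalizes or applies priors.

```python
import math


def resolve_priors(prior, calibration_classes, classes):
    """Resolve a 'prior' option vector into one value per modeled class.

    Hypothetical helper for illustration.
    prior: [] for the default, or one entry per class; math.inf means
        "use the frequency of that class in the calibration set".
    calibration_classes: class number of each calibration sample.
    classes: the modeled class numbers, in column order.
    """
    if not prior:
        return [1.0] * len(classes)  # default: all ones, i.e. equal priors
    if len(prior) != len(classes):
        raise ValueError("need exactly one prior per class")
    n = len(calibration_classes)
    resolved = []
    for p, k in zip(prior, classes):
        if math.isinf(p):
            # Assumed meaning of "frequency": fraction of calibration samples.
            resolved.append(calibration_classes.count(k) / n)
        else:
            resolved.append(float(p))
    return resolved


cal = [1, 1, 1, 2]
print(resolve_priors([], cal, [1, 2]))                    # [1.0, 1.0]
print(resolve_priors([math.inf, math.inf], cal, [1, 2]))  # [0.75, 0.25]
print(resolve_priors([0.3, math.inf], cal, [1, 2]))       # [0.3, 0.25]
```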

See Also

class2logical, compressmodel, crossval, discrimprob, pls, plsdaroc, plsdthres, simca