Pcr

From Eigenvector Research Documentation Wiki

Latest revision as of 14:54, 6 February 2020

Purpose

Principal Components Regression: multivariate inverse least squares regression.

Synopsis

model = pcr(x,y,ncomp,options) %identifies model (calibration step)
pred = pcr(x,model,options) %applies model to a new X-block
valid = pcr(x,y,model,options) %applies model to a new X-block, with corresponding new Y values
pcr % Launches an Analysis window with PCR as the selected method.

Note that the recommended way to build and apply a PCR model from the command line is to use the Model Object. See the EVRIModel_Objects wiki page on building and applying models using the Model Object.

Description

PCR calculates a single principal components regression model using the given number of components ncomp to predict y from measurements x, or applies an existing PCR model to a new set of data x.
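The calculation described above can be sketched in base MATLAB. This is a minimal illustration of textbook PCR via singular value decomposition on mean-centered data, not the toolbox's implementation; the data and all variable names are hypothetical.

```matlab
% Hypothetical calibration data: 30 samples, 8 x-variables, 1 y-variable.
x = randn(30,8);
y = x*[1;0;-1;0;0.5;0;0;0] + 0.01*randn(30,1);
ncomp = 3;                        % number of principal components to keep

% Mean-center both blocks (implicit expansion needs MATLAB R2016b or later).
mx = mean(x,1);
my = mean(y,1);
xc = x - mx;
yc = y - my;

% Decompose the x-block and keep the first ncomp components.
[u,s,v] = svd(xc,'econ');
t = u(:,1:ncomp)*s(1:ncomp,1:ncomp);   % scores
p = v(:,1:ncomp);                      % loadings

% Regress y on the scores, then map back to the original variables.
q = (t'*t)\(t'*yc);               % least-squares coefficients on the scores
b = p*q;                          % regression vector for the centered x-block
yhat = xc*b + my;                 % fitted y values for the calibration data
```

Because the regression is performed on a small number of orthogonal score columns rather than on the original variables, the least-squares step stays well conditioned even when the x-variables are collinear.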

To make predictions, the inputs are x, the new predictor x-block (2-way array class "double" or "dataset"), and model, the PCR model. The output pred is a structure, similar to model, that contains scores, predictions, etc. for the new data.

If new y-block measurements are also available for the new data, then the inputs are x, the new x-block (2-way array class "double" or "dataset"), y, the new y-block (2-way array class "double" or "dataset"), and model, the PCR model to apply. The output valid is a structure, similar to model, that contains scores, predictions, and additional y-block statistics for the new data.

In prediction and validation modes, the same model structure is used but predictions are provided in the model.detail.pred field.

Note: Calling pcr with no inputs starts the graphical user interface (GUI) for this analysis method.
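The three calling modes described above might be used together as follows. This sketch uses only the call forms shown in the Synopsis and requires PLS_Toolbox on the MATLAB path; the data are random placeholders and the variable names xcal, ycal, xnew, and ynew are hypothetical.

```matlab
% Hypothetical calibration and test data (20 x-variables, 1 y-variable).
w    = [1; -2; 0.5];
xcal = randn(50,20);
ycal = xcal(:,1:3)*w + 0.05*randn(50,1);
xnew = randn(10,20);
ynew = xnew(:,1:3)*w + 0.05*randn(10,1);

% Start from the default options and silence display and plots.
options = pcr('options');
options.display = 'off';
options.plots   = 'none';

model = pcr(xcal,ycal,3,options);       % calibration with 3 components
pred  = pcr(xnew,model,options);        % prediction for a new X-block
valid = pcr(xnew,ynew,model,options);   % validation with known Y values
```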

Inputs

  • x = X-block data (2-way array or DataSet Object)
  • y = Y-block data (2-way array or DataSet Object)
  • ncomp = number of components to be calculated (positive integer scalar).

Optional Inputs

  • options discussed below

Outputs

The output is a standard model structure with the following fields (see Standard Model Structure):

  • modeltype: 'PCR',
  • datasource: structure array with information about input data,
  • date: date of creation,
  • time: time of creation,
  • info: additional model information,
  • reg: regression vector,
  • loads: cell array with model loadings for each mode/dimension,
  • pred: 2 element cell array containing model predictions for each input block (when options.blockdetails = 'standard', x-block predictions are not saved and the first element will be an empty array), and the y-block predictions.
  • tsqs: cell array with T² values for each mode,
  • ssqresiduals: cell array with sum of squares residuals for each mode,
  • description: cell array with text description of model, and
  • detail: sub-structure with additional model details and results.
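For orientation, the quantities behind the tsqs and ssqresiduals fields can be sketched in base MATLAB using the usual textbook definitions of Hotelling's T² and the Q residual for a principal components decomposition. This is an assumption about the general technique, not a statement of how the toolbox computes or scales these values; the data and variable names are hypothetical.

```matlab
% Hypothetical x-block: 40 samples, 6 variables, 2 components retained.
x = randn(40,6);
ncomp = 2;
m = size(x,1);

xc = x - mean(x,1);                    % mean-center (R2016b+ implicit expansion)
[u,s,v] = svd(xc,'econ');
t = u(:,1:ncomp)*s(1:ncomp,1:ncomp);   % scores
p = v(:,1:ncomp);                      % loadings

lambda = sum(t.^2,1)/(m-1);            % variance captured by each component
tsq = sum((t.^2)./lambda,2);           % Hotelling's T^2, one value per sample

e = xc - t*p';                         % x-block residuals after ncomp components
q = sum(e.^2,2);                       % sum of squared residuals (Q) per sample
```

T² measures how far a sample lies from the center within the model plane, while Q measures how much of the sample the model leaves unexplained.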

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • outputversion: [ 2 | {3} ], governs output format (discussed below),
  • preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively),
  • algorithm: [ {'svd'} | 'robustpcr' | 'correlationpcr' | 'frpcr' ], governs which algorithm to use.
    • 'svd' = standard singular value decomposition algorithm.
    • 'robustpcr' = robust algorithm with automatic outlier detection.
    • 'correlationpcr' = standard PCR with re-ordering of factors in order of y-variance captured.
    • 'frpcr' = full-ratio PCR (a.k.a. optimized scaling) with automatic sample scale correction. Note that with FRPCR, models generally perform better without mean-centering on the x-block.
  • blockdetails: [ 'compact' | {'standard'} | 'all' ], level of detail (predictions, raw residuals, and calibration data) included in the model.
    • 'standard' = the predictions and raw residuals for the X-block, as well as the X-block itself, are not stored in the model, to reduce its size in memory. Specifically, these fields in the model object are left empty: 'model.pred{1}', 'model.detail.res{1}', 'model.detail.data{1}'.
    • 'compact' = for this function, 'compact' is identical to 'standard'.
    • 'all' = keep predictions and raw residuals for both X- and Y-blocks, as well as the X- and Y-blocks themselves.
  • confidencelimit: [ {0.95} ], confidence level for Q and T² limits. A value of zero (0) disables calculation of confidence limits,
  • roptions: structure of options to pass to rpcr (robust PCR engine from the Libra Toolbox). Only used when algorithm is 'robustpcr'.
    • alpha: [ {0.75} ], (1-alpha) measures the number of outliers the algorithm should resist. Any value between 0.5 and 1 may be specified.
    • intadjust: [ {0} ], if equal to one, the intercept adjustment for the LTS-regression will be calculated. See ltsregres for details (Libra Toolbox).

The default options can be retrieved using: options = pcr('options');
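As an illustration, the default options can be retrieved and a few of the fields listed above changed before calibration. This sketch requires PLS_Toolbox on the MATLAB path; the data are random placeholders, and it assumes that confidencelimit takes a numeric value.

```matlab
% Hypothetical calibration data: 50 samples, 20 x-variables, 1 y-variable.
x = randn(50,20);
y = x(:,1:3)*[1; -2; 0.5] + 0.05*randn(50,1);

options = pcr('options');               % default options structure
options.display         = 'off';        % no command-window output
options.plots           = 'none';       % no plots
options.algorithm       = 'correlationpcr';  % order factors by y-variance captured
options.blockdetails    = 'all';        % keep x- and y-block predictions and residuals
options.confidencelimit = 0.99;         % assumed numeric; 0 would disable the limits

model = pcr(x,y,4,options);             % calibrate with 4 components
```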

OUTPUTVERSION

By default (options.outputversion = 3), the output of the function is a standard model structure, model. If options.outputversion = 2, the output format is:

[b,ssq,t,p] = pcr(x,y,ncomp,options)

where the outputs are

  • b = matrix of regression vectors or matrices for each number of principal components up to ncomp,
  • ssq = the sum of squares information,
  • t = x-block scores, and
  • p = x-block loadings.

Note: The regression matrices are stacked in b such that each successive block of Ny rows (Ny = number of y-block variables) corresponds to the regression matrix for that particular number of principal components.
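The row ordering described in the note can be illustrated with plain indexing. In this sketch b is a placeholder matrix standing in for the first output of [b,ssq,t,p] = pcr(x,y,ncomp,options) with options.outputversion = 2. It assumes b has ncomp*Ny rows and one column per x-variable; the column count is an assumption, since the text above specifies only the row layout.

```matlab
Ny    = 2;     % number of y-block variables
ncomp = 3;     % number of principal components requested
Nx    = 5;     % number of x-block variables (assumed to be the column count of b)

% Placeholder standing in for the b output of pcr (outputversion = 2).
b = reshape(1:(ncomp*Ny*Nx), ncomp*Ny, Nx);

k    = 2;                     % pick the model that uses k components
rows = (k-1)*Ny + (1:Ny);     % rows of b belonging to the k-component model
bk   = b(rows,:);             % Ny-row regression matrix for k components
```

With Ny = 2, the two-component model occupies rows 3 and 4 of b.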

See Also

analysis, crossval, frpcr, mlr, modelstruct, pca, pls, preprocess, ridge, EVRIModel_Objects