Lwr: Difference between revisions

From Eigenvector Research Documentation Wiki
Latest revision as of 15:10, 6 February 2020

Purpose

Locally weighted regression (LWR) for a univariate Y-block.

Synopsis

model = lwr(x,y,ncomp,npts,options); %identifies model (calibration step)
pred = lwr(x,model,options); %makes predictions with a new X-block
valid = lwr(x,y,model,options); %makes predictions with new X- & Y-block
lwr % Launches an Analysis window with LWR as the selected method.

Please note that the recommended way to build and apply an LWR model from the command line is to use the Model Object. Please see the EVRIModel_Objects wiki page on building and applying models using the Model Object.
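As a minimal sketch of the three function-call forms listed in the synopsis, the following assumes PLS_Toolbox is installed and on the MATLAB path. The data are synthetic and the variable names (xcal, ycal, xtest, ytest) and the values of ncomp and npts are illustrative assumptions, not recommendations from this page.

```matlab
% Synthetic data for illustration only (not taken from this page).
rng(0);
xcal  = randn(60, 10);                          % calibration X-block
ycal  = sin(xcal(:,1)) + 0.05*randn(60, 1);     % non-linear univariate y
xtest = randn(20, 10);                          % new X-block
ytest = sin(xtest(:,1)) + 0.05*randn(20, 1);    % matching known y-values

ncomp = 3;    % principal components for the selector model (assumed value)
npts  = 20;   % number of local calibration samples (assumed value)

% Start from the defaults and silence display and plotting.
options         = lwr('options');
options.display = 'off';
options.plots   = 'none';

model = lwr(xcal, ycal, ncomp, npts, options);  % calibration step
pred  = lwr(xtest, model, options);             % prediction with a new X-block
valid = lwr(xtest, ytest, model, options);      % validation with new X- and Y-block
```

The contents of model, pred and valid follow the Standard Model Structure, so the sketch does not assume any particular field names.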

Description

LWR calculates a single locally weighted regression model using the given number of principal components ncomp to predict a dependent variable y from a set of independent variables x.

LWR models are useful for performing predictions when the dependent variable, y, has a non-linear relationship with the measured independent variables, x. Because such responses can often be approximated by a linear function on a small (local) scale, LWR models work by choosing a subset of the calibration data (the "local" calibration samples) to create a "local" model for a given new sample. The local calibration samples are identified as the samples closest to the new sample in the score space of a PCA model (the "selector model"), using the Mahalanobis distance measure. Models are defined by the number of principal components used for the selector model (ncomp) and the number of points (samples) selected as local (npts).
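The selection step described above can be sketched in base MATLAB. This is an illustration under stated assumptions, not the toolbox implementation: the data are synthetic, the X-block is only mean-centered, and all variable names are hypothetical. Because PCA scores are uncorrelated, the Mahalanobis distance in score space reduces to a Euclidean distance with each score scaled by its variance.

```matlab
% Hypothetical illustration of local-sample selection; not the lwr source code.
rng(0);
X     = randn(50, 8);      % calibration X-block (synthetic)
xnew  = randn(1, 8);       % one new sample
ncomp = 3;                 % principal components in the selector model
npts  = 10;                % number of local samples to select

% PCA selector model on mean-centered data (implicit expansion, R2016b+).
mx = mean(X, 1);
Xc = X - mx;
[~, ~, V] = svd(Xc, 'econ');
P    = V(:, 1:ncomp);      % loadings
T    = Xc * P;             % calibration scores
tnew = (xnew - mx) * P;    % scores of the new sample

% Squared Mahalanobis distance from the new sample to each calibration sample.
tvar = var(T, 0, 1);
d2   = sum(((T - tnew).^2) ./ tvar, 2);

% The npts closest calibration samples form the local calibration set.
[~, order] = sort(d2);
localidx   = order(1:npts);
```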

Once the samples are selected, one of three algorithms is used to calculate the local model:

  • globalpcr = the scores from the PCA selector model (for the selected samples) are used to calculate a PCR model. This model is more stable when there are fewer samples being selected, but may not perform as well with high degrees of non-linearity.
  • pcr / pls = the raw data of the selected samples are used to create a weighted PCR or PLS model. These models are more adaptable to highly varying non-linearity but may also be less stable when fewer samples are being selected.
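The difference between the two approaches can be sketched in base MATLAB as follows. This is a simplified illustration with synthetic data and hypothetical variable names. It omits the sample weighting that the toolbox applies to the local model, and it is not the lwr implementation itself.

```matlab
% Hypothetical, unweighted illustration of the two kinds of local model.
rng(0);
X     = randn(80, 6);
y     = X(:,1).^2 + 0.5*X(:,2) + 0.05*randn(80, 1);   % non-linear response
xnew  = randn(1, 6);
ncomp = 3;
npts  = 25;

% Global PCA selector model and local-sample selection.
mx = mean(X, 1);
[~, ~, V] = svd(X - mx, 'econ');
P    = V(:, 1:ncomp);
T    = (X - mx) * P;
tnew = (xnew - mx) * P;
d2   = sum(((T - tnew).^2) ./ var(T, 0, 1), 2);
[~, order] = sort(d2);
loc  = order(1:npts);

% 'globalpcr'-style: regress y on the GLOBAL scores of the selected samples.
bglobal     = [ones(npts, 1) T(loc, :)] \ y(loc);
yhat_global = [1 tnew] * bglobal;

% 'pcr'-style: build a new PCA on the RAW data of the selected samples only.
Xl = X(loc, :);
ml = mean(Xl, 1);
[~, ~, Vl] = svd(Xl - ml, 'econ');
Pl         = Vl(:, 1:ncomp);
blocal     = [ones(npts, 1) (Xl - ml) * Pl] \ y(loc);
yhat_local = [1 (xnew - ml) * Pl] * blocal;
```

The globalpcr-style fit reuses one fixed set of loadings for every new sample, which is why it tends to be more stable with few local samples. The pcr-style fit recomputes the loadings for each local neighborhood.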

The LWR function can be used in 'prediction mode' to apply a previously built LWR model, model, to a new set of data in x, in order to generate y-values for these data.

Furthermore, if matching x-block and y-block measurements are available for an external test set, then LWR can be used in 'validation mode' to predict the y-values of the test data from the model (model) and x, and to allow comparison of these predicted y-values with the known y-values y.

For more information on the basic LWR algorithm, see T. Naes, T. Isaksson, B. Kowalski, Anal Chem 62 (1990) 664-673. For details on the use of y-distance when selecting nearest points (option alpha), see Z. Wang, T. Isaksson, B. R. Kowalski, Anal Chem 66 (1994) 249–260.

Note: Calling lwr with no inputs starts the graphical user interface (GUI) for this analysis method. There is a video using the LWR interface on the Eigenvector Research web page (http://www.eigenvector.com/eigenguide.php?m=Nonlinear_methods_3).

Inputs

  • x = X-block (predictor block) class "double" or "dataset"
  • y = Y-block (predicted block) class "double" or "dataset"
  • ncomp = the number of principal components (latent variables) to be calculated for the selector model (positive integer scalar)
  • npts = the number of points to use in local regression (positive integer scalar)
  • model = previously generated lwr model

Outputs

  • model = a standard model structure model (see Standard Model Structure)
  • pred = a structure, similar to model, that contains scores, predictions, etc. for the new data.
  • valid = a structure, similar to model, that contains scores, predictions, and additional y-block statistics, etc. for the new data.

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • waitbar: [ 'off' |{'auto'}| 'on' ] governs use of waitbar during analysis. 'auto' shows waitbar if delay will likely be longer than a reasonable waiting period.
  • preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively)
  • algorithm: [ 'globalpcr' | {'pcr'} | 'pls' ] LWR algorithm to use. Method of regression after samples are selected. 'globalpcr' performs PCR based on the PCs calculated from the entire calibration data set but a regression vector calculated from only the selected samples. 'pcr' and 'pls' calculate a local PCR or PLS model based only on the selected samples.
  • blockdetails: [ {'standard'} | 'all' ], extent of predictions and residuals included in model, 'standard' = only y-block, 'all' x- and y-blocks.
  • confidencelimit: [ {'0.95'} ], confidence level for Q and T2 limits, a value of zero (0) disables calculation of confidence limits,
  • reglvs: [] Used only when algorithm is 'pcr' or 'pls', this is the number of latent variables/principal components to use in the local regression model, if different from the number selected in the SSQ Table. The number of components selected in the SSQ table is used to generate the global PCA model which is used to select the local calibration samples. [] (Empty) implies LWRPRED should use the same number of latent variables in the local regression as were used in the global PCA model. NOTE: This option is NOT used when algorithm is 'globalpcr'.
  • iter: [{5}] Iterations in determining local points. Used only when alpha > 0 (i.e. when using y-distance scaling).
  • alpha: [ {0} ], a value in the range [0, 1]. Weighting of y-distances in selection of local points. 0 = do not consider y-distances {default}, 1 = consider ONLY y-distances. With any positive alpha, the algorithm will tend to select samples which are close in the PC space and which also have similar y-values. This is accomplished by repeating the prediction multiple times. In the first iteration, the selection of samples is done only on the PC space. Subsequent iterations take into account the comparison between the predicted y-value of the new sample and the measured y-values of the calibration samples.
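The iterative selection described for alpha can be sketched in base MATLAB. The way the two distances are scaled and combined here, as (1 - alpha)*dx + alpha*dy, is an assumption chosen to match the stated end points (0 = PC space only, 1 = y-distances only). The toolbox may scale or combine them differently. Data and variable names are hypothetical.

```matlab
% Hypothetical illustration of y-distance-weighted selection; not the lwr source.
rng(1);
T     = randn(40, 2);                       % calibration scores (synthetic stand-in)
y     = T(:,1).^2 + 0.1*randn(40, 1);       % non-linear response
tnew  = [0.5 -0.2];                         % scores of the new sample
npts  = 12;
alpha = 0.3;                                % weight given to y-distances
iter  = 5;                                  % number of selection passes

% Mahalanobis-type distance in score space, scaled to [0, 1] (assumed scaling).
dx = sqrt(sum(((T - tnew).^2) ./ var(T, 0, 1), 2));
dx = dx / max(dx);

d = dx;                                     % first pass uses the PC space only
for k = 1:iter
    [~, order] = sort(d);
    loc  = order(1:npts);                   % current local calibration set
    b    = [ones(npts, 1) T(loc, :)] \ y(loc);
    yhat = [1 tnew] * b;                    % current prediction for the new sample

    % Distance between the prediction and each measured calibration y-value.
    dy = abs(y - yhat);
    dy = dy / max(dy);

    % Assumed combination rule for the next pass.
    d = (1 - alpha)*dx + alpha*dy;
end
```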

The default options can be retrieved using: options = lwr('options');
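As a hedged example of adjusting the options before calibration, the following retrieves the defaults and changes several of the fields listed above. It assumes PLS_Toolbox is installed. The data are synthetic and the chosen values are illustrative only.

```matlab
% Synthetic data for illustration only.
rng(0);
x = randn(60, 10);
y = sin(x(:,1)) + 0.05*randn(60, 1);
ncomp = 4;     % components in the global PCA selector model (assumed value)
npts  = 20;    % local calibration samples (assumed value)

options = lwr('options');      % default options structure
options.algorithm = 'pls';     % local PLS model on the selected samples
options.reglvs    = 2;         % latent variables in the local model (not used by 'globalpcr')
options.alpha     = 0.25;      % give some weight to y-distances in sample selection
options.iter      = 5;         % selection iterations, used only when alpha > 0
options.display   = 'off';
options.plots     = 'none';
options.waitbar   = 'off';

model = lwr(x, y, ncomp, npts, options);
```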

See Also

analysis, ann, lwrpred, modelstruct, pls, pcr, preprocess, svm, EVRIModel_Objects