Svm: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Donal

Revision as of 15:44, 25 January 2010

Purpose

SVM Support Vector Machine (LIBSVM) for regression or classification.

Synopsis

model = svm(x,y,options); %identifies model (calibration step)
pred = svm(x,model,options); %makes predictions with a new X-block
pred = svm(x,y,model,options); %performs a "test" call with a new X-block and known y-values
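
The three call forms above can be sketched end to end as follows. This is only an illustrative sketch: the synthetic data and the variable names (xcal, ycal, xtest, ytest, yhat, rmsep) are hypothetical, and it assumes that option fields omitted from the structure fall back to their defaults and that the y-block predictions sit in the second element of the pred field, as described under Outputs. The calls are guarded so the script still runs where svm is not on the path.

```matlab
% Hypothetical synthetic data: 40 calibration and 10 test samples, 2 variables
tcal  = linspace(0, 1, 40)';
ttest = linspace(0.05, 0.95, 10)';
xcal  = [tcal  tcal.^2];
xtest = [ttest ttest.^2];
ycal  = sin(2*pi*tcal);
ytest = sin(2*pi*ttest);

% Only fields listed under Options are set here; it is assumed that
% fields left out of the structure take their default values
options = struct('display', 'off', 'plots', 'none');

% Skip the toolbox calls if svm cannot be found on the MATLAB path
if ~isempty(which('svm'))
    model = svm(xcal, ycal, options);          % calibration step
    pred  = svm(xtest, model, options);        % prediction on a new X-block
    valid = svm(xtest, ytest, model, options); % "test" call with known y-values

    % Assumption: second element of the pred cell holds the y-block predictions
    yhat  = pred.pred{2};
    rmsep = sqrt(mean((yhat - ytest).^2));
    fprintf('RMSEP on test set: %g\n', rmsep);
end
```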

Description

SVM performs calibration and application of Support Vector Machine (SVM) regression models. These are non-linear models that can be used for regression or classification problems. A model consists of a number of support vectors (essentially samples selected from the calibration set) and non-linear model coefficients that define the non-linear mapping of the variables in the input x-block. For regression problems the model predicts the continuous y-block variable. For classification problems it predicts the class, where the classes are passed in either as the classes of the x-block or as a y-block containing numerical classes. It is recommended that classification be done through the svmda function.

SVM is implemented using the LIBSVM package, which provides both epsilon-support vector regression (epsilon-SVR) and nu-support vector regression (nu-SVR). This function supports the linear and Gaussian radial basis function (RBF) kernel types.
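
To make the two kernel types concrete, the sketch below evaluates them with base MATLAB only, using the forms given in the LIBSVM documentation: u'*v for the linear kernel and exp(-gamma*|u-v|^2) for the RBF kernel. The sample matrix X and the gamma value are hypothetical, chosen only so the arithmetic is easy to follow; this is not the toolbox's internal code.

```matlab
% Two hypothetical samples (rows) with three variables each
X = [1 2 3; 1 2 4];
gamma = 0.5;   % hypothetical value of the kernel gamma parameter

% Linear kernel: inner products between all pairs of samples
Klin = X * X';

% Squared Euclidean distances between all pairs of samples
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * Klin;

% Gaussian RBF kernel in the form used by LIBSVM
Krbf = exp(-gamma * D2);
```

A larger gamma makes the RBF kernel fall off faster with distance, so each support vector influences a smaller neighborhood of the variable space.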

Note: Calling svm with no inputs starts the graphical user interface (GUI) for this analysis method.

Inputs

  • x = X-block (predictor block) class "double" or "dataset",
  • y = Y-block (predicted block) class "double" or "dataset",
  • model = previously generated model (when applying model to new data).

Outputs

  • model = a standard model structure with the following fields (see MODELSTRUCT):
    • modeltype: 'SVM',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • reg: regression vector,
    • loads: cell array with model loadings for each mode/dimension,
    • pred: 2 element cell array with
      • model predictions for each input block (when options.blockdetails='standard', x-block predictions are not saved and this will be an empty array), and
      • the y-block predictions.
    • wts: double array with X-block weights,
    • tsqs: cell array with T2 values for each mode,
    • ssqresiduals: cell array with sum of squares residuals for each mode,
    • description: cell array with text description of model, and
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, that contains scores, predictions, etc. for the new data.
  • valid = a structure, similar to model, that contains scores, predictions, and additional y-block statistics, etc. for the new data.


Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • preprocessing: {[]} preprocessing structures for x block (see PREPROCESS). NOTE that y-block preprocessing is NOT used with SVMs. Any y-preprocessing will be ignored.
  • algorithm: [ 'libsvm' ] algorithm to use. libsvm is default and currently only option.
  • kerneltype: [ 'linear' | {'rbf'} ], SVM kernel to use. 'rbf' is default.
  • svmtype: [ {'epsilon-svr'} | 'nu-svr' ] Type of SVM to apply. The default is 'epsilon-svr' for regression.
  • probabilityestimates: [ '0' | {'1'} ], whether to train the SVR model for probability estimates, 0 or 1. Default is '1' (LIBSVM's own default for this setting is 0).
  • cvtimelimit: Time limit (seconds) on each individual cross-validation sub-calculation when searching over supplied SVM parameter ranges for optimal parameters. Only relevant if ranges are supplied for SVM parameters such as cost, epsilon, gamma or nu. Default is 10.
  • splits: Number of subsets to divide data into when applying n-fold cross validation. Default is 5.
  • gamma: Value(s) to use for LIBSVM kernel gamma parameter. Default is 15 values from 10^-6 to 10, spaced uniformly in log.
  • cost: Value(s) to use for LIBSVM 'c' parameter. Default is 11 values from 10^-3 to 100, spaced uniformly in log.
  • epsilon: Value(s) to use for LIBSVM 'p' parameter (epsilon in loss function). Default is the set of values [1.0, 0.1, 0.01].
  • nu: Value(s) to use for LIBSVM 'n' parameter (nu of nu-SVC, and nu-SVR). Default is the set of values [0.2, 0.5, 0.8].
  • outliernu: Value to use for nu in LIBSVM's one-class SVM outlier detection. Default is 0.05.
  • blockdetails: [ {'standard'} | 'all' ], extent of predictions and residuals included in model, 'standard' = only y-block, 'all' x- and y-blocks.
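
As an illustration of the default parameter ranges listed above, the sketch below rebuilds them with base MATLAB and counts how many fits a search over them could involve. It assumes that the search evaluates every combination of gamma, cost and epsilon (a full grid) and that each combination is fitted once per cross-validation split. Neither assumption is confirmed here, and the names ncombos and nfits are hypothetical.

```matlab
% Option fields and default values as documented in the list above
options = struct();
options.kerneltype  = 'rbf';
options.svmtype     = 'epsilon-svr';
options.gamma       = logspace(-6, 1, 15);  % 15 values from 10^-6 to 10
options.cost        = logspace(-3, 2, 11);  % 11 values from 10^-3 to 100
options.epsilon     = [1.0 0.1 0.01];
options.splits      = 5;
options.cvtimelimit = 10;

% Assumption: a full grid over gamma, cost and epsilon is searched
ncombos = numel(options.gamma) * numel(options.cost) * numel(options.epsilon);

% Assumption: one fit per parameter combination per cross-validation split
nfits = ncombos * options.splits;

fprintf('%d parameter combinations, %d cross-validation fits\n', ncombos, nfits);
```

Both default grids step by half a decade (a factor of about 3.16 between neighboring values). Supplying a single value instead of a range for gamma, cost or epsilon shrinks the search accordingly.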

Algorithm

Note that xxx

See Also

analysis, plsda