Xgbda: Difference between revisions

From Eigenvector Research Documentation Wiki

Revision as of 11:59, 20 December 2018

Purpose

Gradient Boosted Tree Ensemble for classification (Discriminant Analysis) using XGBoost.

Synopsis

model = xgbda(x,options); %identifies model using classes in x
model = xgbda(x,y,options); %identifies model using y for classes
pred = xgbda(x,model,options); %makes predictions with a new X-block
valid = xgbda(x,y,model,options); %performs a "test" call with a new X-block with known y-classes

Description

XGBDA performs calibration and application of gradient boosted decision tree models for classification. These are non-linear models that predict the probability of a test sample belonging to each of the modeled classes, and hence the class of the sample.

Inputs

  • x = X-block (predictor block) class "double" or "dataset".
  • y = Y-block (predicted block) class "double" or "dataset". If omitted in a calibration call, the x-block must be a dataset object with classes in the first mode (samples). y can always be omitted in a prediction call (when a model is passed). If y is omitted in a prediction call, x will be checked for classes. If found, these classes will be assumed to be the ones corresponding to the model.
  • model = previously generated model (when applying model to new data)

Outputs

  • model = standard model structure containing the xgboost model (see Standard Model Structure). Feature scores are contained in model.detail.xgb.featurescores.
  • pred = structure array with predictions
  • valid = structure array with predictions

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ] governs level of display to command window.
  • plots: [ 'none' | {'final'} ] governs level of plotting.
  • waitbar: [ 'off' | {'on'} ] governs display of waitbar during optimization and predictions.
  • preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively)
  • algorithm: [ 'xgboost' ] algorithm to use. xgboost is default and currently only option.
  • classset : [ 1 ] indicates which class set in x to use when no y-block is provided.
  • xgbtype : [ 'xgbr' | {'xgbc'} ] Type of XGB to apply. Default is 'xgbc' for classification, and 'xgbr' for regression.
  • compression : [{'none'}| 'pca' | 'pls' ] type of data compression to perform on the x-block prior to calculating or applying the XGB model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a pls or plsda model (depending on the xgbtype). Compression can make the XGB more stable and less prone to overfitting.
  • compressncomp : [ 1 ] Number of latent variables (or principal components) to include in the compression model.
  • compressmd : [ 'no' |{'yes'}] Use Mahalanobis Distance corrected scores from compression model.
  • cvi : { { 'rnd' 5 } } Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for the error estimate on the final selected parameter values. Alternatively, can be a vector with the same number of elements as x has rows with integer values indicating CV subsets (see crossval).
  • eta : Value(s) to use for XGBoost 'eta' parameter. Eta controls the learning rate of the gradient boosting. Values in range (0,1]. Using a single value specifies the value to use. Using a range of values specifies the parameters to search over to find the optimal value. Default is 3 values [0.1, 0.3, 0.5].
  • max_depth : Value(s) to use for XGBoost 'max_depth' parameter. Specifies the maximum depth allowed for the decision trees. Using a single value specifies the value to use. Using a range of values specifies the parameters to search over to find the optimal value. Default is 6 values [1 2 3 4 5 6].
  • num_round : Value(s) to use for XGBoost 'num_round' parameter. Specifies how many rounds of tree creation to perform. Using a single value specifies the value to use. Using a range of values specifies the parameters to search over to find the optimal value. Default is 3 values [100 300 500].
  • strictthreshold : [0.5] Probability threshold for assigning a sample to a class. Affects model.classification.inclass.
  • predictionrule : [ {'mostprobable'} | 'strict' ] governs which classification prediction statistics appear first in the confusion matrix and confusion table summaries.
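
As a minimal sketch of how the fields above might be set, the following builds an options structure by hand and passes it to the calls listed in the Synopsis. It assumes that xgbda fills in defaults for any fields left unset, which is not stated above. The two-class data are synthetic and purely illustrative, and the xgbda calls are guarded so the script still runs where the toolbox is not on the path.

```matlab
% Synthetic two-class data (illustrative only): 20 samples per class, 5 variables
x = [randn(20,5); randn(20,5) + 2];
y = [ones(20,1); 2*ones(20,1)];

% Options set by hand; field names follow the Options list above.
% Assumption: fields not set here fall back to the xgbda defaults.
options = struct();
options.display   = 'off';
options.plots     = 'none';
options.waitbar   = 'off';
options.eta       = 0.1;    % single value: no search over eta
options.max_depth = 3;      % single value: no search over max_depth
options.num_round = 100;    % single value: no search over num_round
options.strictthreshold = 0.5;

% Call forms taken from the Synopsis; skipped if xgbda is not available
if exist('xgbda','file')
  model = xgbda(x,y,options);       % calibration using y for classes
  pred  = xgbda(x,model,options);   % prediction with an X-block
end
```

Giving a vector instead of a scalar for eta, max_depth or num_round would make that parameter part of the search rather than a fixed setting.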

Algorithm

Xgbda is implemented using the XGBoost package. User-specified values are used for XGBoost parameters (see options above). See XGBoost Parameters for further details of these options.

By default, the XGBDA parameters eta, max_depth and num_round are given as ranges of values rather than single values. In that case xgbda runs a cross-validated search over the grid of these values, selects the optimal XGBoost parameter values, and builds an XGBDA model using them. This is the recommended usage. To avoid the grid search, pass a single value for each of these parameters.
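
To give a sense of the size of the default search, the sketch below enumerates the grid implied by the default ranges listed under Options. It only illustrates what a full grid over the three parameters looks like. How xgbda orders or evaluates the combinations internally is not described here, and the variable names are hypothetical.

```matlab
% Default value ranges as listed under Options
eta       = [0.1 0.3 0.5];
max_depth = [1 2 3 4 5 6];
num_round = [100 300 500];

% Full grid: one row per (eta, max_depth, num_round) combination
[E, D, R] = ndgrid(eta, max_depth, num_round);
paramgrid = [E(:) D(:) R(:)];

% Number of candidate settings: 3 * 6 * 3
ncombos = size(paramgrid, 1);
fprintf('%d parameter combinations\n', ncombos);
```

Since each combination is assessed by cross-validation, fixing one or more of the parameters to a single value shrinks the grid and the calibration time accordingly.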

See Also

analysis, browse, knn, lwr, pls, plsda, xgb, xgbengine