Xgb

Purpose

Gradient Boosted Tree (XGBoost) for regression or classification.

Synopsis

model = xgb(x,y,options); %identifies model (calibration step)

pred = xgb(x,model,options); %makes predictions with a new X-block

valid = xgb(x,y,model,options); %performs a "test" call with a new X-block and known y-values

Description

To choose between regression and classification, use the xgbtype option:

regression : xgbtype = 'xgbr'

classification : xgbtype = 'xgbc'

It is recommended that classification be done through the xgbda function.
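
The three call forms from the Synopsis can be sketched as a calibrate, predict, test sequence for regression. This is a minimal sketch on synthetic data, not a reference workflow. It assumes that xgb is on the MATLAB path and that xgb('options') returns the default options structure, as is the convention for other functions in the toolbox. The variable names (xcal, ycal, xtest, ytest) are illustrative. The guard lets the script run, skipping the modeling calls, when xgb is not installed.

```matlab
% Synthetic regression data (illustrative only).
rng(0);                                   % reproducible random numbers
xcal  = rand(40, 5);                      % calibration X-block
ycal  = xcal * [1; 2; 0; 0; -1] + 0.05 * randn(40, 1);
xtest = rand(10, 5);                      % new X-block
ytest = xtest * [1; 2; 0; 0; -1];         % known y-values for the test call

if exist('xgb', 'file')
    options = xgb('options');             % ASSUMPTION: returns default options
    options.xgbtype = 'xgbr';             % regression
    options.display = 'off';
    options.plots   = 'none';
    options.waitbar = 'off';

    model = xgb(xcal, ycal, options);           % calibration
    pred  = xgb(xtest, model, options);         % predictions for a new X-block
    valid = xgb(xtest, ytest, model, options);  % test call with known y-values
else
    disp('xgb not found on the path; skipping the modeling calls.');
end
```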

Inputs

x = X-block (predictor block), class "double" or "dataset".
y = Y-block (predicted block), class "double" or "dataset".
model = previously generated model (when applying model to new data).

Outputs

model = standard model structure containing the xgboost model (see Standard Model Structure). Feature scores are contained in model.detail.xgb.featurescores.
pred = structure array with predictions
valid = structure array with predictions

Options

options = a structure array with the following fields:

display: [ 'off' | {'on'} ] governs level of display to command window.
plots: [ 'none' | {'final'} ] governs level of plotting.
waitbar: [ 'off' | {'on'} ] governs display of waitbar during optimization and predictions.
preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively)
algorithm: [ 'xgboost' ] algorithm to use. xgboost is default and currently only option.
classset : [ 1 ] indicates which class set in x to use when no y-block is provided.
xgbtype : [ {'xgbr'} | 'xgbc' ] Type of XGB to apply. Use 'xgbr' (the default) for regression and 'xgbc' for classification.
compression : [ {'none'} | 'pca' | 'pls' ] Type of data compression to perform on the x-block prior to calculating or applying the XGB model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a pls or plsda model (depending on the xgbtype). Compression can make the XGB more stable and less prone to overfitting.
compressncomp : [ 1 ] Number of latent variables (or principal components) to include in the compression model.
compressmd : [ 'no' | {'yes'} ] Use Mahalanobis Distance corrected scores from compression model.
cvi : { { 'rnd' 5 } } Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for the error estimate on the final selected parameter values. Alternatively, can be a vector with the same number of elements as x has rows, with integer values indicating CV subsets (see crossval).
eta : [{0.1}] Value(s) to use for XGBoost 'eta' parameter. Eta controls the learning rate of the gradient boosting. Values in range (0,1].
max_depth : [{6}] Value(s) to use for XGBoost 'max_depth' parameter. Specifies the maximum depth allowed for the decision trees.
num_round : [{500}] Value(s) to use for XGBoost 'num_round' parameter. Specifies how many rounds of tree creation to perform.
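
The vector form of cvi can be built by hand when the assignment of samples to cross-validation subsets needs to be controlled. The sketch below uses only base MATLAB to assign each row of x to one of five subsets. The row count, the number of subsets, and the variable names are illustrative assumptions; how the subset values are interpreted is described in the crossval documentation.

```matlab
rng(1);                                   % reproducible shuffle
nrows    = 23;                            % number of rows in x (illustrative)
nsubsets = 5;                             % number of CV subsets (illustrative)

% Cycle through subset labels 1..nsubsets so subsets are near-equal in size,
% then shuffle which row receives which label.
labels = mod(0:nrows-1, nsubsets) + 1;
cvi    = labels(randperm(nrows));

counts = accumarray(cvi(:), 1)';          % number of samples in each subset

% The vector would then be passed in place of the default { 'rnd' 5 } cell:
% options.cvi = cvi;
```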

Algorithm

Xgb is implemented using the XGBoost package. User-specified values are used for XGBoost parameters (see options above). See XGBoost Parameters for further details of these options.

The default XGB parameters eta, max_depth and num_round have value ranges rather than single values. The xgb function searches over the grid formed by these parameter values, uses cross-validation to select the optimal combination, and then builds an XGB model using the selected values. This is the recommended usage. The user can avoid the grid search by passing in a single value for each of these parameters.
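
To make the size of such a search concrete, the sketch below enumerates a parameter grid in base MATLAB. The candidate values are arbitrary examples, not the function's defaults, and the enumeration only illustrates what a full grid contains; it is not a description of how xgb performs its search internally.

```matlab
% Candidate values for each parameter (arbitrary examples).
eta       = [0.1 0.3 0.5];
max_depth = [2 4 6];
num_round = [100 300];

% Every combination of the three vectors, one row per combination.
[E, D, R] = ndgrid(eta, max_depth, num_round);
combos    = [E(:) D(:) R(:)];
ncombos   = size(combos, 1);              % 3 * 3 * 2 = 18 combinations

% Passing vectors requests a search over the grid:
% options.eta = eta; options.max_depth = max_depth; options.num_round = num_round;
% Passing single values avoids the search:
% options.eta = 0.1; options.max_depth = 6; options.num_round = 500;
```

Because each combination is evaluated by cross-validation, the cost of the search grows with the product of the vector lengths, so short candidate lists keep calibration time manageable.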
