

Purpose

XGBoost modeling for classification or regression using the XGBoost package. XGBENGINE is a lower-level function; users are recommended to instead use the functions xgb for regression or xgbda for classification.


Synopsis

model = xgbengine(x,y,options)
cv = xgbengine(x,y,options)
pred = xgbengine(x,y,model,options)
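
A minimal sketch of a typical calibrate-then-apply sequence is shown below. The variable names (xcal, ycal, xtest, ytest) are illustrative only, and the option fields used are those documented under Options below.

  % Sketch only: calibrate an XGB regression model, then apply it to new data.
  opts = struct;                % options structure; fields as listed under Options
  opts.xgbtype   = 'xgbr';      % 'xgbr' for regression, 'xgbc' for classification
  opts.eta       = 0.1;         % single values, so no cross-validation search is triggered
  opts.max_depth = 6;
  opts.num_round = 500;

  model = xgbengine(xcal, ycal, opts);          % training run returns the XGB model
  pred  = xgbengine(xtest, ytest, model, opts); % applying the model returns predictions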


Description

Gradient Boosted Tree Ensemble for classification or regression. XGBENGINE uses the XGBoost package to train or apply an XGB model, or to return cross-validation accuracy based on the training data.

Cross-validation search for optimal parameter values is triggered by passing ranges for the eta, max_depth, or num_round parameters.
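
For example, supplying vectors of candidate values rather than single values causes a search over the grid they define. The sketch below is illustrative only; the fields used are those documented under Options.

  % Sketch only: vectors of candidate values trigger a cross-validation search
  % over the grid they define.
  opts = struct;
  opts.xgbtype   = 'xgbc';           % classification
  opts.cvi       = {'rnd' 5};        % random-subset cross-validation, 5 splits
  opts.eta       = [0.05 0.1 0.3];   % candidate ranges instead of single values
  opts.max_depth = [3 6 9];
  opts.num_round = [100 500];

  cv = xgbengine(x, y, opts);        % returns cross-validation results for the search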

XGBENGINE is implemented using the XGBoost package. User-specified values are used for the XGBoost parameters (see options below). See XGBoost Parameters for further details of these options.


Inputs

  • x = X-block (predictor block), class "double".
  • y = Y-block (predicted block), class "double"; a vector of length m indicating sample class or target value.
  • model = XGB (Java) model produced by a previous xgbengine training run.


Outputs

  • model = XGBoost Java model (if not run in cross-validation mode).
  • pred = structure array with predictions.
  • valid = structure array with predictions.


Options

options = a structure array with the following fields (a usage sketch follows this list):

  • xgbtype : [ 'xgbr' | 'xgbc' ] Type of XGB to apply.
  • compression : [ {'none'} | 'pca' | 'pls' ] Type of data compression to perform on the x-block prior to calculating or applying the XGB model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a pls or plsda model (depending on the xgbtype). Compression can make the XGB more stable and less prone to overfitting.
  • compressncomp : [ 1 ] Number of latent variables (or principal components) to include in the compression model.
  • compressmd : [ 'no' | {'yes'} ] Use Mahalanobis Distance corrected scores from compression model.
  • cvtimelimit : Set a time limit (seconds) on each individual cross-validation sub-calculation when searching over supplied XGB parameter ranges for optimal parameters. Only relevant if parameter ranges are used for XGB parameters such as eta, num_round, or max_depth. Default is 10 seconds. A second time limit = 30*cvtimelimit is applied to any XGB calibration calculation which is not part of cross-validation.
  • cvi : { {'rnd' 5} } Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for error estimation on the final selected parameter values. Alternatively, can be a vector with the same number of elements as x has rows, with integer values indicating CV subsets (see crossval).
  • eta : [{0.1}] Value(s) to use for XGBoost 'eta' parameter. Eta controls the learning rate of the gradient boosting. Values in range (0,1].
  • max_depth : [{6}] Value(s) to use for XGBoost 'max_depth' parameter. Specifies the maximum depth allowed for the decision trees.
  • num_round : [{500}] Value(s) to use for XGBoost 'num_round' parameter. Specifies how many rounds of tree creation to perform.
  • strictthreshold : [0.5] Probability threshold for assigning a sample to a class. Affects model.classification.inclass.
  • predictionrule : [ {'mostprobable'} | 'strict' ] Governs which classification prediction statistics appear first in the confusion matrix and confusion table summaries.
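
Putting these options together, the following is a hedged sketch of configuring compression and the classification-specific settings; the field names are taken from the list above, and the values are illustrative, not recommendations.

  % Sketch only: compression plus classification-specific options from the list above.
  opts = struct;
  opts.xgbtype         = 'xgbc';          % classification mode
  opts.compression     = 'pls';           % compress the x-block with a plsda model
  opts.compressncomp   = 10;              % latent variables in the compression model
  opts.compressmd      = 'yes';           % Mahalanobis Distance corrected scores
  opts.strictthreshold = 0.5;             % probability threshold for class assignment
  opts.predictionrule  = 'mostprobable';  % which statistics lead the confusion summaries
  opts.cvtimelimit     = 10;              % seconds per cross-validation sub-calculation

  model = xgbengine(x, y, opts);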

See Also

analysis, browse, knn, lwr, pls, plsda, xgb, xgbengine