Evrishapley

From Eigenvector Research Documentation Wiki

Purpose

Calculate a variable's contribution using Shapley Values.

Synopsis

results = evrishapley(calx,expx,model,options)
results = evrishapley(x,model,options)

Description

Shapley Values are a variable importance and explanation tool widely used in the AI community. They provide each variable's contribution to a model's predictions. This is a model-agnostic algorithm: any model's predictions can be explained, including those of Ann, Svm, and Xgb models. Variables that Shapley Values deem important are good candidates for variable selection.
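
To make the idea of a per-variable contribution concrete, the following minimal sketch computes exact Shapley Values for a toy three-variable model in plain MATLAB. It is not the evrishapley implementation: the model f, the data, and the helper v are hypothetical, and every coalition is enumerated, which is only practical for a handful of variables. It illustrates the additivity property: the contributions sum to the difference between the sample's prediction and the average prediction on the calibration data.

```matlab
% Toy model with three variables; a stand-in for any fitted model's predict step.
f = @(X) 2*X(:,1) - X(:,2) + 0.5*X(:,1).*X(:,3);

calx = [0 1 2; 1 0 1; 2 2 0; 1 3 1];   % background (calibration) samples
x    = [3 1 2];                        % the one sample to explain
p    = size(calx, 2);

% Value of a coalition: variables flagged in the logical mask m take their
% values from x, the rest keep their calibration values, and the predictions
% are averaged. Uses implicit expansion (MATLAB R2016b or later).
v = @(m) mean(f(calx .* ~m + x .* m));

phi = zeros(1, p);
for j = 1:p
    others = setdiff(1:p, j);
    for code = 0:2^(p-1)-1
        m = false(1, p);
        m(others) = logical(bitget(code, 1:p-1));   % coalition without j
        s = nnz(m);
        w = factorial(s) * factorial(p - s - 1) / factorial(p);
        mj = m;
        mj(j) = true;                               % same coalition plus j
        phi(j) = phi(j) + w * (v(mj) - v(m));
    end
end

base = mean(f(calx));   % average prediction on the calibration data
pred = f(x);            % prediction being explained
fprintf('phi = %s\n', mat2str(phi, 4));
fprintf('base + sum(phi) = %.4f, prediction = %.4f\n', base + sum(phi), pred);
```

Because the number of coalitions doubles with every added variable, practical implementations approximate this sum with sampled perturbations rather than enumerating it.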

See the shapleygui page for calculating Shapley Values in an interface, along with examples and guidance on interpretation.

See Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.

Notes

Calculating Shapley Values can be a very expensive task. There are ways to reduce its cost:

  • Group variables based on interval width or dependency.
  • Provide fewer explanation samples.
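
The two cost reductions above can be sketched in plain MATLAB as follows. This is an illustration under stated assumptions, not the function's internal logic: the variable count, the explanation set, and nkeep are hypothetical, and grouping by contiguous windows is only one plausible reading of "interval width".

```matlab
nvars     = 200;    % hypothetical variable count, e.g. spectral channels
int_width = 10;     % window size, matching the documented default

% Assign each variable to a contiguous window of int_width variables
% (assumed grouping scheme, for illustration only).
groupId = ceil((1:nvars) / int_width);
ngroups = max(groupId);
fprintf('%d variables -> %d groups\n', nvars, ngroups);
fprintf('Exact enumeration: 2^%d coalitions instead of 2^%d.\n', ngroups, nvars);

% Explain a random subset of samples instead of the full explanation set.
rng(1);                                        % fixed seed for reproducibility
expx  = randn(500, nvars);                     % hypothetical explanation data
nkeep = 50;                                    % hypothetical subset size
keep  = sort(randperm(size(expx, 1), nkeep));  % rows to keep, without repeats
expx_small = expx(keep, :);
fprintf('Explaining %d of %d samples.\n', size(expx_small, 1), size(expx, 1));
```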

If Model_Exporter is installed, it is used to speed up the generation of predictions on the perturbed samples.

Inputs

Standard input is:

  • calx = double or dataset used to calibrate the model,
  • expx = double or dataset used to generate Shapley Values (samples whose predictions will be explained),
  • model = EVRIModel,
  • options = options structure for evrishapley.
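
A call with the standard inputs might look like the sketch below. Several parts are assumptions rather than documented behavior: the random data are placeholders, the model is built with pls using three latent variables chosen arbitrarily, and retrieving defaults with evrishapley('options') follows the usual PLS_Toolbox convention but is not confirmed on this page. The guard skips the toolbox calls when the functions are not on the path.

```matlab
rng(1);
calx = randn(40, 30);                                    % placeholder calibration data
y    = calx(:,5) - 2*calx(:,12) + 0.1*randn(40, 1);      % placeholder response
expx = calx(1:5, :);                                     % samples to explain

if exist('evrishapley', 'file') && exist('pls', 'file')
    popts = pls('options');
    popts.display = 'off';
    popts.plots   = 'none';
    model = pls(calx, y, 3, popts);        % 3 latent variables, chosen arbitrarily

    sopts = evrishapley('options');        % ASSUMPTION: standard options convention
    results = evrishapley(calx, expx, model, sopts);
    disp(fieldnames(results));
else
    disp('PLS_Toolbox functions not found on the path; skipping the call.');
end
```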

Outputs

The output is a results structure with the following fields:

  • shap: The Shapley Values for all samples in expx and all predictors from the model.
  • baseprediction: The average predictions from the model on the calibration data for each predictor.
  • model: The calibrated model.
  • explainpred: The predictions on expx.
  • calx: The calibration data.
  • x: The explanation data.
  • shapoptions: Options structure for evrishapley.
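
The sketch below shows one way these fields could be used together, with a mock structure in place of a real evrishapley result. The layout of shap (samples by variables, for a single predicted column) is an assumption, and the mock values come from a hypothetical linear model, for which the Shapley Value of variable j is its coefficient times the sample's deviation from the calibration mean.

```matlab
rng(1);
calx = randn(50, 6);                    % hypothetical calibration data
expx = randn(4, 6);                     % hypothetical explanation data
b    = [0; 3; 0; -1.5; 0.2; 0];         % hypothetical linear model coefficients
predictFn = @(X) X*b + 2;               % hypothetical predict step

% Mock structure using the documented field names (assumed layout).
mock.baseprediction = mean(predictFn(calx));
mock.explainpred    = predictFn(expx);
mock.shap           = (expx - mean(calx, 1)) .* b';   % implicit expansion, R2016b+

% Additivity check: base prediction plus contributions recovers each prediction.
recon  = mock.baseprediction + sum(mock.shap, 2);
maxerr = max(abs(recon - mock.explainpred));
fprintf('Largest reconstruction error: %.2e\n', maxerr);

% Rank variables by mean absolute Shapley Value as a simple importance score.
importance = mean(abs(mock.shap), 1);
[~, order] = sort(importance, 'descend');
fprintf('Variables ranked by importance: %s\n', mat2str(order));
```

Variables with a zero coefficient receive zero contribution here, so they fall to the bottom of the ranking; this is the sense in which high-ranking variables are candidates for variable selection.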

Options

options = options structure containing the fields:

  • int_width: [ {10} ] The window size used to group variables for the Shapley Value calculation. Grouping highly correlated variables can provide a better explanation and significantly speed up the algorithm.
  • nbatches: [ {'auto'} double ] Number of batches into which the computation is split. When set to 'auto', the number of batches is computed to limit memory use.
  • n_iter: [ {'auto'} double ] Number of perturbed samples to create per iteration. When set to 'auto', this is (2 * number of variables) + 1. Increasing it gives a more faithful representation of the contributions but can exhaust memory.
  • random_state: [ {1} ] Random seed number. Set this to a fixed number for reproducibility.
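
As a rough guide to the memory warning for n_iter, the arithmetic below evaluates the 'auto' rule for a hypothetical variable count and estimates the size of one block of perturbed samples stored as doubles. Whether "variables" counts raw variables or int_width groups is not stated here, and the storage estimate is an assumption about how the perturbed samples are held in memory.

```matlab
nvars = 500;                            % hypothetical number of variables

n_iter_auto = 2*nvars + 1;              % the documented 'auto' rule
bytesPerBlock = n_iter_auto * nvars * 8;   % assumed: one double per value
fprintf('n_iter (auto) = %d perturbed samples\n', n_iter_auto);
fprintf('About %.1f MB per block of perturbed samples\n', bytesPerBlock / 2^20);

% Example overrides using the documented field names. In practice, start
% from the function's default options structure rather than a bare struct.
opts.int_width    = 20;                 % wider windows, fewer groups
opts.n_iter       = 2 * n_iter_auto;    % more perturbed samples, more memory
opts.random_state = 1;                  % fixed seed for reproducibility
disp(opts);
```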

See Also

selectvars, genalg, ipls, plotloads, pls, plsda, sratio, rpls, Sample and Variable Selection, Variable Selection