Permutetest

From Eigenvector Research Documentation Wiki

Purpose

Permutation testing for regression and classification models.

Synopsis

results = permutetest(x,y,rm,cvi,ncomp,options)

Description

Performs a permutation test in which the y-block is shuffled, allowing calculation of the probability that the results obtained with the unperturbed y-block are significant (as compared to random chance).
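The idea of shuffling the y-block can be illustrated with a minimal Python sketch. This is not the toolbox implementation: it assumes a simple one-variable least-squares line in place of a latent-variable model, and the names fit_rmse and permutation_pvalue, as well as the simple rank-based probability, are hypothetical choices made only for this illustration.

```python
import math
import random

def fit_rmse(x, y):
    """RMSE of calibration for an ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n)

def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Refit the model on shuffled copies of y and compare errors (illustrative only)."""
    rng = random.Random(seed)
    rmse_orig = fit_rmse(x, y)
    rmse_perm = []
    for _ in range(n_perm):
        y_shuffled = y[:]          # copy so the original y is left untouched
        rng.shuffle(y_shuffled)    # break the x-y relationship
        rmse_perm.append(fit_rmse(x, y_shuffled))
    # Fraction of permutations that fit at least as well as the original,
    # with the usual +1 correction so the estimate is never exactly zero.
    n_as_good = sum(1 for r in rmse_perm if r <= rmse_orig)
    p = (n_as_good + 1) / (n_perm + 1)
    return rmse_orig, rmse_perm, p

# Synthetic data with a strong linear relationship plus noise.
x = [float(i) for i in range(20)]
noise_rng = random.Random(1)
y = [2.0 * xi + noise_rng.gauss(0.0, 1.0) for xi in x]

rmse_orig, rmse_perm, p = permutation_pvalue(x, y)
print(f"original RMSE = {rmse_orig:.3f}, smallest permuted RMSE = {min(rmse_perm):.3f}, p = {p:.4f}")
```

When the original error is much smaller than every permuted error, as it should be for this synthetic data, the estimated probability is small and the original relationship is unlikely to be due to chance.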

In addition to storing the Root Mean Square Error of Calibration (RMSEC) and of Cross-Validation (RMSECV) for every permutation, the self-predicted and cross-validated residuals of each permutation are compared to the original residuals using the following tests:

Wilcoxon test
Sign test
Randomized t-test
T-test

These tests give the probability that the two sets of residuals are similar. A low probability therefore indicates that the perturbed results are significantly different from those of the original model and, thus, that the original model is significant. Note that many of these tests can provide valid results even with very few iterations. However, more iterations improve the results and also permit better plots.
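As an illustration of how a paired test on residuals yields a probability of similarity, the following Python sketch implements an exact two-sided sign test. It is a generic textbook version, not the toolbox code: the function name sign_test_pvalue is hypothetical, and comparing absolute residuals (rather than signed residuals) is an assumption made for this example.

```python
import math

def sign_test_pvalue(res_a, res_b):
    """Exact two-sided sign test on paired absolute residuals."""
    diffs = [abs(a) - abs(b) for a, b in zip(res_a, res_b)]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    n = n_pos + n_neg              # ties carry no information and are dropped
    if n == 0:
        return 1.0                 # identical residuals: no evidence of a difference
    k = min(n_pos, n_neg)
    # Binomial(n, 0.5) probability of a split at least this lopsided on one side.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)    # double for a two-sided test, capped at 1

# Made-up residuals: the "original" model fits far better than the "permuted" one.
res_orig = [0.10, -0.20, 0.05, 0.15, -0.10, 0.12, -0.08, 0.20, -0.05, 0.10]
res_perm = [1.00, -1.50, 0.90, 1.20, -0.80, 1.10, -1.30, 0.70, -0.95, 1.05]

p_different = sign_test_pvalue(res_orig, res_perm)
p_same = sign_test_pvalue(res_orig, res_orig)
print(p_different, p_same)
```

In this example every permuted residual is larger in magnitude than its original counterpart, so the probability of similarity is low; comparing a residual set with itself gives a probability of 1.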

When requested, a final plot of the fractional y-block information captured by calibration and cross-validation versus y-correlation is shown. For more information, see permuteplot.

Inputs

  • x = X-block data to be tested (DataSet or Double)
  • y = Y-block data to be tested (DataSet, Double or logical)
  • rm = regression method as defined in crossval
  • cvi = Cell array defining the data split method and the total permutation iterations to be performed: {'method' splits iterations}. Note: for split method 'con' or 'rnd', it is best to leave the third cvi parameter (iterations) at 1 to avoid slow performance, because the permutation test is repeated iterations times. In those cases, use the npermutation option alone to control the number of permutations.
  • ncomp = Maximum number of latent variables to be tested

Optional Inputs

  • model = existing PCA model, onto which new data x is to be applied.
  • options = discussed below.

Outputs

  • results: A structure with the following fields. Sizes are defined using lvs = number of latent variables, iter = number of iterations performed, and ny = number of y-block columns. All fields are size = [lvs iter ny] unless stated otherwise.
  • rmsecvperm: RMSECV for each permuted y-block.
  • rmsecperm: RMSEC for each permuted y-block.
  • rmsecv: RMSECV for the original unpermuted y-block. Size = [lvs 1 ny].
  • rmsec: RMSEC for the original unpermuted y-block. Size = [lvs 1 ny].
  • cvprob: Probabilities calculated for cross-validated residuals. Sub-fields indicate method (defined above). Sizes all = [lvs ny].
  • cprob: Probabilities calculated for self-predicted residuals. Sub-fields indicate method (defined above). Sizes all = [lvs ny].
  • ycor: Correlation of each original y-block column (rows here) with each permuted y-block (columns).
  • rmsy: Root Mean Square of each y-block column.
  • y: The original unpermuted y-block.
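To make the ycor and rmsy quantities more concrete, the following Python sketch computes the correlation between a single y column and several shuffled copies of it, together with the root mean square of that column. It is only an illustration under stated assumptions: the helper names pearson and rms are hypothetical, the root mean square is taken on the raw (not mean-centered) values, and the exact conventions used by the toolbox are not confirmed here.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def rms(values):
    """Root mean square of the raw values (no mean-centering; an assumption)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # one y-block column
rng = random.Random(0)

ycor = []                             # one correlation per permutation
for _ in range(5):
    y_perm = y[:]
    rng.shuffle(y_perm)
    ycor.append(pearson(y, y_perm))

rmsy = rms(y)
print([round(c, 3) for c in ycor], round(rmsy, 3))
```

A permutation that happens to correlate strongly with the original y is expected to give errors closer to those of the original model, which is why results are examined against y-correlation.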

Options

options = Options structure with one or more of the fields defined in crossval (see crossval for details on those options). In addition, the following fields are defined for special use in this function:

  • plotlvs = [ 1 ] Model size (in latent variables) to show in final display table and plots. The results for the model with the corresponding number of latent variables will be shown.
  • npermutation = [ {100} ] Number of permutations to perform.
  • permutation = [ {'no'} | 'yes' ] Flag to control whether to use the 'npermutation' value or default to 100. It is also passed to and used by crossval as its 'permutation' option value.

See Also

crossval, permuteplot, permuteprobs