IPLS Variable Selection Interface

From Eigenvector Research Documentation Wiki
Revision as of 10:39, 15 November 2022 by Scott (talk | contribs)

Stepwise Interval Variable Selection Panel

The Interval Variable Selection panel in the Analysis Window provides access to the interval variable selection functionality.

Variable selection provides a simple way to discard variables which may be adding complexity to a model. Discarding such variables may improve performance of a final model. Variable selection can also be used as an exploratory tool to help identify variables which are the most interesting for a given task. For a detailed description of how interval variable selection works and practical considerations, see Interval PLS for Variable Selection. The following documentation describes the basic use of the controls on the Interval Variable Selection Panel.

Note that in addition to the controls on the panel, the Interval Variable Selection algorithm also uses the current preprocessing and cross-validation settings of Analysis.


Iplspanel2.png

MustUse

Indices of variables which must be used in all models.

Mode

(Note: Mode is only visible if the "All Options" checkbox has been checked.) Defines the variable selection direction. "Forward" starts with no variables selected and looks for the best single interval to add. "Reverse" starts with all variables selected and looks for the worst single interval to discard. See the "No. of Intervals" option below to add or remove more than one interval. See the "Interval Size" option to add or remove "blocks" of variables in each interval.

Step Size

(Note: Step Size is only visible if the "All Options" checkbox has been checked.) Specifies the number of variables between the start of each interval. When less than the Interval Size, intervals will "overlap" and variables may belong to more than one interval (this allows a "sliding window" style of variable selection). When greater than the Interval Size, there will be a gap of unused variables between intervals (this can be used for quick, coarse selection of variable windows). The automatic setting produces intervals with neither overlap nor gaps.
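
The relationship between Interval Size and Step Size can be illustrated with a short Python sketch. This is not the PLS_Toolbox implementation; the function name make_intervals and its arguments are hypothetical. It assumes that the automatic step size equals the interval size and that an interval running past the last variable is truncated, neither of which is confirmed by this page.

```python
def make_intervals(n_vars, interval_size=1, step_size=None):
    """Return a list of intervals, each a list of 0-based variable indices.

    Illustrative only: names and edge-case behavior are assumptions.
    """
    if interval_size < 1:
        raise ValueError("interval_size must be at least 1")
    # Assumption: "automatic" means step size == interval size,
    # which yields neither overlap nor gaps between intervals.
    if step_size is None:
        step_size = interval_size
    if step_size < 1:
        raise ValueError("step_size must be at least 1")

    intervals = []
    for start in range(0, n_vars, step_size):
        # Assumption: an interval that would run past the last
        # variable is truncated rather than dropped.
        stop = min(start + interval_size, n_vars)
        intervals.append(list(range(start, stop)))
    return intervals
```

With 6 variables, an interval size of 3 and a step size of 2, variables 2 and 4 each fall into two intervals (the sliding window case). With 7 variables, an interval size of 2 and a step size of 3, variables 2 and 5 fall into no interval (the gap case).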

Algorithm

Defines the regression algorithm to use.

No. of Intervals

Defines the total number of intervals to be added or removed. When a specific number is entered, the algorithm will continue to add or remove intervals up to this number. For example, a value of 1 limits the algorithm to selecting only one interval to add or remove, a value of 2 looks for the two best (or worst) intervals to add (or remove), and so on. When "Automatic" is checked, the algorithm will continue to add or remove intervals until the cross-validation results cease to improve.
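
The interaction of Mode, No. of Intervals and MustUse can be sketched as a greedy loop in Python. This is a minimal illustration under stated assumptions, not the PLS_Toolbox algorithm: select_intervals and the cv_error callable are hypothetical names, cv_error stands in for whatever cross-validated error the chosen regression algorithm reports (lower is better), and ties are broken by lowest interval index purely for determinism.

```python
def select_intervals(intervals, cv_error, mode="forward",
                     n_intervals=None, must_use=()):
    """Greedy interval selection (illustrative sketch, not the toolbox code).

    intervals   : list of lists of variable indices
    cv_error    : hypothetical callable, variable indices -> error (lower is better)
    mode        : "forward" adds intervals, "reverse" discards them
    n_intervals : number of intervals to add/remove; None mimics "Automatic"
    must_use    : variable indices included in every model
    Returns (intervals added or removed, in order; final variable indices).
    """
    if mode not in ("forward", "reverse"):
        raise ValueError("mode must be 'forward' or 'reverse'")
    forward = mode == "forward"

    def variables(active):
        used = set(must_use)
        for i in active:
            used.update(intervals[i])
        return sorted(used)

    # Forward starts with no intervals selected, reverse with all of them.
    active = set() if forward else set(range(len(intervals)))
    candidates = set(range(len(intervals)))
    limit = len(intervals) if n_intervals is None else min(n_intervals, len(intervals))

    start_vars = variables(active)
    best_error = cv_error(start_vars) if start_vars else float("inf")
    changed = []
    for _ in range(limit):
        trials = {}
        for i in sorted(candidates):
            trial = (active | {i}) if forward else (active - {i})
            trial_vars = variables(trial)
            if trial_vars:  # never score a model with no variables
                trials[i] = cv_error(trial_vars)
        if not trials:
            break
        pick = min(trials, key=trials.get)
        # "Automatic": stop once the error no longer improves.
        if n_intervals is None and trials[pick] >= best_error:
            break
        active = (active | {pick}) if forward else (active - {pick})
        candidates.remove(pick)
        best_error = trials[pick]
        changed.append(pick)
    return changed, variables(active)
```

Passing n_intervals=1 restricts the search to the single best interval to add (forward) or the single worst to discard (reverse), while leaving it as None continues until the hypothetical cv_error stops decreasing.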

Interval Size (vars)

Specifies how many adjacent variables should be included in each interval. For example, a value of 5 will cause each interval to contain a set of 5 adjacent variables which must be added or removed as a group. A value of one (the default) will cause each interval to contain a single variable.

Max LVs

The maximum number of latent variables (LVs) is set through the cross-validation settings. See CrossVal Interface.