Duplex

From Eigenvector Research Documentation Wiki
Revision as of 07:52, 20 November 2023 by Lyle

Purpose

Select a subset of samples from a data set by the Duplex algorithm.

Synopsis

[selCal, selTest] = duplex(x, k)

Description

Selected samples should provide uniform coverage of the data set and include samples on its boundary. Duplex starts by selecting the two samples furthest from each other and assigns them to the calibration set. It then finds the two samples furthest from each other among those remaining and assigns them to the test set. It then iterates over the rest of the samples, alternating between the two sets.

References:

  • R.D. Snee, Validation of regression models: methods and examples, Technometrics 19 (1977) 415-428.
  • M. Daszykowski, B. Walczak, D.L. Massart, Representative subset selection, Analytica Chimica Acta 468 (2002) 91-103.
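
The selection procedure described above can be sketched in plain MATLAB as follows. This is a minimal illustration, not the toolbox implementation of duplex. The function name duplex_sketch and its helpers furthest_pair and furthest_from_set are hypothetical. The sketch assumes Euclidean distance on the raw rows of a numeric array, assumes that k samples are placed in each of the two sets, and assumes that the iteration step adds to each set in turn the unassigned sample whose nearest member of that set is furthest away, which is how the Duplex algorithm is usually described.

```matlab
function [selCal, selTest] = duplex_sketch(x, k)
% DUPLEX_SKETCH Illustrative Duplex split (hypothetical helper, not the toolbox code).
%   x : (nsamples x nvars) numeric array
%   k : number of samples placed in EACH set (an assumption of this sketch)
n = size(x, 1);
assert(k >= 2 && 2*k <= n, 'need 2 <= k and 2*k <= nsamples');

% Squared Euclidean distances between all pairs of rows.
ss = sum(x.^2, 2);
D  = bsxfun(@plus, ss, ss') - 2*(x*x');
D  = max(D, 0);           % clip small negative round-off

selCal  = false(1, n);
selTest = false(1, n);
free    = true(1, n);     % samples not yet assigned to either set

% Step 1: the two mutually furthest samples go to the calibration set.
[i, j] = furthest_pair(D, free);
selCal([i j]) = true;   free([i j]) = false;

% Step 2: the furthest pair among the remaining samples goes to the test set.
[i, j] = furthest_pair(D, free);
selTest([i j]) = true;  free([i j]) = false;

% Step 3: alternate between the sets, each time adding the free sample
% whose minimum distance to the samples already in that set is largest.
while sum(selCal) < k || sum(selTest) < k
    if sum(selCal) < k
        i = furthest_from_set(D, free, selCal);
        selCal(i) = true;   free(i) = false;
    end
    if sum(selTest) < k
        i = furthest_from_set(D, free, selTest);
        selTest(i) = true;  free(i) = false;
    end
end
end

function [i, j] = furthest_pair(D, free)
% Indices of the two free samples with the largest mutual distance.
idx = find(free);
sub = D(idx, idx);
[~, p] = max(sub(:));
[r, c] = ind2sub(size(sub), p);
i = idx(r);  j = idx(c);
end

function i = furthest_from_set(D, free, inSet)
% Free sample that maximizes its minimum distance to the samples in the set.
idx  = find(free);
dmin = min(D(inSet, idx), [], 1);
[~, p] = max(dmin);
i = idx(p);
end
```

With the function saved as duplex_sketch.m, a small one-dimensional layout shows the alternation: the extremes go to the calibration set, the next most extreme pair goes to the test set, and the interior points are then shared out.

```matlab
x = [0 0; 10 0; 1 0; 9 0; 5 0; 4 0];
[selCal, selTest] = duplex_sketch(x, 3);
% selCal  is [1 1 0 0 1 0]
% selTest is [0 0 1 1 0 1]
```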

Inputs

  • x = array or dataset object containing the data from which k samples are selected,
  • k = number of samples to select.

Outputs

  • selCal = logical vector of length nsamples indicating which samples are selected for the calibration set (true = selected). If input x was a dataset object then selCal has size (1, nincluded), where nincluded is the number of included samples, and selCal indicates which included samples are selected.
  • selTest = (1, nsamples) logical vector indicating which samples are selected for the test set (true = selected). If input x was a dataset object then selTest has size (1, nincluded) and selTest indicates which included samples are selected.

Example

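>> load arch;
>> [selCal,selTest] = duplex(arch, 50);   % select samples by Duplex with k = 50
>> arch_subset = arch(selCal,:);          % samples selected for the calibration set
>> arch_test = arch(selTest,:);           % samples selected for the test set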

See Also

distslct, reducennsamples, splitcaltest, doptimal, stdsslct, randomsplit, spxy