FAQ: Units for RMSEC and RMSECV in PLSDA
What are the units used for RMSEC and RMSECV when cross-validating PLSDA models? Why do the cross-validation curves look strange for PLSDA?
Note: this behavior has changed as of version 5.0 of PLS_Toolbox and Solo. RMSECV is now reported using the standard RMSECV equation (see the documentation). The misclassification rate discussed below is now reported separately.
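As a point of reference, the standard RMSECV mentioned above is just the root-mean-square error of the cross-validated predictions. A minimal sketch (this is an illustrative helper, not a PLS_Toolbox or Solo function):

```python
import numpy as np

def rmsecv(y_true, y_cv_pred):
    """Standard root-mean-square error of cross-validation:
    y_cv_pred holds the prediction for each sample made while
    that sample was left out of the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_cv_pred = np.asarray(y_cv_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_cv_pred) ** 2))
```

In PLSDA the reference values are the 0/1 class assignments, so RMSECV measures how far the cross-validated predictions fall from the ideal class values.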
With PLSDA, cross-validation reports RMSEC and RMSECV in terms of the "fractional misclassification error rate". That is, an RMSECV of 0.05 indicates a 5% misclassification error rate. This misclassification rate is based on the automatically determined threshold (see the FAQ on how this threshold is determined) and the value predicted for each sample while it was in the test set (left out of the model) during cross-validation.
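The pre-5.0 metric described above can be sketched as follows. This is a simplified illustration for a single class, assuming predictions at or above the threshold are assigned to the class; it is not the PLS_Toolbox implementation:

```python
import numpy as np

def misclassification_rate(y_true, y_cv_pred, threshold):
    """Fractional misclassification error rate for one PLSDA class.
    y_true: 1 for in-class samples, 0 otherwise.
    y_cv_pred: cross-validated predicted values for each sample.
    threshold: the class-assignment threshold (in PLSDA this is
    determined automatically; here it is simply passed in)."""
    in_class = np.asarray(y_true, dtype=bool)
    called_in_class = np.asarray(y_cv_pred, dtype=float) >= threshold
    # Fraction of samples assigned to the wrong side of the threshold
    return np.mean(called_in_class != in_class)
```

For example, if one sample out of twenty falls on the wrong side of the threshold, the reported value is 0.05, i.e. a 5% misclassification rate.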
The resulting RMSEC and RMSECV curves (as a function of the number of latent variables) may differ from typical regression cross-validation results. Depending on the noise structure in the data, the misclassification error rate can be less sensitive to the number of components than a standard RMSECV would be. You may see the RMSECV drop to zero and stay there as the number of components increases. In these cases, the later latent variables have little effect on which side of the threshold individual samples fall, so the misclassification rate is not adversely affected.
As the number and diversity of samples increases, the sensitivity of RMSECV to the number of latent variables also increases. To increase sensitivity, you may wish to use fewer splits in your cross-validation, so that more samples are left out in each test set.