# FAQ: How are ROC curves calculated for PLSDA?

### Issue:

How are the ROC curves calculated for PLSDA?

### Possible Solutions:

The ROC curves are based on the predicted y-values for each of your samples. These values are not discrete zeros and ones; they range continuously from around zero to around one (take a look at a plot of predicted y to see what we mean). Each point on an ROC curve (or pair of points at a given threshold value in the "threshold" plots on the right-hand side of the ROC figure) comes from calculating the sensitivity and specificity at a given threshold value. Specificity is the fraction of "not-in-class" samples whose predictions fall below the given threshold; sensitivity is the fraction of "in-class" samples whose predictions fall above it.
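As a minimal sketch (illustrative Python, not PLS_Toolbox code; the sample y-predictions below are made up), the sensitivity/specificity calculation at a single threshold looks like this:

```python
def sens_spec(y_pred_in, y_pred_out, threshold):
    """Empirical sensitivity and specificity at one threshold.

    Sensitivity: fraction of "in-class" samples predicted ABOVE the threshold.
    Specificity: fraction of "not-in-class" samples predicted BELOW it.
    """
    sensitivity = sum(y > threshold for y in y_pred_in) / len(y_pred_in)
    specificity = sum(y < threshold for y in y_pred_out) / len(y_pred_out)
    return sensitivity, specificity

# Hypothetical continuous y-predictions from a PLSDA model
in_class = [0.72, 0.88, 0.55, 0.94, 0.61]    # true class members
out_class = [0.12, 0.31, 0.48, 0.05, 0.26]   # not-in-class samples

sens, spec = sens_spec(in_class, out_class, threshold=0.5)
print(sens, spec)  # prints: 1.0 1.0 (the classes are fully separated at 0.5)
```

Sweeping the threshold and plotting sensitivity against (1 - specificity) traces out the ROC curve.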

These are empirical curves: they are calculated directly from the data, not from a model of the data's distribution, so there will be some "stepping". In fact, with smaller sample sizes the curves may **NEVER** be smooth, because sensitivity and specificity only change (up or down) when the threshold moves past a sample's predicted y-value. For example, if the number of "not-in-class" samples above a threshold of 0.46 is no different from the number above 0.45, the two thresholds give the same specificity. As of version 3.5.4 of PLS_Toolbox, we calculate only the "critical" thresholds (those that actually change the sensitivity and specificity curves) and interpolate between them. Even then, a multi-modal distribution of y-predictions for either the in-class or out-of-class samples will lead to non-smooth curves.
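Since sensitivity and specificity only change when the threshold crosses a sample's predicted y-value, the "critical" thresholds are just the predicted values themselves. A sketch of that idea (again illustrative Python with made-up predictions, not the PLS_Toolbox implementation):

```python
def empirical_roc(y_pred_in, y_pred_out):
    """Return (threshold, sensitivity, specificity) at each critical threshold.

    The critical thresholds are the samples' own predicted y-values;
    between two consecutive ones, neither statistic can change, which
    is what produces the "stepping" in the empirical curve.
    """
    critical = sorted(set(y_pred_in) | set(y_pred_out))
    curve = []
    for t in critical:
        sens = sum(y > t for y in y_pred_in) / len(y_pred_in)
        spec = sum(y < t for y in y_pred_out) / len(y_pred_out)
        curve.append((t, sens, spec))
    return curve

# Hypothetical predictions with some class overlap
for t, sens, spec in empirical_roc([0.9, 0.7, 0.4], [0.1, 0.3, 0.6]):
    print(f"t={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With six samples there are at most six steps, which is why small data sets can never produce a smooth curve no matter how finely the threshold axis is interpolated.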

The cross-validated versions of the curves are determined using the same procedure outlined above, except that each sample's y-value is the one predicted when that sample was left out of the calibration set during cross-validation. One might assume that running multiple replicate cross-validation subsets would lead to smoother cross-validated curves. Two things keep this from happening:

First, before version 4.0 of PLS_Toolbox, the software did not actually average the predicted y-values from multiple replicates; it only kept the predicted y-value from the LAST time a given sample was left out.

Second, even without the above issue, the curves would only get smoother if the different sets of samples left out in each cross-validation replicate induced a significant change in the model, and thus in the predicted y-value for a sample. If the models calculated in each cycle are essentially the same, there will be little to no variation in the predicted y-values, and the curves will look very similar for all replicates. In fact, significant variation in a predicted y-value from one subset to the next is an indication that the cross-validation is unstable (e.g., outliers in the data, too little data, or "critical" good samples which, when left out, prevent a good model from being calculated).
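The bookkeeping behind the cross-validated curve can be sketched as follows. This is a toy leave-one-out loop where the PLS model is replaced by a simple univariate least-squares fit (a stand-in, not the real modeling step); the collected left-out predictions are what feed the sensitivity/specificity procedure described above:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b (toy stand-in for PLS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def loo_cv_predictions(xs, ys):
    """Leave-one-out: predict each sample's y from a model built WITHOUT it."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    return preds

# Made-up data: y is the 0/1 class code, x a single measured variable
y_cv = loo_cv_predictions([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], [0, 0, 0, 1, 1, 1])
print(y_cv)  # note the predictions stray below 0 and above 1
```

Note that the left-out predictions are continuous and can fall outside [0, 1], just like the calibration predictions; the cross-validated ROC curve is built from them in exactly the same way.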

**Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com**