Faq how RMSEC and RMSECV related to R2Y and Q2Y seen other software

From Eigenvector Research Documentation Wiki
Revision as of 13:40, 19 May 2020

Issue:

How are RMSEC and RMSECV related to the R2Y and Q2Y values I see in other software?

Possible Solutions:

In some software, the values "R2Y" and "Q2Y" are reported for regression models. The R2Y value is equivalent to the y-block cumulative variance captured (as reported in the 5th column of the variance captured table or the .detail.ssq field of a model). It is related to the RMSEC value according to this equation:

<math>R2Y = 1-\frac{RMSEC^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>

The "Q2Y" value is analogous to R2Y except that it is based on the cross-validated results. It is related to the RMSECV value according to this equation:

<math>Q2Y = 1-\frac{RMSECV^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>

where RMSEC is the root mean square error of calibration, RMSECV is the root mean square error of cross-validation, m is the number of samples, <math>y_{i}</math> is the actual (measured) y-value for sample i, and <math>\overline{y}</math> is the mean of the measured y-values. These relations hold only if the y-block is mean-centered before the model is built.
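
As a quick numerical check of the two equations, the following sketch uses plain MATLAB with made-up data (it should also run in Octave and calls no PLS_Toolbox functions). The vectors y, yfit and ycv are hypothetical, and the sketch assumes RMSEC and RMSECV are computed with a 1/m normalization, as the equations imply. Here m is the number of samples, as in the equations.

```matlab
% Hypothetical data (not from any real model)
y    = [1.0; 2.0; 3.0; 4.0; 5.0];   % measured y-values
yfit = [1.1; 1.9; 3.2; 3.8; 5.0];   % calibration (self-prediction) estimates
ycv  = [1.3; 1.7; 3.4; 3.5; 5.3];   % cross-validated estimates

m    = numel(y);                    % number of samples
vary = sum((y - mean(y)).^2) / m;   % (1/m)*sum((y_i - ybar)^2)

% Assumed definitions: root mean square errors with 1/m normalization
rmsec  = sqrt(sum((y - yfit).^2) / m);
rmsecv = sqrt(sum((y - ycv).^2)  / m);

% The two relations from the text
R2Y = 1 - rmsec^2  / vary;
Q2Y = 1 - rmsecv^2 / vary;
```

With these numbers the denominator is 2, RMSEC squared is 0.02 and RMSECV squared is 0.136, so R2Y comes out at 0.99 and Q2Y at 0.932.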

R2Y and Q2Y represent fractions of variance captured while the cumulative variance captured table and .detail.ssq field represent percentages. They are identical except for a factor of 100 difference between fraction and percentage.

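Given a PLS model named "m" which used only mean centering or autoscaling on the univariate y-block, the following code calculates R2Y and Q2Y. Note that in this code "m" is the model, whereas in the definitions above m is the number of samples; the code stores the number of samples in "my".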

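>> incl = m.detail.include{1,2};            % indices of the included samples for the y-block
>> y    = m.detail.data{2}.data(incl,:);    % measured y-values for the included samples
>> my   = length(incl);                     % number of included samples
>> R2Y = (1-(m.rmsec.^2)*my./sum(mncn(y).^2))     % 1 - RMSEC^2 / ((1/my)*sum of squared mean-centered y)
>> Q2Y = (1-(m.rmsecv.^2)*my./sum(mncn(y).^2))    % same relation using RMSECV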
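
The same relations can be inverted to recover RMSEC and RMSECV from R2Y and Q2Y values reported by other software, provided the measured y-values are available. This is a minimal sketch in plain MATLAB (no PLS_Toolbox calls); the y vector and the R2Y and Q2Y values are hypothetical, and it assumes the same 1/m normalization and a mean-centered y-block.

```matlab
% Hypothetical inputs: measured y-values and statistics reported elsewhere
y   = [1.0; 2.0; 3.0; 4.0; 5.0];
R2Y = 0.99;
Q2Y = 0.932;

my   = numel(y);                      % number of samples
vary = sum((y - mean(y)).^2) / my;    % (1/my)*sum((y_i - ybar)^2)

% Rearranged relations: RMSE^2 = (1 - R2) * vary
rmsec  = sqrt((1 - R2Y) * vary);
rmsecv = sqrt((1 - Q2Y) * vary);
```

The resulting values are in the units of y, so they can be compared directly with the RMSEC and RMSECV of a model built on the same data.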

The practical aspects of these statistics are:

R2Y and Q2Y generally increase towards 1 as a model's fit improves, whereas RMSEC and RMSECV decrease towards zero.

Note that it is possible for R2Y or Q2Y to become negative if the predicted or cross-validated predicted y-values are particularly bad.

RMSEC and RMSECV are in the units of the original y-block and can be interpreted as "error levels" (they are very similar to standard deviations), whereas R2Y and Q2Y are expressed as fractions.
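
The last two points can be illustrated with a small sketch in plain MATLAB. The data are made up: the cross-validated predictions are deliberately poor (the measured values in reverse order), and the helper functions q2 and rmse are hypothetical names defined here, not PLS_Toolbox functions.

```matlab
% Hypothetical data: predictions that are worse than simply using the mean
y   = [1.0; 2.0; 3.0; 4.0; 5.0];   % measured y-values
ycv = [5.0; 4.0; 3.0; 2.0; 1.0];   % deliberately bad cross-validated estimates

% Helper functions (illustrative names), using the 1/m-normalized relations
q2   = @(yt, yp) 1 - sum((yt - yp).^2) / sum((yt - mean(yt)).^2);
rmse = @(yt, yp) sqrt(mean((yt - yp).^2));

Q2Y_bad    = q2(y, ycv);     % negative: the error exceeds the variance of y
RMSECV_bad = rmse(y, ycv);   % in the units of y

% Expressing y in different units (here a factor of 1000) rescales RMSECV
% but leaves Q2Y unchanged
Q2Y_scaled    = q2(1000*y, 1000*ycv);
RMSECV_scaled = rmse(1000*y, 1000*ycv);
```

Here the squared errors sum to 40 against a total sum of squares of 10, so Q2Y is 1 - 4 = -3, while RMSECV simply follows the units of y.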

These values are available from:

- the Matlab command line

>> m.r2y
>> m.q2y

- an eigenvalue plot from the Analysis window


Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com