Faq how RMSEC and RMSECV related to R2Y and Q2Y seen other software

===Issue:===

How are RMSEC and RMSECV related to the R2Y and Q2Y values I see in other software?

===Possible Solutions:===
In some software, the values "R2Y" and "Q2Y" are reported for regression models. The R2Y value is equivalent to the y-block cumulative variance captured (as reported in the 5th column of the variance captured table or the <code>.detail.ssq</code> field of a model). It is related to the RMSEC value according to this equation:

<math>R2Y = 1-\frac{RMSEC^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>
The "Q2Y" value is analogous to R2Y except that it is based on the cross-validated results. It is related to the RMSECV value according to this equation:

<math>Q2Y = 1-\frac{RMSECV^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>
where RMSEC is the root mean square error of calibration, RMSECV is the root mean square error of cross-validation, ''m'' is the number of samples, y<sub>i</sub> is the actual (measured) y-value for sample ''i'', and <math>\overline{y}</math> is the mean of the measured y-values. These relations hold only if the y-block is mean-centered before the model is built.
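The link between these formulas and "variance captured" can be made explicit. The short derivation below is only a sketch: it assumes that RMSEC is computed with a divisor of ''m'' (no degrees-of-freedom correction), and it introduces <math>\hat{y}_{i}</math>, a symbol not used elsewhere on this page, for the model's fitted y-value for sample ''i''. Under that assumption:

<math>RMSEC^{2} = \frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\hat{y}_{i} \right )^{2}</math>

so the factors of 1/''m'' cancel and R2Y is one minus the ratio of the residual sum of squares to the sum of squares of the mean-centered y-block:

<math>R2Y = 1-\frac{\sum_{i=1}^{m}\left ( y_{i}-\hat{y}_{i} \right )^{2}}{\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>

Rearranging converts an R2Y value reported by other software back into an error level:

<math>RMSEC = \sqrt{\left ( 1-R2Y \right )\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2}}</math>

The same rearrangement with Q2Y in place of R2Y gives RMSECV.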
R2Y and Q2Y are fractions of variance captured, while the cumulative variance captured table and the <code>.detail.ssq</code> field report percentages. The two are identical except for the factor of 100 between a fraction and a percentage.
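Given a PLS model named "m" which used only mean centering or autoscaling on the univariate y-block, the following code calculates R2Y and Q2Y (the variable <code>my</code> holds the number of included samples, written ''m'' in the equations):

 >> incl = m.detail.include{1,2};                 % included samples (rows) of the y-block
 >> y = m.detail.data{2}.data(incl,:);            % measured y-values for those samples
 >> my = length(incl);                            % number of included samples
 >> R2Y = (1-(m.rmsec.^2)*my./sum(mncn(y).^2))    % mncn mean-centers y
 >> Q2Y = (1-(m.rmsecv.^2)*my./sum(mncn(y).^2))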
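The formula used in that code can be checked without a model object. The following is a minimal sketch using base MATLAB only. The y-values and fitted values are made-up illustrative numbers, the variable names are hypothetical, and RMSEC is assumed to be computed with a divisor equal to the number of samples, as the equations on this page imply.

```matlab
% Hypothetical measured and fitted y-values (illustrative numbers only)
y    = [1; 2; 3; 4; 5];
yhat = [1.1; 1.9; 3.2; 3.8; 5.0];

my    = numel(y);                        % number of samples
rmsec = sqrt(sum((y - yhat).^2) / my);   % RMSEC, assuming a divisor of my
sst   = sum((y - mean(y)).^2);           % sum of squares of the mean-centered y

R2Y_from_rmsec = 1 - (rmsec^2) * my / sst;       % same form as the model-based code
R2Y_direct     = 1 - sum((y - yhat).^2) / sst;   % one minus residual SS over total SS

fprintf('R2Y from RMSEC: %.4f, direct: %.4f\n', R2Y_from_rmsec, R2Y_direct);
```

Both expressions should agree to within floating-point rounding, which is the sense in which R2Y is a rescaled RMSEC.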
The practical aspects of these statistics are:

- R2Y and Q2Y generally increase towards 1 as a model's fit improves, whereas RMSEC and RMSECV decrease towards zero.
- R2Y or Q2Y can become negative if the predicted or cross-validation-predicted y-values are particularly bad.
- RMSEC and RMSECV are in the units of the original y-block and can be interpreted as "error levels" (they are very similar to standard deviations), whereas R2Y and Q2Y are in fractional units.
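As an illustration of how a negative value can arise, consider a hypothetical three-sample case (numbers invented for this example) with measured y-values 1, 2, 3 and cross-validated predictions 3, 2, 1. The mean of y is 2, so the denominator of the Q2Y equation is:

<math>\frac{1}{m}\sum_{i=1}^{m}\left ( y_{i}-\overline{y} \right )^{2} = \frac{(1-2)^{2}+(2-2)^{2}+(3-2)^{2}}{3} = \frac{2}{3}</math>

The squared cross-validation error is:

<math>RMSECV^{2} = \frac{(1-3)^{2}+(2-2)^{2}+(3-1)^{2}}{3} = \frac{8}{3}</math>

and therefore:

<math>Q2Y = 1-\frac{8/3}{2/3} = -3</math>

Whenever the prediction error exceeds the spread of the y-values around their mean, the ratio is greater than 1 and the statistic drops below zero.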
These values are available from:

- the MATLAB command line:
 >> m.r2y
 >> m.q2y
- an eigenvalue plot from the Analysis window
'''Still having problems? Please contact our helpdesk at [mailto:helpdesk@eigenvector.com helpdesk@eigenvector.com]'''

[[Category:FAQ]]
Latest revision as of 17:07, 15 June 2022