Applying a Model Quick Start: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy

Revision as of 08:24, 16 March 2009

Congratulations! You have collected calibration data and built a model that meets your objectives. Now you want to put it through one of the most stringent tests: applying it to new data. If you have just finished building the model, all you need to do is load some new data as validation data. Alternatively, you may have a model that was built a while ago and want to apply it to new data.

In this example, there are three variables in the workspace:

  • mymodel - a PLS model that has been built on spectral data to predict a concentration
  • spec2 - a new set of spectral data to be used to validate the model
  • conc - concentration data for the validation spectra

The concentration data contain values for five separate components, but the model mymodel predicts only one of them. First, click on the icon for mymodel in the Workspace browser and drag it into the Analysis GUI (or double-click it to open a new Analysis GUI). The SSQ table will be populated with values, indicating that the model has been loaded. If the model cache was active while the model was being built and is still active, the calibration data will be loaded as well. You can confirm this by noting that the X and Y buttons appear depressed; hovering the mouse cursor over either button reveals information on the corresponding data block.


Now that the model has been loaded:

  • click and drag the icon for spec2 in the Workspace browser into the Analysis GUI; you will be queried on how you want this data loaded - choose "Validation X"
  • click and drag the icon for conc in the Workspace browser into the Analysis GUI; you will be queried on how you want this data loaded - choose "Validation Y"
  • click on the "Apply Model" button under the Analysis Flowchart


Apply model.014.png
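
Conceptually, applying a model means passing the new spectra through the preprocessing and regression vector that were fixed at calibration time. The sketch below is plain Python, not PLS_Toolbox code: the dictionary layout, the `apply_model` function, and all numbers are hypothetical, and it assumes a linear model whose only preprocessing is mean-centering. It also shows picking out the single concentration column that the model predicts from a five-column Y block.

```python
# Hypothetical, minimal stand-in for a stored calibration model.
# Assumption: mean-centering was the only preprocessing step.
model = {
    "x_mean": [1.0, 2.0, 3.0],       # calibration mean of each X variable
    "y_mean": 10.0,                  # calibration mean of the predicted component
    "reg_vector": [0.5, -1.0, 2.0],  # one regression coefficient per X variable
    "y_column": 1,                   # column of the Y block that the model predicts
}

def apply_model(model, x_rows):
    """Predict y for new samples using the calibration-time centering."""
    predictions = []
    for row in x_rows:
        # Center with the calibration mean, never the mean of the new data.
        centered = [x - m for x, m in zip(row, model["x_mean"])]
        y_hat = sum(c * b for c, b in zip(centered, model["reg_vector"]))
        predictions.append(y_hat + model["y_mean"])
    return predictions

# Made-up validation data: two spectra and a five-component Y block.
spec2 = [[1.0, 2.0, 3.0], [2.0, 2.0, 4.0]]
conc = [[0.1, 10.0, 0.3, 0.0, 1.2], [0.2, 12.4, 0.1, 0.0, 1.1]]

y_pred = apply_model(model, spec2)
y_meas = [row[model["y_column"]] for row in conc]
```

The key design point is that the validation samples are centered with the calibration mean; re-centering them with their own mean would hide exactly the kind of offset that validation is meant to expose.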

When you click on the "Review Scores" button, a multiplot figure will open. Try double-clicking on each subplot to create separate figures. One useful plot is Q residuals versus Hotelling's T^2, with both the validation and calibration data visible. To create it, select "Hotelling T^2" for the x-axis and "Q Residuals" for the y-axis in the Plot Controls window, and make sure that the "Show Cal Data with Test" box toward the bottom of the Plot Controls window is checked. Finally, select "View" under the Plot Controls menu, then "Classes", followed by "Cal/Test Samples"; this applies color/symbol coding to the two classes of samples. It is sometimes useful to use log scales for Q residuals and/or T^2; in this example, a log scale is used for the Q residuals. A second useful plot is predicted versus measured values: select "Y Predicted" for the y-axis and "Y Measured" for the x-axis. In both plots, the black circles represent the calibration samples and the red triangles denote the validation samples.
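
To make the two diagnostics concrete, here is a minimal sketch of how Q and T^2 can be computed for a single sample. It is an illustration under stated assumptions, not the PLS_Toolbox implementation: the function name `q_and_t2` and the numbers are hypothetical, and the loadings are assumed to be orthonormal (PCA-style), so the scores are simple projections onto the loadings. In a PLS model the scores are obtained from the weights instead, but the idea is the same: Q measures what the model leaves unexplained, and T^2 measures how far the sample lies from the calibration center within the model.

```python
def q_and_t2(x_centered, loadings, score_variances):
    """Return (Q, T^2) for one mean-centered sample.

    loadings: one list per component, each of length n_variables,
              assumed orthonormal in this sketch.
    score_variances: calibration-set variance of each component's scores.
    """
    # Scores: projection of the sample onto each loading vector.
    scores = [sum(x * p for x, p in zip(x_centered, comp)) for comp in loadings]
    # Part of the sample that the model reproduces.
    reconstructed = [
        sum(t * comp[j] for t, comp in zip(scores, loadings))
        for j in range(len(x_centered))
    ]
    residual = [x - r for x, r in zip(x_centered, reconstructed)]
    q = sum(e * e for e in residual)                              # sum of squared residuals
    t2 = sum(t * t / v for t, v in zip(scores, score_variances))  # variance-scaled distance
    return q, t2

# Made-up two-component model in three variables.
loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
score_variances = [4.0, 1.0]
q, t2 = q_and_t2([2.0, 1.0, 0.5], loadings, score_variances)
```

For this sample, only the third variable lies outside the model, so Q is 0.5 squared (0.25), while T^2 is 2^2/4 + 1^2/1 = 2.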

We note from these two plots that:

  • the validation samples are markedly different from the calibration samples - the Q residuals of the validation set are at least two orders of magnitude larger than those of the calibration set, and several of the validation samples have T^2 values above the 95% confidence limit
  • the predictions for the validation samples are biased toward lower values, although the correlation as measured by R^2 is still high
  • a suggested next step is to determine which factors contribute to the high values of Q residuals and T^2; these are readily obtained by using the Q con and T con buttons in the Plot Controls window
Apply model.015.png
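
The observation that predictions can be biased while R^2 stays high is easy to reproduce with a small calculation. The sketch below uses made-up numbers and a hypothetical `validation_stats` helper; it assumes bias is defined as the mean of predicted minus measured and R^2 as the squared Pearson correlation, which may differ from the exact definitions the software uses.

```python
import math

def validation_stats(y_meas, y_pred):
    """Return (bias, RMSEP, R^2) for paired measured and predicted values."""
    n = len(y_meas)
    # Bias: mean signed error (assumed convention: predicted minus measured).
    bias = sum(p - m for p, m in zip(y_pred, y_meas)) / n
    # RMSEP: root mean squared error of prediction.
    rmsep = math.sqrt(sum((p - m) ** 2 for p, m in zip(y_pred, y_meas)) / n)
    # R^2 as the squared Pearson correlation; a constant offset does not change it.
    mean_m = sum(y_meas) / n
    mean_p = sum(y_pred) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(y_meas, y_pred))
    var_m = sum((m - mean_m) ** 2 for m in y_meas)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov * cov / (var_m * var_p)
    return bias, rmsep, r2

# Made-up validation results: every prediction is 0.5 too low.
y_meas = [1.0, 2.0, 3.0, 4.0]
y_pred = [0.5, 1.5, 2.5, 3.5]
bias, rmsep, r2 = validation_stats(y_meas, y_pred)
```

Here the correlation is perfect even though every prediction is low by 0.5, which is why a high R^2 alone does not show that a model transfers well to new data.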