https://www.wiki.eigenvector.com/index.php?title=Faq_why_get_missing_data_warning&feed=atom&action=history
Faq why get missing data warning - Revision history
2024-03-29T04:43:10Z
Revision history for this page on the wiki
MediaWiki 1.39.6
https://www.wiki.eigenvector.com/index.php?title=Faq_why_get_missing_data_warning&diff=10151&oldid=prev
imported>Lyle at 21:05, 8 January 2019
2019-01-08T21:05:14Z
<p></p>
imported>Lyle
https://www.wiki.eigenvector.com/index.php?title=Faq_why_get_missing_data_warning&diff=10150&oldid=prev
imported>Lyle at 20:16, 5 December 2018
2018-12-05T20:16:19Z
<p></p>
imported>Lyle
https://www.wiki.eigenvector.com/index.php?title=Faq_why_get_missing_data_warning&diff=10149&oldid=prev
imported>Lyle: Created page with "===Issue:=== Why do I get the warning/notice "Missing Data Found - Replacing with "best guess" from existing model. Results may be affected by this action." ===Possible Solu..."
2018-12-03T19:51:56Z
<p>Created page with "===Issue:=== Why do I get the warning/notice "Missing Data Found - Replacing with "best guess" from existing model. Results may be affected by this action." ===Possible Solu..."</p>
<p><b>New page</b></p><div>===Issue:===<br />
<br />
Why do I get the warning/notice "Missing Data Found - Replacing with "best guess" from existing model. Results may be affected by this action."<br />
<br />
===Possible Solutions:===<br />
<br />
The warning appears because your data contains '''NaN (Not a Number)''' values somewhere. NaN marks "missing data": data points for which you have no value. Certain preprocessing steps can introduce NaNs, but the most likely cause is that the data already had missing points when you imported it. <br />
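<br />
To see where the missing values sit before modeling, a check along the following lines can help. This is only an illustrative sketch in plain Python, not PLS_Toolbox code; the function name <code>find_missing</code> and the example data are hypothetical. <br />

```python
import math


def find_missing(rows):
    """Return the (sample, variable) index pairs of every NaN entry.

    `rows` is a list of samples, each a list of floats (hypothetical layout).
    """
    return [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if math.isnan(value)
    ]


nan = float("nan")
data = [[1.0, 2.0], [nan, 4.0], [5.0, nan]]
print(find_missing(data))  # sample 1 / variable 0 and sample 2 / variable 1
```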
<br />
The warning arises because the algorithm needs a value for every variable in every sample in order to build a model. To handle this, PLS_Toolbox uses a data imputation algorithm: it fills each missing data point with an initial estimate, builds a PCA model of all the data, and then uses that model to re-estimate the missing points, repeating until the replaced values stop changing. This procedure is not perfect and can still produce samples with high leverage or residuals (i.e. samples that are outliers), but if you have a lot of missing data it may be the only reasonable approach. <br />
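<br />
The iterative idea described above can be sketched as follows. This is a minimal illustration in plain Python, not the PLS_Toolbox implementation: it assumes a single principal component, mean-centering only, column means as the initial estimates, and at least one observed value per variable. The names <code>first_loading</code> and <code>impute_pca</code> are hypothetical. <br />

```python
import math


def first_loading(centered, n_power=100):
    """Approximate the first PCA loading vector by power iteration."""
    n_vars = len(centered[0])
    loading = [1.0 / math.sqrt(n_vars)] * n_vars
    for _ in range(n_power):
        scores = [sum(x * p for x, p in zip(row, loading)) for row in centered]
        direction = [
            sum(t * row[k] for t, row in zip(scores, centered))
            for k in range(n_vars)
        ]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:  # no variance left to model
            break
        loading = [d / norm for d in direction]
    return loading


def impute_pca(rows, max_iter=1000, tol=1e-8):
    """Fill NaN entries by iterating a one-component PCA model (sketch only)."""
    n_rows, n_vars = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_vars)
               if math.isnan(rows[i][j])]
    filled = [list(row) for row in rows]

    # Step 1: initial estimate = mean of the observed values of each variable
    # (assumes every variable has at least one observed value).
    for j in range(n_vars):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        column_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if math.isnan(filled[i][j]):
                filled[i][j] = column_mean

    for _ in range(max_iter):
        # Step 2: build a PCA model (means + first loading) of the filled data.
        means = [sum(row[j] for row in filled) / n_rows for j in range(n_vars)]
        centered = [[row[j] - means[j] for j in range(n_vars)] for row in filled]
        loading = first_loading(centered)

        # Step 3: replace each missing point with the model's reconstruction.
        largest_change = 0.0
        for i, j in missing:
            score = sum(x * p for x, p in zip(centered[i], loading))
            estimate = means[j] + score * loading[j]
            largest_change = max(largest_change, abs(estimate - filled[i][j]))
            filled[i][j] = estimate

        # Step 4: stop once the replaced values no longer change.
        if largest_change < tol:
            break
    return filled


nan = float("nan")
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, nan]]
result = impute_pca(data)
print(result[3][1])
```

In this toy data set the second variable is twice the first in every complete sample, so the estimate for the missing point should move toward 8 as the iterations proceed. Convergence can take a few hundred iterations even here, which is one reason to treat imputed values with some caution.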
<br />
If data is missing in only a couple of samples, you could exclude those samples and build a model from the remaining data. (You can later use the PLS_Toolbox <code>replace</code> function to estimate the missing values for the excluded samples from that model, and then rebuild the model with all the data; this may give a better estimate than the PCA imputation method gives.) <br />
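<br />
Deciding which samples to exclude amounts to separating complete samples from those containing any NaN. A hedged sketch in plain Python follows; <code>split_complete</code> is a hypothetical helper, not a PLS_Toolbox function, and the data are made up for illustration. <br />

```python
import math


def split_complete(rows):
    """Return (complete, incomplete) lists of sample indices.

    A sample is "incomplete" if any of its variables is NaN.
    """
    complete, incomplete = [], []
    for index, row in enumerate(rows):
        has_missing = any(math.isnan(value) for value in row)
        (incomplete if has_missing else complete).append(index)
    return complete, incomplete


nan = float("nan")
samples = [[1.0, 2.0], [nan, 4.0], [5.0, 6.0]]
keep, exclude = split_complete(samples)
print(keep, exclude)  # indices to model on, indices to set aside
```

The model would then be built only from the samples listed in <code>keep</code>, with the excluded samples estimated afterwards.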
<br />
If data is missing from a lot of samples, you have no real alternative to the imputation described above. Some algorithms do use weighting to ignore missing values; see, for example, the <code>tucker</code> and <code>tld</code> functions. <br />
<br />
'''Still having problems? Please contact our helpdesk at [mailto:helpdesk@eigenvector.com helpdesk@eigenvector.com]'''<br />
<br />
[[Category:FAQ]]</div>imported>Lyle