Predictions

When a model has been calculated and at least one observation is assigned to the test set, the Data Tree container holding the prediction statistics (colored orange) becomes accessible.

PCA prediction statistics:

- Xte: The residual X matrix of the test set.

- Tpred: Predicted scores for the test set. It is sometimes useful to plot the predicted scores together with the original scores, T, from the model. In that case Tpred can be added as a second layer, as shown in the plot below. (For instructions on how to add layers, see the Plot Layers section.)

- ObsDMXpred: Predicted observation distance to the model in X. Large values indicate bad fit in X for the corresponding observations.
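The relationship between the predicted scores and the predicted distance to the model can be illustrated with a small NumPy sketch. It is not the software's own implementation: the function name pca_predict, the mean-centering without scaling, and the normalization of the distance by (K - A) are assumptions made for illustration only, and the software may scale or normalize differently.

```python
import numpy as np

def pca_predict(X_train, X_test, n_components):
    """Hypothetical helper: project test observations onto a PCA model.

    Assumes mean-centering only; the DModX normalization by (K - A)
    is one common convention, not necessarily the one used here.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # Center both sets with the training set means.
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    Xte = X_test - mean

    # Loadings P (K x A) from the SVD of the centered training data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T

    T_pred = Xte @ P            # predicted scores for the test set
    E = Xte - T_pred @ P.T      # X residuals of the test set

    # Residual standard deviation per observation (distance to model in X).
    K = X_train.shape[1]
    dmodx_pred = np.sqrt((E ** 2).sum(axis=1) / (K - n_components))
    return T_pred, dmodx_pred

# Training data lying exactly on one line, so one component fits it perfectly.
X_train = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
# First test observation follows the same pattern, the second does not.
X_test = [[5, 10, 15], [2.5, 5, 10]]
T_pred, dmodx_pred = pca_predict(X_train, X_test, n_components=1)
```

In this example the first test observation gets a distance of essentially zero, while the second, which breaks the correlation pattern of the training set, gets a clearly larger value.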

PLS prediction statistics:

- ObsDMYpred: Predicted observation distance to the model in Y. Large values indicate bad fit in Y for the corresponding observations.

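- RMSEP_var: The root mean squared error of prediction for each Y-variable. Each RMSEP value is calculated as the square root of the mean squared prediction error, that is, the sum of the squared prediction errors (PRESS) divided by the number of test observations. The prediction errors are the differences between the true and the predicted values. The RMSEP is a measure of prediction accuracy and has the same unit as the corresponding Y-variable.

$$\mathrm{RMSEP} = \sqrt{\frac{\mathrm{PRESS}}{N}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where y_i is the true value and \hat{y}_i the predicted value for test observation i, and N is the number of observations in the test set. An RMSEP value for each Y-variable is obtained for every PLS component calculated.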

- Bias: The average difference between observed and predicted values. A bias value for each Y-variable is obtained for every PLS component calculated.

- Sep_var: The bias-corrected prediction error, that is, RMSEP corrected for bias. A Sep_var value for each Y-variable is obtained for every PLS component calculated.

- Ypred: Predicted Y values. Predicted values are obtained for all Y-variables and for all components. By dragging Ypred onto the Plot Area and choosing the "Y vs Ycalc plot", an observed vs predicted plot is obtained (shown below). This plot uses two layers, one for the calculated values of the training set and one for the predicted values of the test set. An optimal prediction is achieved when the observed vs predicted values form a straight diagonal line with a slope of one.
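The three error statistics described above (RMSEP, bias and the bias-corrected error) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the software's own calculation: the function name prediction_stats is hypothetical, the error is taken as observed minus predicted, and the bias-corrected error uses an N - 1 denominator, which is a common convention but may not match the one used for Sep_var.

```python
import numpy as np

def prediction_stats(y_obs, y_pred):
    """Hypothetical helper: per-Y-variable RMSEP, bias and SEP.

    y_obs, y_pred: arrays of shape (N, M) with N test observations
    and M Y-variables. Sign convention and the (N - 1) denominator
    for SEP are assumptions.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    err = y_obs - y_pred                 # prediction errors
    n = err.shape[0]

    press = (err ** 2).sum(axis=0)       # sum of squared prediction errors
    rmsep = np.sqrt(press / n)           # root mean squared error of prediction
    bias = err.mean(axis=0)              # average prediction error
    # Spread of the errors around the bias (bias-corrected error).
    sep = np.sqrt(((err - bias) ** 2).sum(axis=0) / (n - 1))
    return rmsep, bias, sep

# Two Y-variables, four test observations.
y_obs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y_pred = [[1.5, 11.0], [2.5, 19.0], [3.5, 31.0], [4.5, 39.0]]
rmsep, bias, sep = prediction_stats(y_obs, y_pred)
```

In this example the first Y-variable is predicted with a constant offset, so its error is entirely bias and the bias-corrected error is zero. The second Y-variable has errors that cancel on average, so its bias is zero and the error shows up only in RMSEP and the bias-corrected value. To obtain one value per PLS component, the function would be called once for each component's predictions.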