Cross-validation

Cross-validation (CV) is a technique for estimating how well a model predicts unknown samples, and it can also be used to determine the optimal model complexity, such as the number of components. By default, Evince uses cross-validation when creating a PCA or PLS model. How cross-validation is applied can be chosen in the model settings.

In cross-validation, part of the training set is excluded and then predicted by a model calculated from the remaining objects. The objects are divided among a number of cross-validation rounds, and in each round the objects assigned to that round are left out.

The procedure is repeated until every object has been excluded from the DataSet exactly once. The Predicted Residual Error Sum of Squares, PRESS, is then formed as the sum of the squared differences between the actual values and the predicted values of the left-out elements.
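The procedure above can be sketched in a few lines of Python. This is only an illustration and not Evince's implementation: the function names are hypothetical, the interleaved assignment of objects to rounds is an assumption, and a one-variable least-squares line stands in for the PCA or PLS model so that the example stays self-contained.

```python
def assign_rounds(n_objects, n_rounds):
    """Assign object indices to rounds in an interleaved pattern (an assumption)."""
    return [list(range(r, n_objects, n_rounds)) for r in range(n_rounds)]


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; a stand-in for a PCA or PLS model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cross_validated_press(x, y, n_rounds):
    """Leave out each round once, refit on the rest, and accumulate PRESS."""
    press = 0.0
    for left_out in assign_rounds(len(x), n_rounds):
        excluded = set(left_out)
        keep = [i for i in range(len(x)) if i not in excluded]
        a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
        # Squared difference between actual and predicted left-out values
        for i in left_out:
            press += (y[i] - (a + b * x[i])) ** 2
    return press


# Hypothetical data: 7 objects divided into 3 rounds
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8]
print(assign_rounds(len(x), 3))
print(cross_validated_press(x, y, n_rounds=3))
```

Each object appears in exactly one round, so every object is predicted exactly once by a model that was calculated without it.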

Cross-validation can be partial or full. Partial cross-validation is component-wise: each component is cross-validated over all rounds and then withdrawn from the data before the next component is cross-validated. Full cross-validation computes a complete model with all components in every cross-validation round. Note that only partial cross-validation is available for PCA.

For PCA, Q2X (cross-validated explained variance in X) can be formed as:

Q2X = (SSX-PRESS)/SSX = 1 - PRESS/SSX

The PCA model has no predictive ability if the ratio PRESS/SSX exceeds unity, that is, if Q2X is negative. This happens when PRESS is greater than SSX, which means that the variance of the prediction error is greater than the variance of the original data.
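As a small numerical illustration of the formula, the sketch below computes Q2X from a given PRESS and SSX. The function names are hypothetical, and SSX is assumed here to be the sum of squares of the mean-centred data; the numbers are invented for the example.

```python
def sum_of_squares(values):
    """Sum of squares about the mean (assumed definition of SSX)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def q2(press, ss):
    """Cross-validated explained variance: Q2 = 1 - PRESS/SS."""
    return 1.0 - press / ss


ssx = sum_of_squares([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data, SSX = 10.0
print(q2(2.0, ssx))   # PRESS < SSX: Q2X is positive
print(q2(12.0, ssx))  # PRESS > SSX: Q2X is negative, no predictive ability
```

A Q2X close to 1 means that the left-out elements are predicted almost perfectly, while a value at or below 0 means the model predicts no better than the mean of the data.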

For PLS, Q2Y (cross-validated explained variance in Y) can be formed as:

Q2Y = (SSY-PRESS)/SSY = 1 - PRESS/SSY

The cumulative cross-validated variances, Q2X_cum and Q2Y_cum, can be derived from all model components as a measure of the total cross-validated variance.
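How the cumulative value is formed is not spelled out here. One convention used in chemometrics software, shown below purely as an assumption and not as Evince's documented formula, multiplies the PRESS/SS ratios of the individual components, where SS for each component is the sum of squares remaining before that component is calculated. The function name and the numbers are hypothetical.

```python
def q2_cumulative(press_per_component, ss_before_component):
    """Cumulative Q2 as 1 - product of (PRESS_a / SS_(a-1)) over components.

    This product form is an assumed convention, not taken from Evince.
    """
    remaining = 1.0
    for press, ss in zip(press_per_component, ss_before_component):
        remaining *= press / ss
    return 1.0 - remaining


# Hypothetical two-component model
press = [4.0, 3.0]       # PRESS for components 1 and 2
ss_before = [10.0, 5.0]  # sum of squares before each component
print(q2_cumulative(press, ss_before))
```

With a single component the expression reduces to 1 - PRESS/SS, which matches the per-component formulas above.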

See model settings for information about specific model settings and cross-validation.