Model settings

The model wizard opens when a PCA, PLS or PLS-DA model is to be created. In the first step the user selects the model type. In the second step, shown below, the user can alter a number of model settings, or click "Finish" to accept the default settings.

PCA model settings:

- Model name: Enter the name for the created model.

- DataSet: The name of the DataSet that the model is built on.

- Components: The number of components can be changed from [AUTO] to a specific number. With AUTO, Evince uses cross-validation to detect the optimal number of components automatically. Cross-validation can be time-consuming, e.g. for very large DataSets, and especially so when a large number of components needs to be calculated. In these cases, setting a lower, fixed number of components improves performance. For image data, cross-validation is disabled by default and three principal components are calculated.

- Category Models: Changing the default setting to one of the Categories creates one model for each class of the selected Category.


- Cross-validation, Options for "Exclusion by" combo-box:

None: Disables cross-validation. Turning off cross-validation improves calculation speed and is useful for very large DataSets. Image data uses this setting by default.

Evenly spread: Excludes objects that are evenly spread in the diagonal direction of the DataSet.

Group: Divides the DataSet into a number of contiguous groups in the diagonal direction of the DataSet.

Category: Excludes one class at a time for the selected Category. This exclusion method is useful when similarity between the objects within each class makes the cross-validation results overly optimistic. Excluding one class at a time removes the effect of always having a similar object in the training set. The number of cross-validation rounds cannot be selected when exclusion by "Category" is the active method.

Random: Randomly excludes objects.

- Cross-validation, Rounds: The number of cross-validation rounds, which is 7 by default. Increasing the number of rounds decreases the number of objects excluded in each round.

- Cross-validation, Category: The user must give the name of the Category if "Category" is chosen in the "Exclusion by" combo-box.

- Calculate Confidence Intervals: Uses the results from each cross-validation round to calculate confidence intervals for the model statistics (Efron and Gong, 1983; see reference 1).

- Template: Chooses a template for automatic plot generation. Two templates are available, "Standard" and "Standard Image". The "Standard" template will generate model collection plots. The "Standard Image" template will generate image model collection plots.
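As a rough illustration of how the "Exclusion by" methods differ, the following Python sketch assigns objects to cross-validation rounds. It is not Evince's implementation: the function name cv_rounds and its parameters are hypothetical, and reading "Evenly spread" as every k-th object and "Group" as contiguous blocks is an assumption based on the descriptions above.

```python
import random
from collections import defaultdict


def cv_rounds(n_objects, method, rounds=7, categories=None, seed=0):
    """Return one list of excluded object indices per cross-validation round.

    Hypothetical helper for illustration only; not part of Evince.
    """
    indices = list(range(n_objects))
    if method == "none":
        # Cross-validation disabled: no object is ever excluded.
        return []
    if method == "evenly_spread":
        # Assumption: object i is excluded in round i mod rounds.
        return [indices[r::rounds] for r in range(rounds)]
    if method == "group":
        # Assumption: contiguous blocks of (nearly) equal size.
        base, extra = divmod(n_objects, rounds)
        result, start = [], 0
        for r in range(rounds):
            size = base + (1 if r < extra else 0)
            result.append(indices[start:start + size])
            start += size
        return result
    if method == "random":
        # Shuffle once, then deal the objects out to the rounds.
        shuffled = indices[:]
        random.Random(seed).shuffle(shuffled)
        return [sorted(shuffled[r::rounds]) for r in range(rounds)]
    if method == "category":
        # One round per class; the 'rounds' argument is ignored.
        if categories is None:
            raise ValueError("categories are required for method='category'")
        by_class = defaultdict(list)
        for i, label in enumerate(categories):
            by_class[label].append(i)
        return list(by_class.values())
    raise ValueError(f"unknown method: {method}")
```

With 10 objects and 3 rounds, "evenly_spread" excludes objects 0, 3, 6 and 9 in the first round, whereas "group" excludes objects 0 to 3. With method "category", the number of rounds follows from the number of classes, which is consistent with the Rounds setting being disabled for that method.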

 

PLS model (specific) settings:

- PLS Model type: If the default setting (Multi Y PLS) is chosen, Evince will calculate one PLS model for all included Y variables. If instead the other setting (Single Y PLS) is chosen, Evince will calculate one PLS model for each of the Y variables.

- Cross-validation Type (Partial, Full): Sets the cross-validation to be either "Partial" or "Full". Partial cross-validation is performed component-wise only, whereas full cross-validation calculates a full model for every cross-validation round.

 

PLS-DA model (specific) settings:

- Set Y from: Chooses the Y variables for the PLS-DA model. The user can choose a class or use all selected Y-variables. The selected Y-variables are shown at the bottom of the DataSet when viewing the variables.

- Set equal class size: Sets an equal number of observations for all classes. PLS-DA modeling may fail if class sizes are uneven.
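The two PLS-DA settings can be pictured with a short Python sketch. PLS-DA conventionally regresses on a dummy Y matrix with one 0/1 column per class, and one simple way to obtain equal class sizes is to randomly keep as many objects per class as the smallest class contains. Both helper functions are hypothetical, and the down-sampling strategy in particular is an assumption: this section does not state how Evince equalizes class sizes.

```python
import random
from collections import defaultdict


def dummy_y(labels):
    """Build a 0/1 dummy Y matrix with one column per class (hypothetical helper)."""
    classes = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, rows


def equalize_class_sizes(labels, seed=0):
    """Return the indices to keep so that every class has the same size.

    Assumption: each class is randomly down-sampled to the size of the
    smallest class. Evince may use a different strategy.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    target = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, target))
    return sorted(kept)
```

For the labels "b", "a", "b", dummy_y returns the classes ["a", "b"] and the rows [0, 1], [1, 0], [0, 1].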


References


1. Efron, B., and Gong, G. (1983), A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, The American Statistician, 37, 36-48.