Classification, Clustering and Training/Test Set Selection using PCA

Principal Component Analysis (PCA) plays a central role in image processing in Evince. For images, PCA reduces a hyperspectral image cube with a large number of channels/bands to a data cube with far fewer channels. The number of new channels is equal to the number of calculated principal components. By default, three principal components are added to a PCA model created from image data. More components can be added from the data tree menu for models.
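The reduction described above can be illustrated outside Evince with a minimal NumPy sketch. It is not Evince's implementation: the function name `pca_image`, the synthetic cube, and the choice to mean-center each band before decomposition are assumptions made for illustration only.

```python
import numpy as np

def pca_image(cube, n_components=3):
    """Reduce a (rows, cols, bands) cube to (rows, cols, n_components) scores."""
    rows, cols, bands = cube.shape
    X = cube.reshape(rows * cols, bands).astype(float)  # one row per pixel
    Xc = X - X.mean(axis=0)              # mean-center each band (assumed pretreatment)
    # Thin SVD: Xc = U S Vt; the rows of Vt are the loading vectors
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # loadings, shape (bands, n_components)
    T = Xc @ P                           # scores, shape (pixels, n_components)
    explained = S[:n_components] ** 2 / np.sum(S ** 2)  # variance fraction per component
    return T.reshape(rows, cols, n_components), P, explained

rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 30, 50))     # synthetic stand-in for an imported image
scores, loadings, explained = pca_image(cube)
print(scores.shape, loadings.shape)      # (20, 30, 3) (50, 3)
```

Here the 50-band cube is reduced to three score images, one per principal component, matching the default of three components mentioned above.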

PCA provides the following functionality:
- Data reduction (compression)
- Spectral/chemical analysis of loadings
- Image classification and segmentation
- Identification of clusters
- Training/test set selection

A typical PCA workflow for image data is outlined below. It is assumed that a hyperspectral image has been imported and a PCA model has been created.

1. The loadings, P, can be analyzed by dragging the model onto the plot area and selecting "Loading line". This creates a "Loading Line Plot" from the loadings of the calculated model. The loading line plot gives valuable chemical (and physical) information about the calculated components. The loadings also give a hint when higher PCA components contain mostly noise. Such components may add very little to the model and can be discarded.
2. Dragging the score matrix, T, to the plot area can create either a Contour 2D/RGB image plot or a Scatter Density 2D plot. These plots are analyzed together. The Scatter Density 2D plot reveals clusters of pixels/observations with similar spectral properties. Selecting pixels in this plot selects the corresponding pixels in the Contour 2D/RGB image plot and vice versa. Use the freehand selection tool found in the plot toolbar to make a selection in the plot.
3. The created Contour 2D plot shows a representation of the original image where each pixel is colored according to its score value in the plotted component. The selection made in the Scatter Density 2D plot is shown here in red. The color of the selection can be changed in the plot properties of the plot menu.
4. The RGB image is a representation of the original image where each pixel is colored according to three score values at the same time. Using the various selection tools in the plot toolbar, this plot can be used to exclude observations from the training set or to assign observations to a test set. For example, use the rectangle selection tool to select pixels in the plot. Right-click on the plot and choose exclude from the plot menu. When deselected, the excluded pixels will be greyed out.
5. Select "Apply changes" from the plot menu. The RGB image will be updated without the excluded pixels.

6. Alternatively, the pixels selected in step 4 could have been assigned to a test set. Right-click on the "Train/test" column of the DataSet table. The test set can be used for validation when, for example, classifying the image content with PLS-DA.
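The link between the score plot, the image plots and the training/test assignment in the steps above can be sketched in NumPy. This is only an illustration under stated assumptions, not how Evince works internally: the helper `fit_pca`, the synthetic two-region image, and the threshold on the first score (a stand-in for a freehand selection around one cluster) are all hypothetical.

```python
import numpy as np

def fit_pca(X, n_components=3):
    """Fit PCA on the rows of X; return the band means and the loadings P."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

# Synthetic 20 x 30 image with 40 bands; the right half has an extra spectral feature.
rng = np.random.default_rng(1)
rows, cols, bands = 20, 30, 40
cube = rng.normal(scale=0.1, size=(rows, cols, bands))
cube[:, 15:, 10:20] += 1.0

X = cube.reshape(-1, bands)                # one row per pixel/observation
mean, P = fit_pca(X)
T = (X - mean) @ P                         # score matrix T

# RGB image: each of the three score columns scaled to 0..1 and used as a color channel.
rgb = ((T - T.min(axis=0)) / (T.max(axis=0) - T.min(axis=0))).reshape(rows, cols, 3)

# Selection made in score space (all pixels on one side of t1 = 0) ...
selected = T[:, 0] > 0
# ... is the same set of observations when viewed in image space.
selection_mask = selected.reshape(rows, cols)

# Steps 4-5: exclude the selected pixels and refit the model on the remainder.
mean_train, P_train = fit_pca(X[~selected])

# Step 6: alternatively, keep the selected pixels as a test set and project them
# onto the model fitted on the training pixels.
train_test = np.where(selected, "test", "train")
T_test = (X[selected] - mean_train) @ P_train
```

Because every pixel is one row of the score matrix, a selection made in the scatter plot and a selection made in the image are the same boolean mask viewed in two shapes, which is why the plots stay synchronized. The sign of a principal component is arbitrary, so which half of the synthetic image ends up selected may differ between implementations.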