On-site identification of Ophiocordyceps sinensis using multispectral imaging and chemometrics

To support the reasonable and effective collection of Ophiocordyceps sinensis, a new on-site identification method was tested using a portable multispectral imaging (MSI) technique. Three-dimensional (3D) data-cubes of representative Ophiocordyceps sinensis and weed samples were acquired and pre-processed with standard normal variate transformation (SNV). Principal component analysis (PCA) and simulated annealing particle swarm optimisation (SAPSO) algorithms were used to extract characteristic images, from which support vector classification (SVC) models were developed. The results show that the fused feature model of SAPSO-SVC performed best, with a recognition accuracy of 96.30% on the prediction set. Moreover, an on-site distribution map of Ophiocordyceps sinensis and weeds was created using the spectral feature model of SAPSO-SVC, and the target could be easily identified from the distribution map. This work demonstrates the potential for on-site identification of Ophiocordyceps sinensis in the Qinghai–Tibet Plateau using a portable MSI technique combined with the SAPSO-SVC algorithm.


Introduction 
As a valuable wildlife resource with both medicinal and nutritional uses, Ophiocordyceps sinensis has strict requirements for its parasitism and growth environment, which keeps it in short supply in the high-end market. Moreover, because of its small size, dark colour and wide distribution, the traditional manual search for Ophiocordyceps sinensis is labour-intensive and inefficient. Developing a rapid, on-site identification technology is therefore of major importance for the ecologically friendly and efficient excavation of Ophiocordyceps sinensis resources and for ensuring their sustainable use.
In recent years, digital imaging technologies have been widely used in the classification of farm products, the monitoring of crop diseases and insect pests, and other aspects of agricultural production. These applications rely on external physical features (e.g. colour, marbling and texture) and do not consider spectral fingerprints [1][2][3][4]. Similarly, many near-infrared spectroscopic studies have analyzed internal chemical or biological properties, but only within a small rounded region of the target sample [5][6][7]. As a fusion of imaging and spectroscopy [8][9][10][11][12], multispectral imaging (MSI) integrates both techniques in one configuration and provides both spatial and spectral information for each pixel over the required wavelength range, thereby facilitating the fast and accurate identification of Ophiocordyceps sinensis.
Because the data-cubes of the original multispectral images are large, they may contain non-informative waveband images. Moreover, the unique external texture features of Ophiocordyceps sinensis are mainly influenced by its internal functional composition. Herein, principal component analysis (PCA) [13] and simulated annealing particle swarm optimisation (SAPSO) [14] were used to extract the characteristic images from the two perspectives of image and spectral analysis, so that a more robust classification model could be developed to accurately identify Ophiocordyceps sinensis using the simplified three-dimensional data-cube [15].
In this work, MSI was applied to rapidly identify Ophiocordyceps sinensis and visualize it on site. The specific objectives were to (1) extract and pre-process the spectra and images in regions of interest (ROIs) of Ophiocordyceps sinensis and representative weeds; (2) extract the characteristic images using PCA and SAPSO and obtain the texture features of the characteristic images using the gray-level co-occurrence matrix (GLCM); (3) develop support vector classification (SVC) models and create an on-site classification distribution map.

Sample collection and preparation
A total of 106 representative samples, including 24 Ophiocordyceps sinensis and 82 weeds from 11 categories, were collected from the main area of Ophiocordyceps sinensis production at 4200-4600 m above sea level in Nyingchi, Tibet Autonomous Region, China.

Multispectral image acquisition and processing
The developed MSI configuration consisted of an imaging spectrometer (ImSpector, V10E-QE, Spectral Imaging Ltd., Finland) covering the spectral range of 465-630 nm, a digital camera (CMV2000, Imec, Belgium), and a computer installed with spectral data-cube acquisition software.
To maximize the information of Ophiocordyceps sinensis that could be obtained under suitable light conditions, the lens orientation was adjusted to approximately 30° downward in the horizontal direction. The distance between the lens and sample was adjusted to approximately 15 cm to ensure an appropriate image size. The system was then operated with a 3 ms exposure time to generate the multispectral image.

Regions of interest
The Ophiocordyceps sinensis stroma mainly consists of the pommel, fertile trunk and an infertile tip. Because the fertile part has more evident texture features and accounts for a larger share of the area, it was used as the ROI of Ophiocordyceps sinensis, with a size of 25 × 25 pixels. Similarly, the ROIs of the different weed samples were determined by their characteristic areas. The averaged spectrum of each ROI was regarded as the representative spectrum of the Ophiocordyceps sinensis or weed sample.
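The ROI averaging described above can be sketched as follows. This is a minimal illustration on a synthetic data-cube, not the acquisition software used in the study; the function name `roi_mean_spectrum`, the cube layout (rows, columns, bands) and the band count of 25 are assumptions made for the example.

```python
import numpy as np

def roi_mean_spectrum(cube, row, col, size=25):
    """Average the spectra inside a size x size ROI.

    cube is assumed to have shape (rows, cols, bands); (row, col) is the
    top-left pixel of the ROI.
    """
    roi = cube[row:row + size, col:col + size, :]
    if roi.shape[:2] != (size, size):
        raise ValueError("ROI extends beyond the image")
    # Flatten the ROI to (pixels, bands) and average over pixels.
    return roi.reshape(-1, cube.shape[2]).mean(axis=0)

# Synthetic stand-in for an acquired data-cube (values are random).
rng = np.random.default_rng(0)
cube = rng.random((100, 120, 25))
spectrum = roi_mean_spectrum(cube, row=30, col=40)
print(spectrum.shape)
```

The returned vector has one value per waveband and plays the role of the representative spectrum of a sample.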

Data processing and analysis
Based on the acquired spectra of Ophiocordyceps sinensis and representative weed samples, the dataset was first divided into a calibration set and a prediction set using the concentration gradient method. Because the uneven thickness of the tested samples causes light scattering, the resulting spectral noise was removed by pre-processing with standard normal variate transformation (SNV). The characteristic images were then extracted from the denoised dataset using SAPSO and PCA, and their external texture features were analysed using the GLCM method. Finally, SVC models were developed from the characteristic information and were also used to establish the on-site classification distribution map of Ophiocordyceps sinensis [16]. A high recognition accuracy on the prediction set signifies robust prediction performance.
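For reference, SNV centres each spectrum on its own mean and scales it by its own standard deviation, which removes additive and multiplicative scatter effects. The sketch below assumes spectra stored one per row and uses the sample standard deviation (ddof=1), a common convention that the text does not specify.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: row-wise centring and scaling.

    spectra has shape (n_samples, n_bands). Each row is transformed
    independently, so no information leaks between samples.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Three toy spectra that differ only by a gain or an offset.
raw = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],      # gain of 2
    [11.0, 12.0, 13.0, 14.0],  # offset of 10
])
corrected = snv(raw)
print(corrected)
```

After the transformation the three rows coincide, which is the behaviour expected when the only differences between spectra are scatter-like gain and offset.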

Extraction of characteristic images
Prior to the extraction of image features, resampling was implemented to obtain the three-dimensional data-cube for the ROIs. PCA was applied to this data-cube for image dimensionality reduction, and PC 1 had a variance contribution rate of more than 80%. Additionally, the larger weights in the projection coefficient matrix were mainly distributed at the wavelengths of 490.25 nm, 507.29 nm, 569.08 nm and 605.09 nm. These results were expected because two of these wavelengths, at approximately 490.25 nm and 507.29 nm, correspond to absorption peaks related to polysaccharide and α-tocopherol [16][17][18][19][20][21][22].
Similarly, SAPSO was used to extract four feature variables located at the wavelengths of 490.25 nm, 507.29 nm, 545.02 nm and 626.90 nm, as presented in Figure 1. The feature variables at 490.25 nm and 507.29 nm were thus obtained by both PCA and SAPSO.
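The PCA step can be illustrated by unfolding a data-cube into a pixels × bands matrix and inspecting the first principal component. The sketch below is an assumption-laden illustration on synthetic data: the helper name `pca_cube`, the 10-band cube and the injected pattern in band 3 are all hypothetical, and the SVD route is one standard way to compute PCA rather than the authors' stated implementation.

```python
import numpy as np

def pca_cube(cube):
    """PCA of a data-cube of shape (rows, cols, bands).

    Returns the variance contribution rate of each component, the
    loading (projection coefficient) matrix with one component per row,
    and the PC 1 score image.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)  # astype makes a copy
    X -= X.mean(axis=0)                        # centre each band
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    pc1_image = (X @ vt[0]).reshape(rows, cols)
    return explained, vt, pc1_image

# Synthetic ROI: weak noise everywhere, a strong spatial pattern in band 3.
rng = np.random.default_rng(0)
cube = 0.01 * rng.random((25, 25, 10))
cube[:, :, 3] += np.linspace(0.0, 1.0, 25)[:, None]

explained, loadings, pc1_image = pca_cube(cube)
top_band = int(np.argmax(np.abs(loadings[0])))
print(round(float(explained[0]), 3), top_band)
```

Bands with the largest absolute loadings on PC 1 are the candidates for characteristic images, which mirrors how the four wavelengths were read from the projection coefficient matrix.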
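The source does not state which SAPSO variant was used. As a hedged illustration only, the sketch below combines a particle swarm update with a Metropolis acceptance rule and geometric cooling to choose four feature indices. The function `sapso_select`, all parameter defaults and the toy fitness function are assumptions made for the example; in practice the fitness would typically be a cross-validated classification accuracy on the selected wavebands.

```python
import numpy as np

def sapso_select(fitness, n_features, n_select=4, n_particles=20, n_iter=60,
                 w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95, seed=0):
    """Pick n_select feature indices that maximise fitness(indices)."""
    rng = np.random.default_rng(seed)

    def decode(position):
        # The n_select largest coordinates define the selected subset.
        return np.sort(np.argsort(position)[-n_select:])

    pos = rng.random((n_particles, n_features))
    vel = np.zeros_like(pos)
    fit = np.array([fitness(decode(p)) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    best = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]

    temp = t0
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        cand = np.clip(pos + vel, 0.0, 1.0)
        cand_fit = np.array([fitness(decode(p)) for p in cand])

        # Annealing step: improvements are always accepted, worse moves
        # are accepted with probability exp(delta / temp).
        delta = cand_fit - fit
        accept = rng.random(n_particles) < np.exp(np.minimum(delta, 0.0) / temp)
        pos[accept] = cand[accept]
        fit[accept] = cand_fit[accept]

        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        best = int(np.argmax(pbest_fit))
        if pbest_fit[best] > gbest_fit:
            gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]
        temp *= cooling  # geometric cooling schedule

    return decode(gbest), float(gbest_fit)

# Toy fitness: each band has a hypothetical usefulness weight.
weights = np.array([0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.1, 0.2])

def toy_fitness(indices):
    return float(weights[indices].sum())

selected, score = sapso_select(toy_fitness, n_features=len(weights))
print(selected, score)
```

The annealing step lets particles occasionally move to worse subsets early on, when the temperature is high, which is the usual motivation for adding simulated annealing to a particle swarm that might otherwise converge prematurely.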
The results showed that the characteristic images extracted by PCA and SAPSO both contained the main feature information of Ophiocordyceps sinensis.

Texture feature extraction of characteristic images
Based on the acquired characteristic images of Ophiocordyceps sinensis, texture feature parameters of angular second moment (ASM), contrast (CON), correlation (COR), inverse difference moment (IDM) and entropy (ENT) were extracted using GLCM from four directions of 0°, 45°, 90° and 135°, as shown in Figure 2.
In comparison with the other three directions, the dispersion and overall levels of ASM, COR and IDM in the 0° direction were high, while the values of CON and ENT were low. This indicates that the texture of the fertile trunk in the horizontal direction was more regular, fine and clearly visible. Furthermore, in all four directions, the CON values at wavelengths 545.02 nm and 626.90 nm acquired using SAPSO were both larger than those at wavelengths 569.08 nm and 605.09 nm obtained using PCA. This demonstrates that the characteristic images extracted using SAPSO had more distinct grooves.
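The five texture parameters can be computed from a normalised, symmetric GLCM as sketched below. This is a plain NumPy illustration using common textbook definitions of ASM, CON, COR, IDM and ENT; the exact definitions, the pixel distance of 1 and the number of gray levels used in the study are not given in the text and are assumed here.

```python
import numpy as np

# Pixel offsets (row, col) for the four directions at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, angle, levels):
    """Symmetric, normalised co-occurrence matrix of an integer image."""
    dr, dc = OFFSETS[angle]
    rows, cols = image.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = image[r0:r1, c0:c1]
    nbr = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    counts += counts.T  # count each pair in both orders
    return counts / counts.sum()

def glcm_features(p):
    """ASM, CON, COR, IDM and ENT of a normalised GLCM.

    COR is undefined for a constant image (zero variance).
    """
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    nonzero = p[p > 0]
    return {
        "ASM": float((p ** 2).sum()),
        "CON": float(((i - j) ** 2 * p).sum()),
        "COR": float(((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)),
        "IDM": float((p / (1.0 + (i - j) ** 2)).sum()),
        "ENT": float(-(nonzero * np.log(nonzero)).sum()),
    }

# Toy image with horizontal stripes: rows alternate between levels 0 and 1.
stripes = np.tile(np.array([[0], [1]]), (4, 8))
features = {a: glcm_features(glcm(stripes, a, levels=2)) for a in OFFSETS}
for angle, values in features.items():
    print(angle, values)
```

On this striped toy image the 0° direction runs along the stripes, so contrast is zero and IDM is maximal, whereas the 90° direction crosses them and gives a higher contrast. This is the same kind of directional comparison that the text makes for the fertile trunk.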

Model prediction
The SVC models of Ophiocordyceps sinensis and weeds were developed based on the simplified characteristic images. The model results are shown in Table 1.
The characteristic images obtained by PCA were first used to develop the PCA-SVC models, which gave recognition accuracies of 88.89%, 85.19% and 92.60%. The corresponding characteristic images obtained by SAPSO were then used to develop the SAPSO-SVC models, which gave recognition accuracies of 92.59%, 92.59% and 96.30%. The SAPSO-SVC models therefore outperformed the PCA-SVC models.
This was expected because the SAPSO algorithm could extract more texture features and dominant spectra related to polysaccharide, α-tocopherol and xanthine oxidase of Ophiocordyceps sinensis [24,25]. Furthermore, the fused feature model performed best among the three SAPSO-SVC models, and it can be applied to the on-site identification of Ophiocordyceps sinensis.
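A two-class SVC workflow of the kind described above can be sketched with scikit-learn. Everything specific in this example is an assumption: the feature matrix is synthetic, the RBF kernel and its parameters are defaults rather than the settings used in the study, and a stratified random split stands in for the concentration gradient method. Only the class sizes (24 and 82) are taken from the sample description.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic features: 24 "Ophiocordyceps sinensis" rows (label 1) and
# 82 "weed" rows (label 0), four feature variables per sample.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(24, 4)),
    rng.normal(3.0, 0.5, size=(82, 4)),
])
y = np.array([1] * 24 + [0] * 82)

# Stratified random split into calibration and prediction sets
# (a stand-in for the split method used in the source).
X_cal, X_pred, y_cal, y_pred = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit an RBF-kernel support vector classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_cal, y_cal)

# Recognition accuracy: fraction of prediction-set samples classified correctly.
accuracy = model.score(X_pred, y_pred)
print(f"{100 * accuracy:.2f}%")
```

A fused feature model would concatenate spectral and texture features column-wise into `X` before fitting; scaling inside the pipeline keeps features with different units comparable.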

On-site visualization of Ophiocordyceps sinensis and weeds
Because texture features cannot be computed for a single pixel, the on-site classification distribution map of Ophiocordyceps sinensis was created using the spectral feature model of SAPSO-SVC. As shown in Figure 3, pixels marked in blue and yellow represent Ophiocordyceps sinensis and weeds, respectively. The densely distributed blue pixels form the rod-shaped outline of the Ophiocordyceps sinensis stroma, so its precise position can be identified intuitively against the complex background. This indicates that the distribution map could be used for on-site identification of Ophiocordyceps sinensis.
However, a large number of dispersed blue 'noise points' remained on the map. This may be because some representative weed samples in the main producing areas of Ophiocordyceps sinensis were missed owing to the complexity of the growth environment.
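Building such a map amounts to classifying the spectrum of every pixel and reshaping the labels back to the image grid. The sketch below shows only that reshaping logic. To stay self-contained it uses a nearest-centroid rule as a stand-in for the SAPSO-SVC spectral model, and the two four-band spectra are invented values; any classifier exposing a compatible `predict` function could be passed instead.

```python
import numpy as np

def classification_map(cube, predict):
    """Label every pixel of a (rows, cols, bands) cube.

    predict maps an array of shape (n_pixels, bands) to n_pixels labels.
    """
    rows, cols, bands = cube.shape
    labels = np.asarray(predict(cube.reshape(-1, bands)))
    return labels.reshape(rows, cols)

def nearest_centroid(centroids):
    """Stand-in classifier: label = index of the closest centroid."""
    centroids = np.asarray(centroids, dtype=float)

    def predict(spectra):
        diff = spectra[:, None, :] - centroids[None, :, :]
        return np.linalg.norm(diff, axis=2).argmin(axis=1)

    return predict

# Hypothetical spectra for background (label 0) and target (label 1).
background = np.array([0.2, 0.3, 0.4, 0.5])
target = np.array([0.6, 0.5, 0.3, 0.2])

# Synthetic scene: a narrow rod-shaped target on a uniform background.
cube = np.tile(background, (20, 30, 1))
cube[5:15, 14:16, :] = target

label_map = classification_map(cube, nearest_centroid([background, target]))
print(label_map.sum(), label_map.shape)
```

Because each pixel is classified from its spectrum alone, isolated misclassified pixels are possible, which is consistent with the dispersed noise points noted above.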

Conclusions
To overcome the high labour intensity and low efficiency of collecting Ophiocordyceps sinensis, MSI was used to rapidly identify Ophiocordyceps sinensis against a complex background.
A comparison of the models developed from the characteristic images indicated that the fused feature model of SAPSO-SVC performed best, with a recognition accuracy of 96.30%.
Furthermore, the on-site classification distribution map of Ophiocordyceps sinensis was