Detection of scab in wheat ears using in situ hyperspectral data and support vector machine optimized by genetic algorithm

A new method was proposed to extract sensitive features and to construct a monitoring model for wheat scab based on in situ hyperspectral data of wheat ears to achieve effective prevention and control and provide theoretical support for its large-scale monitoring. Eight sensitive features were selected through correlation analysis and wavelet transform. These features were as follows: three original bands of 350-400 nm, 500-600 nm, and 720-1000 nm; three vegetation indices of modified simple ratio (MSR), normalized difference vegetation index, and structural independent pigment index; and two wavelet features of WF01 and WF02. By combining the selected sensitive features with support vector machine (SVM) and SVM optimized by genetic algorithm (GASVM), a total of 16 monitoring models were built, and the monitoring accuracies of the two types of models were compared. The ability of the monitoring models built by GASVM to identify scab was better than that of SVM algorithm under the same characteristic variables. Among the 16 models, MSR combined with GASVM had an overall accuracy of 75% and a Kappa coefficient of 0.47. GASVM can be used to monitor wheat scab and its application can improve the accuracy of disease monitoring.


Introduction 
Wheat is one of the three major grains in the world, and wheat diseases are a focus of research worldwide. Scab is a worldwide epidemic disease caused by Fusarium asiaticum and F. graminearum that mainly occurs in warm and humid areas. It decreases wheat yield and produces the toxin deoxynivalenol, which can cause poisoning and even death in humans and animals. If the disease is not monitored and controlled in time, it causes large-scale reductions in wheat yield and degradation of grain quality, thereby resulting in economic losses [1]. Therefore, the problem of scab must be addressed.
Traditional monitoring methods, such as on-site sampling analysis, are mainly performed by plant pathologists or experts. Such methods are costly, time consuming, and labor intensive, and they are unsuitable for large-area applications [2]. Remote sensing offers multiscale and multitemporal observation. In recent years, it has been widely used in crop growth monitoring, crop area monitoring, and production forecasting [3]. Hyperspectral remote sensing is one of the major achievements of remote sensing technology in the late 20th century, and because of its rich band information it is used by an increasing number of crop disease researchers. Hyperspectral technology can obtain continuous spectral curves with spectral resolutions on the order of nanometers, which gives it a strong ability to discriminate features [4]. Many researchers have used hyperspectral imaging to directly identify disease in crop kernels. For example, Delwiche and Kim [5,6] identified scab in wheat kernels by using a custom-made hyperspectral imaging system and a near-infrared hyperspectral system (1000-1700 nm). Liang et al. [7] used hyperspectral technology to identify infected kernels via spectral analysis and pattern processing; their scab identification models constructed by support vector machine (SVM) and back propagation neural network achieved accuracies of more than 90%. Ewa et al. [8] constructed a classification model based on texture parameters of hyperspectral images to identify infected kernels, and ventral kernels were classified with 100% accuracy. These studies all achieved good results, but they identified infected and uninfected kernels in the laboratory and therefore excluded the influence of factors such as field conditions, weather, and leaves.
Studies [9] have demonstrated that leaf area index, chlorophyll content, and aboveground biomass are the main indices affecting scab occurrence. Li et al. [10] used non-imaging hyperspectral technology to conduct canopy-scale research under practical wheat growth conditions. They established a remote sensing estimation model of scab based on spectral reflectance data, climatic factors, and growth parameters, and this model provided a reference for disease prevention and monitoring in winter wheat production in the Yangtze and Huai river region. A single wheat ear in the field contains purer spectral information, which can provide a theoretical basis for canopy studies and even large-scale research. Therefore, some researchers have studied scab directly on single wheat ears. Huang et al. [11] measured the spectrum of a single wheat ear by using non-imaging hyperspectral technology, established an effective disease severity recognition model by performing Fisher's linear discriminant analysis and SVM based on the radial basis function (RBF), and identified scab on a single wheat plant in the front, side, and upright directions. Non-imaging remote sensing technology therefore also has great potential for monitoring wheat scab. However, most previous studies focused on wheat powdery mildew, stripe rust, or aphids; few were conducted on scab [11]. Therefore, the present study aimed to use the non-imaging hyperspectral technique to identify wheat scab at the ear scale.
SVM is a model construction method based on statistical learning theory. It is typically used in pattern recognition, classification, and regression analysis [12]. It constructs the optimal hyperplane that minimizes the classification error, and, where the classes are not linearly separable, it maps the input space into a high-dimensional space through an appropriate kernel function and finds the optimal classification surface in the new space [13].
However, how to select the kernel function and determine its parameters effectively remains an open problem when using this algorithm. The traditional grid search algorithm is inefficient, computationally intensive, and time consuming, and its results are often unsatisfactory. Genetic algorithm (GA) is well suited to global optimization problems; it is robust and has a simple process [14]. GA has been widely used in various applications, such as face and text recognition.
SVM optimized by genetic algorithm (GASVM) has been applied to the monitoring and identification of wheat diseases [14], but it has not been used for wheat scab. Hence, this study aimed to do the following: (1) analyze the spectral information of wheat ears measured by a non-imaging spectrometer and select the spectral features that are sensitive to disease severity and differ significantly between healthy and infected ears; and (2) establish effective models for identifying wheat scab by using the SVM and GASVM algorithms and test whether the GASVM algorithm is more conducive to the detection of scab.

Study site
Scab is sensitive to humidity and temperature, and it often occurs in temperate regions where the climate is warm, moist, and rainy [10].
Guohe Town (31º29′N, 117º13′E), Baihu Town (31º14′N, 117º27′E), and Shucheng County (31º32′N, 116º59′E) in Anhui Province were selected as the research sites (Figure 1), and the field spectra were acquired in May 2018. The average temperature is 14°C-17°C, and the annual rainfall is 770-1700 mm [15]. This temperature range and ample moisture are suitable for scab occurrence. Hyperspectral data were measured during the grain filling stage of wheat, which is an important period for scab detection.

Data acquisition
The statistical results of the wheat ear samples measured in the field are shown in Table 1. An Analytical Spectral Devices (ASD) FieldSpec Pro full-range spectrometer (350-2500 nm) was used to collect spectral information. Its spectral resolution was 3 nm within the 350-1000 nm range and 10 nm within the 1000-2500 nm range [11]. All in situ hyperspectral data were measured under windless, cloudless, and sunny conditions between 10:00 and 14:00.
To eliminate the interference of other wheat ears, we cut a hole in the center of a 1 m × 1 m black cloth, inserted the wheat ear vertically through it, and placed the probe of the spectrometer above the ear to measure the spectrum. Each ear was measured 10 times, and a 40 cm × 40 cm BaSO4 calibration panel was used for spectrum correction before each measurement. The average value of the 10 measurements was recorded. All spectral curves were resampled at 1 nm intervals before pretreatment. The spectral reflectance was obtained from the ratio of the radiance of the wheat ear to that of the calibration panel. The calculation formula is as follows:

$$R_1 = \frac{DN_1}{DN_2} \times R_2$$

where $R_1$ is the target spectral reflectance, $DN_1$ is the gray value of the target spectrum, $DN_2$ is the gray value of the calibration panel, and $R_2$ is the calibration panel reflectance.
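The averaging and panel calibration described above can be sketched in a few lines. This is only an illustration: it assumes the ratio form R1 = (DN1 / DN2) × R2 implied by the symbol definitions, and the DN values, the panel reflectance of 0.98, and the function names are hypothetical rather than taken from the measurements.

```python
from statistics import fmean


def mean_spectrum(scans):
    """Average repeated scans band by band (each scan is a list of DN values)."""
    return [fmean(band_values) for band_values in zip(*scans)]


def reflectance(dn_target, dn_panel, panel_reflectance):
    """Apply R1 = (DN1 / DN2) * R2 to every band."""
    return [
        dn1 / dn2 * r2
        for dn1, dn2, r2 in zip(dn_target, dn_panel, panel_reflectance)
    ]


# Hypothetical DN values: two repeated scans of one ear, three bands each.
ear_scans = [[1200.0, 1500.0, 3000.0], [1000.0, 1500.0, 3400.0]]
panel_dn = [4400.0, 5000.0, 6400.0]   # calibration panel DN per band
panel_r = [0.98, 0.98, 0.98]          # assumed panel reflectance per band

ear_mean = mean_spectrum(ear_scans)
ear_reflectance = reflectance(ear_mean, panel_dn, panel_r)
```

In the field protocol the mean would be taken over 10 scans per ear rather than the two used here.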

Assessment of disease severity
According to the rules for monitoring and forecast of the wheat head blight (GB/T15796-2011), disease severity is the proportion of diseased spikelets (ear rot, decay, etc.) to the total number of spikelets. The grading standards of wheat scab severity are shown in Table 2, and the numbers of samples at levels 0-4 were 8, 16, 11, 20, and 17, respectively. To reduce the difficulty of identifying disease severity, the samples were further reclassified into a healthy class and a diseased class. Given that the wheat ears at levels 0 and 1 were difficult to distinguish, we classified the level 0 and level 1 samples as healthy wheat ears and the level 2-4 samples as infected wheat ears.
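The two-class relabeling can be written down directly. The sketch below uses the per-level sample counts reported above and assumes that levels 0 and 1 both map to the healthy class; the names are illustrative.

```python
# Number of wheat ear samples at each severity level (levels 0-4).
LEVEL_COUNTS = {0: 8, 1: 16, 2: 11, 3: 20, 4: 17}


def binary_label(level):
    """Map a severity level to the two-class label used for modeling."""
    if level not in LEVEL_COUNTS:
        raise ValueError(f"unknown severity level: {level}")
    # Assumption: levels 0 and 1 are treated as healthy, levels 2-4 as infected.
    return "healthy" if level <= 1 else "infected"


class_counts = {"healthy": 0, "infected": 0}
for level, n_samples in LEVEL_COUNTS.items():
    class_counts[binary_label(level)] += n_samples

total_samples = sum(LEVEL_COUNTS.values())
```

Under this mapping the data set is imbalanced, with twice as many infected ears as healthy ones, which is worth keeping in mind when reading overall accuracy alongside the Kappa coefficient.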

Data analysis

Selection of vegetation indices
Vegetation index-based analysis is a major approach in the remote sensing of pests and diseases. According to the spectral characteristics of crops under stress, researchers have constructed a variety of vegetation indices for monitoring crop diseases and insect pests [16]. To determine the features that are sensitive to the physiological and biochemical changes induced by scab, we selected 10 vegetation indices (Table 3), each formed by combining and transforming different wavebands, as the primary feature set of the monitoring model and discussed their applicability in assessing scab. These indices were as follows: modified simple ratio (MSR) and normalized difference vegetation index (NDVI) related to biophysical pigments; nitrogen reflectance index (NRI) related to water and nitrogen content; photochemical reflectance index (PRI) and physiological reflectance index (PhRI) related to photosynthetic activity; ratio vegetation stress index (RVSI) related to cell structure; and structural independent pigment index (SIPI), normalized pigment chlorophyll index (NPCI), anthocyanin reflectance index (ARI), and triangular vegetation index (TVI) related to pigment variation.
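As a concrete illustration, the sketch below computes MSR, NDVI, and SIPI from a reflectance spectrum, using the band combinations listed in Table 3. The spectrum is represented as a dictionary from wavelength (nm) to reflectance; that representation and the reflectance values are hypothetical.

```python
import math


def msr(rho):
    """Modified simple ratio: (rho800/rho670 - 1) / sqrt(rho800/rho670 + 1)."""
    ratio = rho[800] / rho[670]
    return (ratio - 1.0) / math.sqrt(ratio + 1.0)


def ndvi(rho):
    """Normalized difference vegetation index using the 840 and 675 nm bands."""
    return (rho[840] - rho[675]) / (rho[840] + rho[675])


def sipi(rho):
    """Structural independent pigment index: (rho800 - rho445) / (rho800 - rho680)."""
    return (rho[800] - rho[445]) / (rho[800] - rho[680])


# Hypothetical reflectance values at the wavelengths the three indices need.
spectrum = {445: 0.05, 670: 0.08, 675: 0.08, 680: 0.09, 800: 0.40, 840: 0.42}

indices = {"MSR": msr(spectrum), "NDVI": ndvi(spectrum), "SIPI": sipi(spectrum)}
```

Because the spectra are resampled to 1 nm intervals, each required band can be looked up directly without interpolation.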

Wavelet transform
Wavelet transform can realize data filtering and de-noising. It has multiresolution characteristics. Each channel can obtain local detail features of the data by adopting multichannel filtering, which highlights the sensitive information of the data. Therefore, the utilization of spectral information is optimized to some extent [12].

Vegetation index | Formula | Reference
Modified Simple Ratio (MSR) | (ρ800/ρ670 − 1)/(ρ800/ρ670 + 1)^(1/2) | [17,18]
Normalized Difference Vegetation Index (NDVI) | (ρ840 − ρ675)/(ρ840 + ρ675) | [19]
Nitrogen Reflectance Index (NRI) | (ρ570 − ρ670)/(ρ570 + ρ670) | [20]
Photochemical Reflectance Index (PRI) | (ρ570 − ρ531)/(ρ570 + ρ531) | [21]
Structural Independent Pigment Index (SIPI) | (ρ800 − ρ445)/(ρ800 − ρ680) | [22]
Physiological Reflectance Index (PhRI) | (ρ550 − ρ531)/(ρ550 + ρ531) | [23]
Normalized Pigment Chlorophyll Index (NPCI) | (ρ680 − ρ430)/(ρ680 + ρ430) | [24]
Anthocyanin Reflectance Index (ARI) | (ρ550)^(−1) − (ρ700)^(−1) | [25]
Ratio Vegetation Stress Index (RVSI) | [(ρ712 + ρ752)/2] − ρ732 | [26]
Triangular Vegetation Index (TVI) | 60(ρ750 − ρ550) − 100(ρ670 − ρ550) | [27]

Gabor wavelet enables simultaneous local analysis of time and frequency, thereby analyzing stationary signals easily. Gabor wavelet transform solves the expansion coefficient of Gabor [28]. In this study, the Gaussian function was adopted as the mother wavelet to construct the wavelet kernel function, the vegetation index was convolved with the wavelet kernel function, and the amplitude after convolution was used as the modeling feature. The formulas are as follows [29]:

$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right]$$

$$h(x, y) = g(x, y)\exp(2\pi j W x)$$

$$H(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\}$$

where $g(x, y)$ represents a Gaussian modulation function; $\sigma_x$ and $\sigma_y$ represent the standard deviations on the X and Y axes; $h(x, y)$ represents a wavelet function; $W$ is the frequency of the sine function on the X-axis; $H(u, v)$ is the Fourier transform of $h(x, y)$; $u$ is the frequency of the independent variable; $v$ is the amplitude value of the frequency signal; and $\sigma_u$ and $\sigma_v$ are the standard deviations on the U and V axes, respectively.

$$S(x, y) = \sqrt{(h_R * I)^2(x, y) + (h_I * I)^2(x, y)}$$

where $(h * I)$ is the convolution of filter $h$ with data $I$; $h_R$ and $h_I$ are the real and imaginary parts of filter $h$; and $S(x, y)$ is the feature obtained by the Gabor filter.
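To make the amplitude feature concrete, the sketch below builds a complex Gabor kernel (a Gaussian envelope modulated by a complex sinusoid of frequency W), convolves it with a signal, and takes the amplitude as the square root of the sum of the squared real and imaginary responses. It is a one-dimensional simplification chosen for brevity, not the two-dimensional filter bank used in the study, and the values of sigma, W, and the kernel half-width are hypothetical.

```python
import cmath
import math


def gabor_kernel(sigma, freq, half_width):
    """Complex 1-D Gabor kernel: Gaussian envelope times exp(2*pi*j*W*x)."""
    kernel = []
    for x in range(-half_width, half_width + 1):
        envelope = math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
        kernel.append(envelope * cmath.exp(2j * math.pi * freq * x))
    return kernel


def gabor_amplitude(signal, kernel):
    """Amplitude of the zero-padded, same-length convolution of kernel and signal."""
    half = len(kernel) // 2
    amplitudes = []
    for i in range(len(signal)):
        acc = 0j
        for k, h in enumerate(kernel):
            j = i - (k - half)  # convolution index: sum over h(x) * I(i - x)
            if 0 <= j < len(signal):
                acc += h * signal[j]
        # abs() of a complex number equals sqrt(real**2 + imag**2).
        amplitudes.append(abs(acc))
    return amplitudes


flat_signal = [1.0] * 40  # hypothetical constant input
smooth = gabor_amplitude(flat_signal, gabor_kernel(sigma=2.0, freq=0.0, half_width=8))
tuned = gabor_amplitude(flat_signal, gabor_kernel(sigma=2.0, freq=0.25, half_width=8))
```

With W = 0 the kernel reduces to a plain Gaussian smoother, so a constant input passes through almost unchanged away from the edges, whereas a kernel tuned to a nonzero frequency responds only weakly to it. A bank of such kernels at several scales and orientations therefore isolates different local detail.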
$h(x, y)$, as the mother wavelet, can be scaled and rotated to obtain a set of self-similar filters.

The principle of SVM involves finding an optimal hyperplane that satisfies the classification requirements. The hyperplane maintains the classification accuracy and maximizes the margin between the two classes of samples [30]. SVM is widely used in remote sensing classification because of its simple structure, strong adaptability, and global optimality. The discriminant function of the model is as follows:

$$f(x) = \operatorname{sgn}\left(\sum_{i \in S_v} \alpha_i y_i k(x_i, x) + b\right)$$

where $\alpha_i$ is the Lagrange multiplier; $S_v$ is the set of support vectors; $x_i$ is a support vector and $y_i$ is its class label; $b$ is the threshold; and $k(x_i, x)$ is the kernel function. In this study, RBF was selected as the kernel function of SVM.
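A trained SVM classifies a new sample by summing kernel similarities to the support vectors, weighting each by its Lagrange multiplier and class label, adding the threshold, and taking the sign. The sketch below evaluates that sum with an RBF kernel of the common form exp(−γ‖a − b‖²). The support vectors, multipliers, threshold, and γ are toy values chosen for illustration, not parameters fitted in this study.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, bias, gamma):
    """Return +1 or -1 from the sign of sum(alpha_i * y_i * k(x_i, x)) + b."""
    score = bias + sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Toy model with one feature: label -1 = healthy, +1 = infected (hypothetical).
toy_support_vectors = [[0.2], [1.5]]
toy_labels = [-1, 1]
toy_alphas = [1.0, 1.0]

near_infected = svm_decision([1.4], toy_support_vectors, toy_labels, toy_alphas, 0.0, 1.0)
near_healthy = svm_decision([0.3], toy_support_vectors, toy_labels, toy_alphas, 0.0, 1.0)
```

The penalty factor does not appear in the decision function itself; it acts during training by bounding the multipliers, which is why it must be tuned together with the kernel parameter.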
How to set the penalty factor and kernel parameter is the main problem that SVM faces in practical applications. Traditionally, these parameters are chosen by trial and error. Cross-validation is widely used in modeling applications, but it is inefficient and requires a heavy workload. Therefore, GA was used to optimize the SVM. The steps are as follows [31]:
1) Initialize the population.
2) Select the training data and validation data. Seventy-two samples were investigated in the field during the study period. Forty-eight samples were designated as the training samples, and the remaining 24 samples were the verification samples.
3) Train and test the SVM on these data. Select the fitness function and calculate the fitness value of each individual.
4) Check whether the maximum number of generations (initially set to 100) has been reached. If so, output the optimal penalty factor and kernel parameter; otherwise, proceed to the next step.
5) Apply the crossover and mutation operators to form a new generation of individuals, and return to step 2. The loop continues until the termination condition is met.
6) Use the parameter-optimized SVM model to detect wheat scab.
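The loop in the steps above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: individuals encode (log2 C, log2 γ), parents are chosen by tournament selection, crossover is a random blend, mutation is Gaussian, and the best individual is carried over unchanged. The fitness used here is a stand-in surface with a known optimum so that the example is self-contained; in practice it would be the validation accuracy of an RBF-kernel SVM trained with the decoded parameters.

```python
import random


def run_ga(fitness, bounds, pop_size=30, generations=100,
           mutation_rate=0.2, mutation_sigma=0.5, seed=0):
    """Maximise fitness over real-valued genes; returns (best genes, best-per-generation)."""
    rng = random.Random(seed)
    # Step 1: initialise the population uniformly inside the search bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # step 4: stop after the maximum generation count
        # Step 3: evaluate every individual and rank by fitness.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0][:]]  # elitism (assumed): keep the current best
        while len(next_gen) < pop_size:
            # Step 5: tournament selection, blend crossover, Gaussian mutation.
            parent_a = max(rng.sample(ranked, 3), key=fitness)
            parent_b = max(rng.sample(ranked, 3), key=fitness)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    mutated = child[i] + rng.gauss(0.0, mutation_sigma)
                    child[i] = max(lo, min(hi, mutated))
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


def toy_fitness(genes):
    """Stand-in for validation accuracy; peaks at log2 C = 3, log2 gamma = -2."""
    log2_c, log2_gamma = genes
    return -((log2_c - 3.0) ** 2 + (log2_gamma + 2.0) ** 2)


BOUNDS = [(-5.0, 10.0), (-10.0, 5.0)]  # hypothetical ranges for log2 C and log2 gamma
best_genes, best_history = run_ga(toy_fitness, BOUNDS)
best_c, best_gamma = 2.0 ** best_genes[0], 2.0 ** best_genes[1]
```

Searching in log2 space is a common choice because useful values of the penalty factor and kernel parameter span several orders of magnitude. Keeping the best individual guarantees that the best fitness never decreases from one generation to the next.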

Features of spectral reflectance
Spectral reflectance is the simplest and most direct feature. The spectral reflectance signals in the visible and near-infrared regions reflect the changes in physical and biochemical components caused by vegetation stress. These signals have been widely used in remote sensing monitoring and early stress diagnosis of crop diseases and pests [16]. Figure 2 shows the spectral reflectance curves of healthy and infected wheat ears. The morphological difference between the two spectra was not obvious. To perceive the changes in the spectrum more intuitively, we calculated the reflectance ratio between the diseased and healthy wheat ears (Figure 3). The reflectance ratio reached its maximum in the 500-600 nm region, indicating that the spectral reflectance of wheat ears in this region increased markedly under stress. When wheat is under stress, its chlorophyll content decreases and its absorption in the visible region weakens, so the reflectance at the "green peak" increases [4]. Correlation analysis was performed to assess whether significant relationships exist between band reflectance and disease severity. In Figure 4, the reflectance at 350-400 nm exhibited the highest correlation with disease severity in the visible region, and all correlation coefficients in the near-infrared region (720-1000 nm) were greater than 0.7. Therefore, the 500-600 nm region, which showed the greatest variation, and the 350-400 nm and 720-1000 nm regions, which had the largest correlation coefficients in the visible and near-infrared regions, respectively, were selected as the preferred features of the original spectral feature set.

Vegetation indices
Vegetation indices based on a certain physiological significance can enhance and highlight some spectral changes to obtain a more ideal result [16]. Table 4 summarizes the responses of all vegetation indices to wheat scab. Three vegetation indices, namely, MSR, NDVI, and SIPI, were significantly correlated (p-value < 0.01) with disease severity; the correlation coefficients were 0.62, 0.58, and 0.55, respectively. Other indices did not show significant responses to disease severity. Thus, we chose MSR, NDVI, and SIPI as the sensitive vegetation indices.
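The screening step can be illustrated with a Pearson correlation and the t statistic usually used to test its significance, t = r·sqrt((n − 2)/(1 − r²)). The severity levels and index values below are hypothetical, and evaluating the statistic with n = 72 assumes that all sampled ears entered the correlation, which the text does not state.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical example: an index that falls as severity level rises.
severity_levels = [0, 1, 2, 3, 4]
index_values = [0.9, 0.8, 0.6, 0.5, 0.2]
r_example = pearson_r(severity_levels, index_values)

# t statistic for a coefficient of 0.62, assuming n = 72 samples.
t_msr = t_statistic(0.62, 72)
```

The sign of the coefficient only indicates the direction of the relationship, so candidate features are normally ranked by its absolute value.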

Wavelet features
Three vegetation indices, namely, MSR, NDVI, and SIPI, were transformed by wavelet transform. A total of 32 wavelet kernel functions (four scales and eight directions) were constructed, and they increased the data dimension by 32 times. To determine the best wavelet features, we analyzed the correlation between wavelet features and disease severity and selected the features with significant differences (p-value < 0.01) as the sensitive features by t-test. The scale, direction, and features' names are shown in Table 5.

Construction of monitoring models based on original spectrum
A total of 72 samples were collected in this experiment, among which 48 randomly chosen samples were used as the training samples, and the remaining 24 samples were used as the verification samples. Within the three band regions of 350-400 nm, 500-600 nm, and 720-1000 nm selected in Section 3.1.1, the bands with the largest correlation were 354 nm, 525 nm, and 761 nm, respectively. These bands were the input variables, and SVM and GASVM were used for the model construction. The monitoring results, overall accuracy (OA), and Kappa coefficient of the six models are shown in Table 6. With the same characteristic variables, the accuracy of the monitoring model established by GASVM was better than that established by SVM. Among the SVM models, the overall accuracy at 761 nm and 354 nm was 58.3% in both cases, which was slightly higher than the 50% obtained at 525 nm. Among the GASVM models, the overall accuracy at 761 nm was 4.2 and 12.5 percentage points higher than that at 354 nm and 525 nm, respectively, but it was still only 66.7%, with a Kappa coefficient of 0.25. In general, the overall accuracy of all models based on the original bands was not high.
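Both evaluation measures follow from a confusion matrix: overall accuracy is the fraction of samples on the diagonal, and the Kappa coefficient corrects that fraction for the agreement expected by chance. The sketch below uses a made-up 2 × 2 matrix for 24 verification samples; the counts are purely illustrative and are not taken from Table 6.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e taken from the row and column totals."""
    total = sum(sum(row) for row in matrix)
    p_observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts: rows are true classes, columns are predicted classes,
# in the order (healthy, infected).
confusion = [[7, 1],
             [3, 13]]

oa = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

With only 24 verification samples, one additional correct classification changes the overall accuracy by about 4.2 percentage points, so small differences between models should be read cautiously.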

Construction of monitoring models based on vegetation indices
Three vegetation indices, namely, MSR, NDVI, and SIPI, were used as the input variables to build the model. Table 7 shows the results of the models constructed by SVM algorithm and GASVM for scab detection. MSR had the highest identification ability for scab, followed by NDVI and SIPI. With the same input variables, the accuracy of the GASVM model was higher than that of the SVM model. The ability of the MSR-GASVM model to detect scab was better than that of the other models. The overall accuracy was 75%, and the Kappa coefficient was 0.47. In general, the vegetation index was better at detecting scab than the original waveband, because the vegetation index enhances the difference between healthy and infected samples by combining and transforming band reflectance [32].

Construction of monitoring models based on wavelet features
As discussed in Section 3.1.3, WF01 and WF02 were sensitive to scab at the ear scale. We designed four scab identification models using these two sensitive features. Table 8 shows the results of discriminating between healthy wheat and wheat infected by scab by using WF01 and WF02. The models constructed by GASVM had better identification results than the SVM models, with an overall accuracy of up to 70.8%. For the SVM monitoring models, the highest overall accuracy on the verification samples was only 62.5%, and the Kappa coefficient was 0.23. This pattern is the same as that observed for the monitoring models constructed from vegetation indices and original wavebands. The wavelet features were superior to the waveband features in terms of the ability to identify scab, but they were inferior to MSR, indicating that the MSR-GASVM model was the most effective of the tested models at identifying scab-infected ears.

When wheat is infected by scab, its appearance or internal structure changes, which produces differences in spectral reflectance and radiation characteristics. This phenomenon is the fundamental theoretical basis for the identification of wheat scab by remote sensing spectroscopy [9]. In addition to selecting the appropriate algorithm for feature extraction, choosing a suitable modeling method also has a great impact on the monitoring of crop diseases [33]. According to the results obtained in the present study, the GASVM models consistently identified disease severity better than the SVM models, and the MSR-GASVM model achieved the best classification accuracy (75%) in terms of discriminating between healthy wheat and wheat infected by scab. GA has an advantage in terms of global optimization over the solution space. SVM can approximate arbitrary nonlinear relationships and establishes an excellent nonlinear mapping model with small samples.
SVM overcomes the tendency of some traditional intelligent algorithms to fall into local minima, and it has high stability and robustness in prediction and control; however, it has difficulties in kernel function and parameter selection [34].
Combining these two algorithms is beneficial because they complement each other and greatly improve the overall accuracy of the model. This combined model has a higher practical significance than the disease recognition model constructed by a single SVM algorithm.

Conclusions
In this study, we used the non-imaging hyperspectral technique to collect spectral information and analyzed the spectral changes in wheat ears under stress. We then combined the original spectral features, vegetation indices, and wavelet features with the SVM and GASVM algorithms to develop wheat scab monitoring models. Both model types identified wheat infected with scab, but the overall accuracy of the GASVM models was higher than that of the SVM models.
The MSR-GASVM model had the best identification performance. The results had relevant implications for identifying wheat scab at the ear scale and provided a theoretical reference for the further study of the identification of scab at the canopy or regional scale.