Research on recognition for cotton spider mites’ damage level based on deep learning

Abstract: The changes in cotton leaf characteristics are closely related to the cotton spider mites’ damage level. Extracting distinguishable features of cotton leaves is an effective way to identify the level. However, classification is challenging because of factors such as illumination intensity, background complexity and shooting angle. A recognition model based on MobileNetV1 is proposed, which is trained through transfer learning with a two-stage learning rate reduced from 0.01 to 0.001. The experiments demonstrate that the model attains an accuracy of 92.29% for the training set and 91.88% for the test set of the mixed data. To verify the effectiveness of the two-stage training method, models are also trained with two public datasets, CIFAR-10 and Flowers, and attain accuracies of 95.46% and 95.57% for the test sets, respectively. The average recognition time for a single cotton leaf image is about 0.015 s. Furthermore, a mobile terminal application with the model embedded is developed to realize real-time recognition of cotton spider mites’ damage level in the field.


Introduction 
Cotton is an important fiber and oil crop, and its management is of great significance for increasing farmers' income and promoting the development of the local economy. With the continuous expansion of the planting area, controlling cotton pests is becoming more and more intractable [1] in Shihezi, Xinjiang, China. Farmers carry out on-the-spot checks and count the active cotton spider mites through magnifying glasses to determine the damage level. They mark the location of the damaged cotton plants with tree branches that they carry with them. These plants are called center plants, from which the cotton spider mite infestation spreads. The method is time-consuming, subjective and limited. It hinders pest monitoring and affects the quantity and quality of cotton to a certain extent. A mobile terminal application is developed to recognize cotton spider mites' damage levels quickly and accurately. It records the location of each center plant precisely so that it can be visualized on the map. Moreover, the farmers conduct regular reviews of the sampling points. This helps them take appropriate measures in time to prevent the damage from worsening and to control the spread of pest areas [2].
With the development of computer vision [3,4] and agricultural informatization [5,6], many scholars have paid more attention to the field of plant pests and diseases [7][8][9]. He et al. [10] extracted the color feature set of cotton leaf images damaged by cotton spider mites to classify three damage levels through machine vision technology. However, variations in background intensity were not considered and the non-destructive measurement method was not applied. Zhang et al. [11] adopted the Support Vector Machine (SVM) with a radial basis kernel function to classify cotton leaves damaged by five kinds of pests. It attained an accuracy of 88.1% under laboratory conditions. Lu et al. [12] conducted an in-field wheat disease diagnosis based on the VGG structure. It reached an average accuracy of 97.95% and maintained the location of disease areas. The proposed model was packed into a real-time mobile application. The classification result was calculated by the server and sent back to the mobile terminal.
The studies above indicate that machine learning algorithms require the handcrafted design of effective feature sets. Deep learning [13][14][15] algorithms combine simple features into more complex features automatically. These methods realize feature self-learning and attain higher accuracy [16][17][18]. Whether the feature selection is reasonable has a great influence on the recognition effect. Currently, most studies on the classification of cotton spider mites' damage level have not been applied to the actual scene. In order to realize real-time recognition on mobile terminal devices under field conditions, MobileNetV1 [19], a lightweight convolutional neural network [20][21][22], is selected as the basic network structure for model training in this study.

Data collection
Cotton spider mites are too small to be detected with the naked eye. The mites live mainly on the back of cotton leaves and suck their juice. This results in yellow and red spots appearing on the leaf surface [23]. The cotton leaf image collection has been carried out twice through non-destructive measurement. To simplify the research, the images only contain the damage spots caused by cotton spider mites. The image acquisition device is a HUAWEI Honor 7 smartphone, and the resolution of the images is 2448×3264 pixels. The image collection scene is shown in Figure 1.

Figure 1 Image collection scene
The first data collection was carried out from August 16th to 19th, 2017. A white board was used to mitigate the impact of the complex background. Shihezi city is two time zones west of Beijing, China. The images were photographed directly from 5 pm to 7 pm Beijing time, and 197 cotton leaf images were gathered. The second data collection was carried out from July 18th to 22nd, 2018. A total of 2369 images were captured with a white board at different time periods of the day to increase the diversity of illumination intensity. A further 1753 images were acquired in a complex field environment without a white board to augment background complexity.

Data classification
According to 'Rules for monitoring and forecast of the cotton spider mites' [24], there are four damage levels of cotton leaf: level 0, no damage; level 1, sporadic yellow patches on the leaves; level 2, red patches account for less than 1/3 of the leaf area; level 3, red patches account for more than 1/3 of the leaf area. The four damage levels are adopted as the classification labels in the deep learning model training process.
Different image annotation methods are adopted. According to the classification criteria and the experience of farmers, the images of the first data collection are annotated manually and defined as dataset1. The recognition model T 1,1 is established through transfer learning with dataset1 based on MobileNetV1. The cotton leaf images captured with a white board in the second data collection are first annotated manually according to the classification criteria. Model T 1,1 is then used to reclassify them as an aid for correcting the manual labels. The final result is defined as dataset2. The images captured in the complex field environment of the second data collection have more complex backgrounds than those of dataset1 and dataset2. Hence they are annotated manually in accordance with the classification criteria only and defined as dataset3.
Examples of cotton leaf images after classification are shown in Figure 2.
In Figure 2, the cotton leaf images of dataset2 increase the illumination complexity compared with dataset1. The images of dataset3 augment background complexity compared with dataset2. Each of the three datasets is divided separately into a training set and a test set at a ratio of 4:1 [9]. The distribution of the three datasets is shown in Table 1.

Data augmentation
In the training process of deep learning models, a large number of images are needed to extract effective image features [25,26]. The complex collection conditions in Shihezi, Xinjiang, China should also be considered. Data augmentation [17] methods are therefore chosen to expand the training sets and enhance the robustness of the model to image changes [17,27]. Because the cotton spider mites' damage level is closely related to the proportion of patch area, two methods are adopted: image rotation and noise addition. Image rotation rotates the images by a random angle within a given range. It simulates the shooting angle changes in the image acquisition process. Noise addition, such as salt and pepper noise or Gaussian noise, is used to perform random perturbation. It imitates the various degrees of interference information [28] on the cotton leaf surface. After data augmentation, each training set is expanded 40-fold: from 157 to 6280 images for dataset1, from 1894 to 75 760 for dataset2, and from 1388 to 55 520 for dataset3.
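The noise-addition step and the 40-fold expansion described above can be illustrated with a minimal NumPy sketch. The function names, the noise strength `sigma` and the corruption fraction `amount` are hypothetical choices for illustration, since the text does not give the parameters actually used. Random rotation is omitted here because it would need an image library.

```python
import numpy as np

def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise to a uint8 image (sigma is an assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(image, amount=0.02, rng=None):
    """Set roughly a fraction `amount` of pixels to black (pepper) or white (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    height, width = image.shape[:2]
    mask = rng.random((height, width))
    noisy[mask < amount / 2] = 0          # pepper pixels
    noisy[mask > 1 - amount / 2] = 255    # salt pixels
    return noisy

def expanded_size(n_train, factor=40):
    """Number of training images after expanding the set by `factor`."""
    return n_train * factor

if __name__ == "__main__":
    # Training-set sizes before augmentation, as reported in the text.
    for name, n in [("dataset1", 157), ("dataset2", 1894), ("dataset3", 1388)]:
        print(name, n, "->", expanded_size(n))
```

Both noise functions leave the image shape and dtype unchanged, so the augmented images can be fed to the same resizing step as the originals.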

Design of experiments
In the MobileNetV1 structure, the standard convolution is factorized into a depthwise convolution and a pointwise convolution, which is depicted in the model calculation section of Figure 3. The depthwise separable convolution greatly reduces the amount of computation and the size of the model. The 3×3 depthwise separable convolution uses 8 to 9 times less computation than the standard convolution [19].
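The 8 to 9 times figure can be checked with a short calculation. The sketch below counts multiply-add operations using the cost formulas from the MobileNet paper [19]. The symbols (kernel size `dk`, input channels `m`, output channels `n`, feature map size `df`) follow that paper and are not defined in this text, so the layer sizes used here are illustrative assumptions.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution on a df x df feature map."""
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

def reduction_factor(dk, m, n, df):
    """How many times cheaper the separable form is; equals 1 / (1/n + 1/dk**2)."""
    return standard_conv_cost(dk, m, n, df) / separable_conv_cost(dk, m, n, df)

if __name__ == "__main__":
    # Illustrative layer sizes; the factor depends only on dk and n.
    for channels, size in [(64, 112), (256, 28), (1024, 7)]:
        print(channels, round(reduction_factor(3, channels, channels, size), 2))
```

For a 3×3 kernel the factor approaches 9 as the number of output channels grows, which is consistent with the range quoted above.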
Based on MobileNetV1, the model with the two-stage training method is established. It is integrated into the mobile application software and supports real-time recognition of cotton spider mites' damage level in the field. After the captured image is recognized by the model, the damage level and geographical location are uploaded to the server in real time. The sampling points are visualized on the map. The greedy algorithm [23] is employed to perform path optimization on the sampling points selected by farmers. The optimized inspection route is rendered so that the selected sampling points can be reviewed in the field. The technical pipeline is shown in Figure 3.
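The text does not say which greedy rule is used for path optimization. One common choice, shown here purely as an assumption, is the nearest-neighbor heuristic: from the current sampling point, always walk to the closest unvisited one. The function name and the planar coordinates are hypothetical. Real GPS coordinates would call for a geodesic distance instead of `math.dist`.

```python
import math

def greedy_route(points, start=0):
    """Return a visiting order over `points` using the nearest-neighbor rule.

    `points` is a list of (x, y) tuples in a planar coordinate system (assumed).
    Ties are broken by the lower index so the result is deterministic.
    """
    if not points:
        return []
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    route = [start]
    while unvisited:
        last = points[route[-1]]
        nearest = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        route.append(nearest)
        unvisited.remove(nearest)
    return route

if __name__ == "__main__":
    sampling_points = [(0, 0), (5, 0), (1, 0), (2, 0)]
    print(greedy_route(sampling_points))
```

The heuristic is fast and simple, but it does not guarantee the shortest possible route.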

Figure 3 Technical pipeline
Model training is performed on the Inspur high performance computing cluster platform with an NVIDIA Tesla K40m GPU with 12 GB of memory. The system is GNU/Linux 3.10.0-327.el7.x86_64, the compile and run environment is Anaconda 5.1 for Python 3.6, and the model training framework is TensorFlow 1.8.0 [30].
The models are trained with different datasets through transfer learning based on the model MobileNet_v1_1.0_224 (named model T0 in this study), which was generated with ILSVRC-2012-CLS [29]. In addition, the hyperparameters [19] are set so that the width multiplier α is 1 and the resolution multiplier ρ is 1. All images are downsampled to 224×224 pixels with bilinear interpolation [30].
The definition of model accuracy is shown as follows,
$$\mathrm{Accuracy} = \frac{T}{T+F} \times 100\%$$
where T is the number of correctly recognized images and F is the number of incorrectly recognized images.
(2) Recognize dataset2. If the recognition performance is good, go to step (4). Otherwise, go to step (3).
(3) Train the model based on model T0 and dataset2. Models T 2,1 , T 2,2 and T 2,3 are obtained. Among them, model T 2,3 achieves the best recognition effect; go to step (4).
(6) Recognize the mixed data of dataset1, dataset2 and dataset3. If the recognition performance is good, go to step (7). Otherwise, replace model T0 and go to step (1).
(7) The model is selected as the ultimate recognition model for cotton spider mites' damage level.
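The accuracy used to judge each recognition step is the share of correctly recognized images, T / (T + F), where T and F are the numbers of correctly and incorrectly recognized images. A minimal sketch of that calculation follows, with a per-level breakdown of the kind reported for each damage level. The function names and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Percentage of images whose predicted damage level matches the label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    t = sum(p == a for p, a in zip(predicted, actual))  # correctly recognized
    f = len(actual) - t                                 # incorrectly recognized
    return 100.0 * t / (t + f)

def per_level_accuracy(predicted, actual):
    """Accuracy computed separately for each true damage level."""
    result = {}
    for level in sorted(set(actual)):
        pairs = [(p, a) for p, a in zip(predicted, actual) if a == level]
        result[level] = accuracy([p for p, _ in pairs], [a for _, a in pairs])
    return result

if __name__ == "__main__":
    predicted = [0, 1, 2, 3]   # made-up model outputs
    actual = [0, 1, 2, 2]      # made-up ground-truth levels
    print(accuracy(predicted, actual))
    print(per_level_accuracy(predicted, actual))
```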

Experiments of model training with dataset1
The training is performed through transfer learning based on model T0, with the initial learning rate set to 0.01. The train loss function curve is shown in Figure 5. In Figure 5, the train loss function curve tends to converge when the training steps reach about 40 000, and the loss value is about 0.3. The model obtained at this moment is selected as model T 1,1 . The recognition results for dataset1-test and dataset2 are displayed in Table 2. From Table 2, model T 1,1 has a poor recognition effect on dataset2 although it achieves good performance on dataset1. The reason is that the images of dataset2 have greater illumination complexity than those of dataset1 because they were captured at different times of day. This reveals that model T 1,1 is less robust to changes of illumination intensity. Thus model training is performed with dataset2, which was captured under different illumination intensities.

Experiments of model training with dataset2
When transfer learning is adopted, it is a better choice to reduce the initial learning rate to 1/10 of the original [31]. Three different learning rate settings, 0.01, 0.001 and the two-stage rate from 0.01 to 0.001, are applied while training with dataset2. The loss function curves are shown in Figure 6.
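The two-stage setting can be read as a piecewise-constant schedule: 0.01 until the first convergence, then 0.001. The sketch below is a simplification that switches at a fixed step. The default of 40 000 is taken from the step count at which the first-stage loss curves are reported to converge. In the study the switch follows observed convergence, so the fixed `switch_step` is an assumption and the function name is hypothetical.

```python
def two_stage_learning_rate(step, switch_step=40_000, first_lr=0.01, second_lr=0.001):
    """Learning rate at a given training step under a two-stage schedule.

    switch_step is an assumed fixed boundary between the two stages.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return first_lr if step < switch_step else second_lr

if __name__ == "__main__":
    for step in (0, 39_999, 40_000, 159_999):
        print(step, two_stage_learning_rate(step))
```

Setting `switch_step=0` gives the constant 0.001 setting, and a `switch_step` larger than the total number of steps gives the constant 0.01 setting, so all three configurations compared in the text fit the same function.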

Figure 6 Loss function curves of model training with dataset2
In Figure 6, the curves reveal that:
(1) For the model training with the initial learning rate of 0.01, the train loss function curve tends to converge when the training steps reach about 40 000. The loss value is about 0.75. The model obtained at this moment is selected as model T 2,1 . With the same initial learning rate of 0.01, the loss value of model T 2,1 is higher than that of T 1,1 . The fitting is more difficult because of the increased illumination complexity of dataset2 compared with dataset1.
(2) Since the initial learning rate of model T0 is 0.01, it is reduced to 0.001 to conduct model training. The train loss function curve tends to converge when the training steps reach about 40 000, and the loss value is about 0.6. The model obtained at this moment is selected as model T 2,2 . With the same dataset, the training steps are roughly the same for models T 2,2 and T 2,1 . The loss value of model T 2,2 is lower than that of T 2,1 . This illustrates that, in transfer learning, the fitting performance is relatively good when the learning rate is reduced to 1/10 of the original.
The recognition results for the test set of dataset2 with different models are shown in Table 3. From Table 3, the accuracy of model T 2,2 is increased by 4.21% compared with T 2,1 for the test set of dataset2. Moreover, the accuracy of model T 2,3 is further improved with respect to T 2,2 and T 2,1 . The recognition result for dataset3 with model T 2,3 is shown in Table 4. From Table 4, model T 2,3 has a poor recognition effect on dataset3, which was captured in the complex field environment. This indicates that model T 2,3 is less robust to changes in background complexity. Thus model training is performed with the mixed data of dataset1, dataset2 and dataset3.

Experiments of model training with mixed data of dataset1, dataset2 and dataset3
The training is performed with the mixed data of dataset1, dataset2 and dataset3 through transfer learning based on model T0. The two-stage learning rate from 0.01 to 0.001 is used. The train loss function curves are shown in Figure 7. The results in Table 5 further confirm that the attained model T 1+2+3,3 has a higher accuracy for each dataset. The recognition results for the cotton leaf images of different damage levels in the test set of the mixed data are shown in Table 6.
From Table 6, model T 1+2+3,3 attains an accuracy of more than 90% for each damage level. Besides the better overall recognition effect, it achieves a good recognition effect for each damage level, especially for the easily confused level 1 and level 2. For cotton leaf images, the probability values of each damage level calculated by model T 1+2+3,3 are shown in Figure 9. In Figure 9, model T 1+2+3,3 attains a good recognition effect and achieves a certain robustness to cotton leaf images with changes of illumination intensity and background complexity.

Experiments of model training with public datasets
In order to corroborate the effectiveness of the two-stage deep learning model training method, model training is also performed with two public datasets, CIFAR-10 and Flowers. The three different learning rate settings, 0.01, 0.001 and the two-stage rate from 0.01 to 0.001, are applied. The recognition results for the test sets with the different attained models are shown in Table 7. From Table 7, for each dataset the accuracy of the model attained with the two-stage training method is higher than that of the models trained with the learning rate of 0.01 or 0.001 alone. This indicates that the two-stage training method also achieves a good recognition effect for the two public datasets.

Conclusions
In this study, a two-stage deep learning model training method has been proposed. The model is trained with a higher initial learning rate of 0.01 in the first stage. The learning rate is reduced to 0.001 in the second stage after the first convergence.
The recognition model for cotton spider mites' damage level based on MobileNetV1 is developed. It attains an accuracy of 92.29% for the training set and 91.88% for the test set of the mixed data. The average recognition time for a single cotton leaf image is about 0.015 s. Thus, the model has high recognition accuracy, real-time performance and better generalization ability. It is suitable for cotton leaf images in single and complex scenarios. Moreover, a mobile terminal application is developed based on the model to realize real-time recognition of cotton spider mites' damage level in the field. It makes the control of cotton spider mites more accurate, efficient and timely. The model can better recognize captured cotton leaf images of different damage levels in the field, and the models trained on the two public datasets, CIFAR-10 and Flowers, through the two-stage training method also attain a better recognition effect. However, an in-depth exploration of the range by which the learning rate is reduced from high to low has not been conducted. In future work, we will perform experiments on the reduction range of the learning rate to optimize the model.

Figure 2 Cotton leaf images of different damage levels in different datasets
In the accuracy definition, T is the number of correctly recognized images, and F is the number of incorrectly recognized images. The model training process is shown in Figure 4. The steps of the model training process are as follows:
(1) Model training based on model T0 and dataset1. Model T 1,1 is obtained with a good recognition effect; go to step (2).

Figure 4 Figure 5
Figure 4 Flow chart of the model training process
(3) The two-stage model training with the learning rate from 0.01 to 0.001 is performed. The model attained in the first stage is model T 2,1 . Model training then continues in the second stage with the learning rate reduced to 0.001. The loss function curve tends to converge again when the total training steps reach about 160 000, and the loss value is about 0.1. The model obtained at this moment is selected as model T 2,3 . With the same dataset, the loss value of model T 2,3 is clearly lower than that of T 2,2 and T 2,1 . This illustrates that the two-stage training method makes the model fit better. The loss value begins to fall below that of the previous two trainings when the total training steps reach about 60 000. This indicates that the second stage fits better and faster. Nevertheless, the total number of training steps for model T 2,3 is about four times that of model T 2,1 (about 160 000 versus 40 000). Comparing the model training processes with the three learning rate settings above, the loss value at convergence becomes progressively smaller. This reveals that the predicted values fit the true values better with the two-stage learning rate from 0.01 to 0.001.

Figure 7 Loss function curves of model training with mixed data
In Figure 7, the curve tends to converge when the training steps reach about 40 000 with the initial learning rate of 0.01 in the first stage. The loss value is about 1.2. The model obtained at the moment is selected as model T 1+2+3,1 . With the same initial learning rate of 0.01, the loss value of model T 1+2+3,1 is higher than

Figure 8 Validation loss function curves of model training with the test set of mixed data
The curves in Figure 8 illustrate that the validation loss value for the test set of mixed data shows a downward trend during the training process. The validation loss value is about 0.5 when the total number of steps is about 160 000 with the learning rate reduced from 0.01 to 0.001. This shows that the obtained model T 1+2+3,3 achieves a better classification effect and better generalization ability. The recognition results for the mixed data and the different datasets with model T 1+2+3,3 are shown in Table 5.

Figure 9 Model recognition results of cotton leaf data for each damage level