Novel method for identifying wheat leaf disease images based on differential amplification convolutional neural network

In this study, a differential amplification convolutional neural network (DACNN) was proposed and used to identify wheat leaf disease images with high accuracy. The branches added between the deep convolutional layers amplify small differences between the real output and the expected output, which makes the weight update more sensitive to the small errors returned in the backpropagation pass and significantly improves the fitting capability. Firstly, since no large-scale wheat leaf disease image dataset is currently available, a wheat leaf disease dataset covering eight kinds of wheat leaf images was constructed, and five data augmentation methods were used to expand it. Secondly, DACNN was combined with four classifiers, namely Softmax, support vector machine (SVM), K-nearest neighbor (KNN) and Random Forest, to evaluate the wheat leaf disease dataset. Finally, DACNN was compared with LeNet-5, AlexNet, ZFNet and Inception V3. The results demonstrate that DACNN outperforms the other models, with an average recognition accuracy of 95.18% on the wheat leaf disease dataset.


Introduction 
Wheat is one of the most important staple food crops in China. The development of the wheat industry is directly related to the country's food security and social stability. Therefore, recognizing wheat leaf diseases is important for safeguarding wheat yield. However, at present, wheat leaf diseases are mainly identified manually, which is inefficient and inaccurate.
In recent years, deep learning has advanced rapidly in image recognition.
In 1998, LeNet-5, which has a 7-layer network structure, was used for handwritten postal code recognition [1]. In 2012, a convolutional neural network (CNN) achieved the best result in the ImageNet large-scale visual recognition challenge, which brought CNNs widespread attention [2]. In 2014, Zeiler et al. [3] implemented ZFNet and visualized the network structure through deconvolution. Simonyan et al. [4] proposed the visual geometry group (VGG) model, which increased the depth of the network by stacking convolutional layers with 3×3 kernels and used small kernels in place of a layer with a larger kernel, reducing the number of parameters. In 2015, Szegedy et al. [5] proposed GoogLeNet with more than 20 layers, which increased the depth of the CNN, improved the utilization of computing resources, reduced the number of parameters, and improved accuracy. In 2016, Inception V2 and Inception V3 were proposed [6], introducing a series of modifications that increase accuracy and reduce computational complexity. He et al. [7] used a residual network to address the vanishing gradient problem so that the lower layers of the network can be fully trained, and accuracy increases with depth. DenseNet further extended the idea of cross-layer connections to multi-layer connections to improve representation [8]. In 2018, Khan [9] introduced a new channel boosting idea, whose motivation is to train the network on rich, channel-boosted representations. This idea effectively improved the performance of CNNs by learning diverse features. In 2019, Hou et al. [10] proposed a method for selecting channels based on the relative magnitude of activations, and proposed weighted channel dropout for the regularization of convolutional layers in CNNs.
With the development of deep learning, crop disease identification has also advanced, which not only reduces the workload but also improves the efficiency of disease and pest identification. Zeng et al. [11] developed a CNN model with high-order residuals and parameter-sharing feedback for crop disease recognition in real environments; its recognition accuracy and robustness were better than those of other methods. Zhang et al. [12] used the VGG-16 model to classify apple leaf diseases with higher accuracy. Amanda et al. [13] used transfer learning to train a CNN, which achieved higher accuracy in recognizing cassava diseases and pests. Mohanty et al. [14] trained a CNN with 54306 images of healthy and diseased leaves and used it to identify 14 crop species and 26 diseases. Lu et al. [15] used a deep CNN to identify rice leaf diseases, which was more accurate than traditional machine learning models. Zhang et al. [16] used the LeNet model to identify cucumber diseases, which was more accurate than traditional methods. Huang et al. [17] used GoogLeNet to identify disease images of spikes and obtained good classification performance. In 2017, the capsule network was proposed by Sabour et al. [18]. A CNN cannot learn spatial relationships and its pooling layers lose information, whereas a capsule adjusts its output according to changes in the input. Deng et al. [19] used the capsule network to classify hyperspectral images, and the classification accuracy exceeded that of the CNN. In 2018, Gan et al. [20] established a hyperspectral inversion model for predicting the chlorophyll content of longan leaves using the sparse autoencoder, a classic deep learning model. The accuracy can be greatly improved by using deep learning methods. Zhu et al. [21] used an improved faster region-based convolutional network (Faster R-CNN) to identify plant leaves, and achieved higher recognition accuracy than Faster R-CNN in complex backgrounds.
As network depth increases, large network models tend to ignore small feedback errors, which leads to lower convergence rates [7]. Moreover, very deep models tend to ignore the details of large-scale data. In view of these problems, this study proposes the differential amplification convolutional neural network (DACNN), which amplifies small differences between the real output and the expected output and achieves good results in the identification of wheat leaf disease images. The differential amplification branches constructed in the deep neural layers make the model more sensitive to the small errors fed back in each iteration, which alleviates error omission. Since no large-scale wheat leaf disease image dataset is currently available, a wheat leaf disease dataset was also constructed.

Materials and methods
The DACNN contains 6 convolutional layers, 3 max-pooling layers and 3 fully connected layers. To improve the ability of feature extraction, 3×3 kernels are used to replace the larger kernels, and the convolution kernels are fully connected in the last two layers. To alleviate the omission of minor errors in the backpropagation pass, branches are added before and after the deep convolutional layers to simulate a differential amplifier, which achieves error amplification. In Figure 1, the structure of the traditional CNN is compared with that of the differential amplification branch, and the advantage of the latter in error amplification is shown by theoretical analysis.

Differential amplification branch
Scheme 1 in Figure 1 is a schematic diagram of a CNN without a branch in the deep neural layers, similar to the traditional CNN. Its data stream is given by Equation (1), where $w_l$ and $b_l$ are the weight matrix and the bias of the $l$th neural layer, respectively; $x_l$ and $T_{l+1}$ are the mapping input and the mapping result of the $l$th neural layer, respectively; and $E()$ is a linear activation function. Scheme 2 in Figure 1 is a schematic diagram of a CNN with a differential amplification branch, as used in DACNN. Its data stream satisfies Equation (2),
where $w_l$ and $b_l$ are the weight matrix and the bias of the $l$th neural layer, respectively; $x_l$ and $H_{l+1}$ are the mapping input and the mapping result of the $l$th neural layer, respectively; $F_l()$ is the mapping output of the convolutional layers; and $E()$ is the linear activation function. Compared to Scheme 1, this structure strips the unchanged part $x_l$ and highlights the minor change in $F_l(x_l, w_l, b_l)$, thus making the model more sensitive to the error of the backpropagation pass during each iteration.
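As a minimal sketch of the two data streams, assuming a standard affine layer for Scheme 1 and an additive branch for Scheme 2, the relations could take the following form. These forms are inferred from the symbol definitions and from the statement that the branch strips the unchanged part $x_l$; they are an assumption, not a verbatim copy of Equations (1) and (2).

```latex
% Assumed forms, inferred from the symbol definitions in the text.
% Scheme 1: the layer maps the input directly.
T_{l+1} = E\left(w_l x_l + b_l\right)
% Scheme 2: the branch carries x_l, so F_l only has to fit the difference.
H_{l+1} = E\left(x_l + F_l(x_l, w_l, b_l)\right)
```

Under this assumption and a linear $E()$, the convolutional layers in Scheme 2 fit $F_l = H_{l+1} - x_l$ instead of the full mapping.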
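Suppose the input feature map is $x_5 = 100$, and the expected and actual mapping results of the convolutional layer are 105 and 110, respectively. Then $\Delta f_6 = 5$, as shown in Equation (3).
$$\Delta f_6 = |f_6' - f_6| = |110 - 105| = 5 \tag{3}$$
Here $f_6$ and $f_6'$ represent the expected and actual mappings of the convolutional layer, respectively; the prime (′) marks functions and variables in the actual situation. In Scheme 1, $\Delta T_6$ is 5, as shown in Equation (4).
$$\Delta T_6 = |T_6' - T_6| = |110 - 105| = 5 \tag{4}$$
The proportion of $\Delta T_6$ is shown in Equation (5).
$$P_{T_6} = \frac{\Delta T_6}{T_6} = \frac{5}{105} \approx 0.0476 \tag{5}$$
In Scheme 2, the branch strips the unchanged input, so the convolutional layers only fit the difference:
$$F_5 = f_6 - x_5 = 105 - 100 = 5, \qquad F_5' = f_6' - x_5 = 110 - 100 = 10 \tag{6}$$
Naturally,
$$\Delta F_5 = |F_5' - F_5| = |10 - 5| = 5 \tag{7}$$
so $\Delta F_5 = 5$. The proportion of $\Delta F_5$ is shown in Equation (8).
$$P_{F_5} = \frac{\Delta F_5}{F_5} = \frac{5}{5} = 1 \tag{8}$$
Obviously, $P_{F_5}$ in Scheme 2 is much larger than $P_{T_6}$ in Scheme 1. Therefore, the network structure in Scheme 2 can enlarge the error between the expected output and the actual output in the backpropagation pass, which is beneficial to the correct convergence of the model.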
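The numerical example above can be checked with a few lines of Python. This is a minimal sketch: the helper name `relative_error` and the variable names are illustrative, and Scheme 2 is assumed to fit the difference between the mapping and the input.

```python
def relative_error(expected: float, actual: float) -> float:
    """Absolute deviation divided by the magnitude of the expected value."""
    return abs(actual - expected) / abs(expected)


x = 100.0           # input feature value
f_expected = 105.0  # expected mapping of the convolutional layer
f_actual = 110.0    # actual mapping of the convolutional layer

# Scheme 1: the layer has to fit the full mapping.
p_t6 = relative_error(f_expected, f_actual)

# Scheme 2 (assumed): the branch strips x, so the layer fits only the difference.
p_f5 = relative_error(f_expected - x, f_actual - x)

print(round(p_t6, 4), p_f5)
```

The same absolute error of 5 is a small fraction of the full mapping but equals the whole of the stripped difference, which is the amplification effect described above.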
For the initial input $x_0$, the mapping result of the $L$th neural layer satisfies Equation (11).
From Equations (10) and (11), it can be seen that the differential amplification effect can be accumulated layer by layer, thus improving the fitting ability of the model to the image pixel distribution and the identification accuracy.

Normalized layers
Owing to the influence of sunlight, water mist, dust, and other factors, the range of the signal intensity in the gathered images is extremely wide. Signals with wide ranges of values often play a major role in model learning, while signals with smaller ranges have less effect, which affects the convergence trend of the model. Moreover, the domain of the function is limited, so the input data need to be mapped into this domain. To solve these problems, local response normalization (LRN) is used before and after the differential amplification branch.
By creating a competition mechanism, LRN enhances the activity of local neurons with larger responses and inhibits other neurons with smaller feedback, which improves the generalization ability of the model and prevents overfitting [2], as shown in Equation (12),
where $x$ is the input value, $\alpha = 0.0001$ is the scaling factor, $\beta = 0.75$ is the exponent, and $n = 5$ is the local size of the normalization range.
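A minimal sketch of cross-channel LRN in plain Python follows, using the formulation of [2] with the parameter values given above. The additive constant `k = 2.0` is an assumption taken from [2], since its value is not stated here, and the function name is illustrative.

```python
def local_response_norm(channels, n=5, alpha=1e-4, beta=0.75, k=2.0):
    """Normalize the activations at one spatial position across channels.

    channels: list of activations, one per feature map.
    k is an assumed constant (value from the AlexNet paper).
    """
    total = len(channels)
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)
        hi = min(total - 1, i + half)
        # Sum of squares over the n neighbouring channels around channel i.
        sq_sum = sum(channels[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


print(local_response_norm([1.0, 50.0, 2.0, 0.5, 3.0, 1.5]))
```

A large activation in one channel raises the denominator for its neighbours, which is the competition mechanism described above. Some frameworks scale `alpha` by the window size, so the values may need adjusting when porting this sketch.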

Dropout
In order to improve the generalization ability and inhibit overfitting, the dropout strategy [22] is introduced in the differential amplification branch. During forward propagation, each neuron is deactivated with probability p, that is, its activation value is set to 0 with probability p. Dropout reduces the dependence between neurons by forcing each neuron to interact with randomly selected neurons, and prevents some features from being effective only in the presence of other specific features. In this way, dropout improves the generalization ability of the model. The dropout rate is set to 0.5 in this study, so on average half of the neurons passing through dropout are set to 0. Figure 2 illustrates the training process of DACNN with dropout.
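The forward pass of dropout can be sketched as follows. This is an illustrative version that assumes the common "inverted dropout" scaling (surviving activations are divided by 1 − p during training), which may differ from the exact variant used in this study; the function name and the `rng` parameter are hypothetical.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Set each activation to 0 with probability p during training.

    Survivors are rescaled by 1 / (1 - p) (assumed inverted dropout),
    so the expected activation is unchanged and inference needs no scaling.
    """
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng))
```

With p = 0.5, roughly half of the outputs are zero in each forward pass, and a different random subset is dropped in every iteration.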

Exponential linear unit
In this paper, we use the exponential linear unit (ELU) as the nonlinear activation function, as shown in Equation (17).
ELU is an improved version of the rectified linear unit (ReLU). Unlike ReLU, it produces a nonzero output when the input is negative. As shown in Figure 3, the linear part on the right alleviates gradient vanishing, while the soft saturation on the left makes it more robust to input changes and noise. The mean value of the output is close to 0, and networks using ELU converge quickly.
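For reference, the standard ELU definition and its derivative can be sketched in Python as below. The scale `alpha = 1.0` is an assumption, since the value used in this study is not stated.

```python
import math


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)


for value in (-5.0, -1.0, 0.0, 2.0):
    print(value, elu(value), elu_grad(value))
```

For large negative inputs the output saturates towards −alpha, which corresponds to the soft saturation on the left, while the gradient stays at 1 for positive inputs.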

Figure 3 Exponential linear unit

Experimental setup
The computer is an HP EliteDesk 880 G2 TWR with an Intel(R) Core(TM) i7-6700K CPU @ 3.40 GHz and 16 GB of RAM, running 64-bit Ubuntu 14.04.4. Training a deep CNN on large-scale images over a large number of iterations relies heavily on high-performance GPUs. The basic configuration is listed in Table 1. Python is used as the programming language.

Construction of the dataset
As no large-scale image collection of wheat leaf diseases is available, images were collected from several wheat planting bases in Shandong Province. They were then expanded by 5 kinds of data augmentation techniques to construct the wheat leaf disease dataset. It is expected that these experiments can narrow the gap between the theoretical research of neural networks and practical agricultural applications.

Acquisition of images
The wheat leaf images were collected from wheat planting bases in Shandong Province, China. The original dataset contains 8326 images, covering normal leaves and 7 kinds of diseases: mechanical damage, powdery mildew, bacterial leaf streak, Cochliobolus heterostrophus, stripe rust, leaf rust and bacterial leaf blight. The images were taken with a Canon EOS 80D (18-200 mm). The image format is JPEG and each image is a 24-bit color bitmap. The numbers and proportions of wheat leaf disease images in the original dataset are shown in Table 2, and sample images are shown in Table 3.

Data preprocessing
CNN self-learning relies on iterative training on a large-scale dataset. If the amount of data is too small, the model is prone to overfitting, which makes the training error very small but the testing error very large [23]. To increase the size and diversity of the original dataset, 5 augmentation methods are adopted: adding Gaussian noise, color jittering, fancy PCA, horizontal mirroring and Gaussian blur, as shown in Table 4. The images processed by these methods are shown in Table 5.
Data augmentation can produce 6 corresponding enhanced images for every category of wheat leaf disease images. After augmentation, the number of wheat leaf disease images is 41630; the number and proportion of each kind of wheat leaf disease image are shown in Table 6.
Gaussian blur: each pixel takes the average value of the surrounding pixels. The degree of blurring depends on the blur radius, which is set to 2.
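Three of the augmentation operations can be sketched in plain Python on a single-channel image stored as a list of rows. This is a simplified illustration under stated assumptions: the noise level `sigma` is hypothetical, and the blur is an unweighted mean over the neighbourhood, following the description above, whereas a true Gaussian blur weights neighbours by a Gaussian kernel.

```python
import random


def mirror_horizontally(image):
    """Flip each row left to right."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma=10.0, rng=random):
    """Add zero-mean Gaussian noise and clip to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
            for row in image]


def mean_blur(image, radius=2):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighbourhood."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


image = [[0, 50, 100], [150, 200, 250], [30, 60, 90]]
print(mirror_horizontally(image))
print(mean_blur(image, radius=2))
```

For color images the same operations would be applied per channel; color jittering and fancy PCA act on the color channels jointly and are not covered by this sketch.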

DACNN-Softmax, DACNN-SVM, DACNN-KNN and DACNN-Random Forest
In this experiment, DACNN is combined with Softmax, support vector machine (SVM), K-nearest neighbor (KNN), and Random Forest to identify the augmented dataset, with the aim of investigating the effect of different classifiers on the identification results by observing how their accuracy changes. In KNN, k is set to 100. The radial basis function (RBF) kernel is used in SVM. The penalty parameter C, γ, and the slack variable ζ are initialized to 10, 0.02 and 0.001, respectively. The number of decision trees in Random Forest is 200, and the Gini index is used as the splitting criterion.
It can be seen from Figure 4 that when the models have converged, the identification accuracies of DACNN-SVM and DACNN-Softmax are 95.32% and 96.09%, respectively, which are clearly superior to the accuracies of DACNN-KNN and DACNN-Random Forest, 90.37% and 89.96%. Furthermore, the identification accuracy of DACNN-SVM is higher than that of DACNN-Softmax when the number of iterations is small. This is because little data has been processed in the early stage of the experiment, and SVM is a classification algorithm based on statistical learning theory that replaces empirical risk minimization (ERM) with structural risk minimization (SRM). It is suitable for classifying small samples, so it has higher recognition accuracy than Softmax in the early stage.
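The Gini index used by the Random Forest can be illustrated with the standard definition, one minus the sum of squared class proportions. The sketch below is a generic illustration with hypothetical function names and toy labels, not the configuration used in the experiments.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


mixed = ["stripe rust", "stripe rust", "leaf rust", "leaf rust"]
print(gini_index(mixed))
print(split_gini(mixed[:2], mixed[2:]))
```

Each decision tree chooses the split with the lowest weighted impurity, so a split that separates the classes cleanly is preferred over one that leaves them mixed.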

DACNN, Inception V3, LeNet-5, AlexNet and ZFNet
In order to verify the performance of DACNN, it is compared with Inception V3, LeNet-5, AlexNet and ZFNet. LeNet-5 consists of 3 convolutional layers, 2 subsampling layers, and 3 fully connected layers, and has been widely used in handwritten digit recognition. Both AlexNet and ZFNet contain 5 convolutional layers, 3 subsampling layers, and 3 fully connected layers. However, AlexNet uses a sparse connection structure split across two GPUs, while ZFNet uses a dense connection structure on a single GPU.
Inception V3 works by performing multiple convolution and pooling operations on the image and outputs a deep feature map. In the above experimental environment, the 5 models are trained for 50 000 iterations on the augmented dataset. The intermediate model is saved every 5 000 iterations and validated on the test dataset. The training process of the models is shown in Figure 5.
It can be seen from Figure 5 that DACNN begins to converge when the number of iterations is close to 25 000. The average identification accuracy of DACNN is about 95.18%, which is higher than that of Inception V3 (94.31%), ZFNet (92.79%) and AlexNet (91.54%), and clearly higher than that of LeNet-5 (89.15%). DACNN thus achieves higher identification accuracy on the wheat leaf disease images. The error amplification effect of DACNN can be accumulated layer by layer, which makes the network more capable of fitting the pixel distribution of the image and improves the classification accuracy.

Conclusions
In this study, we address the recognition of wheat leaf disease images by proposing a novel method named DACNN, a differential amplification convolutional neural network. In particular, branches were added before and after the deep convolutional layers in DACNN to simulate a differential amplifier and realize error amplification. In addition, since there is no standard dataset of wheat leaf diseases, a wheat leaf disease dataset was constructed.
Finally, experiments comparing DACNN with Inception V3, AlexNet, ZFNet and LeNet-5, and combining it with four classifiers (Softmax, SVM, KNN and Random Forest) on the wheat leaf disease dataset, show the superiority of DACNN. For future work, we plan to apply DACNN to other types of visual tasks, such as object detection.