Identification of banana fusarium wilt using supervised classification algorithms with UAV-based multi-spectral imagery

Banana Fusarium wilt currently threatens banana production areas all over the world. Rapid, large-area monitoring of the disease is very important for disease treatment and crop planting adjustments. The objective of this study was to evaluate the performance of three supervised classification algorithms, namely support vector machine (SVM), random forest (RF), and artificial neural network (ANN), in identifying locations that were infested or not infested with Fusarium wilt. An unmanned aerial vehicle (UAV) equipped with a five-band multi-spectral sensor (blue, green, red, red-edge and near-infrared bands) was used to capture the multi-spectral imagery. A total of 139 ground sample sites were surveyed to assess the occurrence of banana Fusarium wilt. The results showed that the SVM, RF, and ANN algorithms exhibited good performance for identifying and mapping banana Fusarium wilt disease in UAV-based multi-spectral imagery. The overall accuracies of the SVM, RF, and ANN were 91.4%, 90.0%, and 91.1%, respectively, for the pixel-based approach. The RF algorithm required significantly less training time than the SVM and ANN algorithms. The maps generated by the SVM, RF, and ANN algorithms showed that the areas of occurrence of Fusarium wilt disease were in the range of 5.21-5.75 hm², accounting for 36.3%-40.1% of the total planting area of bananas in the study area. The results also showed that the inclusion of the red-edge band resulted in an increase in the overall accuracy of 2.9%-3.0%. A simulation of the resolutions of satellite-based imagery (i.e., 0.5 m, 1 m, 2 m, and 5 m resolutions) showed that imagery with a spatial resolution of 2 m or finer resulted in good identification accuracy of Fusarium wilt. The results of this study demonstrate that the RF classifier is well suited for the identification and mapping of banana Fusarium wilt disease from UAV-based remote sensing imagery. The results provide guidance for disease treatment and crop planting adjustments.


Introduction 
Banana (Musa spp.) is the most popular fruit crop and is widely cultivated in tropical and subtropical climatic regions. Fusarium wilt of banana, also called Panama disease, is a serious soilborne fungal disease caused by the fungus Fusarium oxysporum f. sp. cubense race 4 (Foc 4) [1]. Currently, this disease threatens banana production areas worldwide [2]. It is disseminated through infected plant material, contaminated soil, tools, or footwear, or as a result of flooding and inappropriate sanitation measures [2]. The first visible signs of Fusarium wilt are yellowing or splitting of the oldest leaves, followed by leaf wilt and buckling; the leaves form a 'skirt' around the pseudostem before falling off [3]. At present, chemical treatment of infected plants is often ineffective. Once a diseased plant is found, 'timely removal' is the best way to avoid the formation of a disease center [4]. Therefore, timely monitoring of the occurrence of banana Fusarium wilt disease is very important for disease treatment and crop planting adjustments.
Traditional ground surveys to collect crop disease data are expensive and time-consuming [5]. Remote sensing technology has become a feasible means for crop disease detection and assessment in the past few decades, including for detecting Fusarium head blight and rust infection in wheat [6-10], bacterial leaf blight in rice [11,12], and grey leaf spot in maize [13]. When plants are infected with diseases, the leaf water content, pigment content, and internal structure change, and these changes are reflected in the spectral signature of the plants [18]. Many spectral features of vegetation that are related to changes in chlorophyll content and leaf area index are found in the red-edge band [14-16], and significant changes were observed when bananas were infected with Fusarium wilt. In recent years, various lightweight multispectral sensors that include the red-edge band (e.g., MicaSense RedEdge M™) have been designed specifically for unmanned aerial vehicle (UAV) platforms for vegetation monitoring [17]. With the rapid development of UAV technology, UAVs have been increasingly used to acquire imagery for rapidly extracting phenotypic information of crops because of their advantages (i.e., high spatial resolution, ease of operation, high flexibility, and acquisition of data on demand) [18-21]. Moreover, scale effects and scaling have become one of the most important research topics in remote sensing [22]. Images with different spatial resolutions show different landscape characteristics, and data with higher spatial resolution usually yield more accurate estimates [23]. However, seeking very high resolution data is unnecessary and unrealistic for agricultural applications at a regional scale because such data are expensive and difficult to process. Therefore, it is very important to choose imagery with a suitable spatial resolution for agricultural monitoring. At present, studies using remote sensing technology to monitor Fusarium wilt of banana are scarce.
Lu and Weng [24] stated that the success of any image classification depends not only on the use of appropriate imagery but also on the use of a suitable classification method. Supervised algorithms are widely used because they are more robust than model-based approaches [25]. These classifiers can learn the characteristics of target classes from training samples and apply this information to the unclassified data [26]. The literature shows that a variety of supervised classification algorithms such as decision trees (DT), the k-nearest neighbors (kNN) method, artificial neural networks (ANN), support vector machines (SVM), and random forest (RF) have been developed and tested for crop monitoring and/or land cover classification using remote sensing data [27-32]. Among these methods, the SVM, RF, and ANN algorithms are the most popular classification algorithms for remote sensing of the Earth's surface [33].
The objectives of this study are to (i) evaluate the performance of the SVM, RF, and ANN classifiers for classifying imagery into areas infested or not infested with banana Fusarium wilt, and (ii) assess the effect of different resolutions on the identification accuracy of banana Fusarium wilt disease to provide a reference for large-scale applications of satellite-based data. The results will provide guidance for disease treatment and crop planting adjustments.

Study area
The study area is located in Long'an County, Guangxi Province, China (23°7'58.8''N, 107°43'55.2''E) (Figure 1). It has a subtropical monsoon climate, characterized by year-round sufficient sunshine and rainfall. The average rainfall is 1200 mm a year and the mean annual temperature is 20.8°C-22.4°C. The soil is a sandy loam with pH 4.1, ammonium N content of 17.0 mg/kg, available P content of 180.3 mg/kg, available K content of 140.8 mg/kg, and organic matter content of 17 g/kg in the 0-40 cm soil layer. The field crop was banana of the variety "Williams B6".
The banana variety has a leaf number of 34-36, the plant height is about 2.4-3.0 m, the growth period is 10-12 months, and the annual yield is 45 000-60 000 kg/hm². The farm was developed in September 2015 and the planting distance was 2.0 m by 2.6 m (planting density of 130 plants/hm²). The area was harvested for the first time in November 2016. By August 2018 (the time of field investigation in this study), the third generation of bananas had appeared in the field. In the study area, nearly 40% of banana plants were infected with Fusarium wilt of different severity.
Figure 1 Location of the study area and distribution of ground survey sites

Data collection
In this study, a total of 139 sample plots were surveyed from 7 to 9 August 2018 to assess the occurrence of banana Fusarium wilt disease as ground truth data. Each plot had at least one banana plant. Figure 1 shows the distribution of ground survey sites. The samples were classified into two categories: healthy samples (total of 66) and Fusarium wilt diseased samples (total of 73). The classification standard adopted in this paper is based mainly on the ratio of the yellow (diseased) leaf area to the total leaf area: a sample with a ratio of less than 1% is considered healthy; otherwise, it is considered diseased. Finally, a total of 100 samples were randomly selected for calibration and the remaining 39 samples were used for validation.
The multi-spectral imagery was acquired using a DJI Phantom 4 quadcopter (DJI Innovations, Shenzhen, China) on 7 August 2018. This UAV was equipped with a five-band multi-spectral camera (MicaSense RedEdge M™, MicaSense, Inc., Seattle, WA, USA). The camera has a spectral range of 400 to 900 nm with a 47.2° field of view. It has five spectral bands: blue, green, red, red-edge (RE), and near-infrared (NIR). The spectral bands have a ground sampling distance (GSD) of 8 cm at a flying height of 120 m above ground (Table 1); these were the conditions used in this study. The sensor has a global shutter and all bands are aligned; the images have 12-bit radiometric resolution and the image capture rate is 1 Hz. In this study, the flight plan ensured cross-track and along-track overlap of 80%. A calibrated reflectance panel was imaged directly before and after each flight and used for reflectance calibration with the empirical line method.
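The labelling rule and the random calibration/validation split described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names, the plot numbering, and the random seed are assumptions introduced here.

```python
import random

# Yellow-leaf area ratio below 1% counts as healthy (threshold taken from the text)
HEALTHY_THRESHOLD = 0.01

def label_plot(yellow_area_ratio: float) -> str:
    """Apply the 1% yellow-leaf-area rule to one surveyed plot."""
    return "healthy" if yellow_area_ratio < HEALTHY_THRESHOLD else "diseased"

def split_samples(sample_ids, n_calibration=100, seed=0):
    """Randomly pick n_calibration samples; the rest become validation samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the split repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]

plot_ids = list(range(1, 140))  # 139 surveyed plots, numbered hypothetically
calibration, validation = split_samples(plot_ids)
print(len(calibration), len(validation))  # 100 39
```

A plain random split does not guarantee that healthy and diseased plots keep their 66:73 proportion in both subsets; a stratified split would be one way to enforce that, if it matters for the comparison.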
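The flight geometry implied by these figures can be checked with a short calculation. The sketch below assumes a nadir-pointing camera over flat terrain and treats the 47.2° field of view as the cross-track angle; neither assumption is stated in the text, and the function names are illustrative.

```python
import math

def footprint_width(height_m: float, fov_deg: float) -> float:
    """Ground width covered by one image for a nadir-pointing camera over flat terrain."""
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)

def line_spacing(footprint_m: float, overlap: float) -> float:
    """Distance between adjacent flight lines that yields the requested side overlap."""
    return footprint_m * (1.0 - overlap)

width = footprint_width(120.0, 47.2)   # flying height and field of view from the text
spacing = line_spacing(width, 0.80)    # 80% cross-track overlap from the text
print(f"footprint: {width:.1f} m, line spacing: {spacing:.1f} m")
```

Under these assumptions each image covers roughly 105 m across track, so adjacent flight lines would be about 21 m apart.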

Classification algorithms
In this study, the RF, ANN, and SVM classifiers were used to identify and map banana Fusarium wilt disease.

Support Vector Machine (SVM)
SVM is a non-parametric supervised statistical learning classifier and has become increasingly popular for remote sensing classification [34-36].
The SVM algorithm was developed by Vapnik [37]. Its objective is to find the optimal hyperplane in the n-dimensional classification space that gives the largest margin between the classes [33]. Polynomial and radial basis function (RBF) kernels are the most commonly used kernel functions for classification [38-40]. A number of studies have found that the RBF kernel is superior to the polynomial kernel for the classification of remote sensing data [40-42]. Two parameters need to be set when using an SVM classifier with the RBF kernel, i.e., the cost parameter (C) and the kernel width parameter (γ) [43]. The C parameter trades off the misclassification of training examples against the simplicity of the decision surface [44]. The γ parameter affects the smoothness of the class-dividing hyperplane [29]. A high C value may lead to over-fitting, whereas an increase in the γ value will affect the shape of the class-dividing hyperplane, which may affect the classification accuracy [29].

Random forest (RF)
RF is one of the most popular DT-based ensemble models and was first proposed by Breiman [45]. It can be described as an ensemble of classification trees, where each tree votes on the class assigned to a given sample, with the most frequent answer winning the vote [46]. RF is an ensemble of many independent individual classification and regression trees (CART) and is defined as [47]: $\{h(x, \theta_k),\ k = 1, 2, \ldots, i, \ldots\}$, where h represents the RF classifier, x is the input variable, and $\{\theta_k\}$ represents the independent and identically distributed random predictor variables, which are used for generating each CART tree [45]. The final response of the RF is calculated based on the output of all DTs.
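The voting step of an RF can be illustrated with a toy ensemble. In the sketch below the "trees" are hypothetical one-split rules with made-up thresholds, not trees fitted to the study data, and the band order is an assumption; only the majority vote itself reflects the description above.

```python
from collections import Counter

# Band order assumed for illustration: blue, green, red, red-edge, NIR
BANDS = {"blue": 0, "green": 1, "red": 2, "red_edge": 3, "nir": 4}

def make_stump(band, threshold, diseased_if_above):
    """Build a toy one-split 'tree'; thresholds are hypothetical, not fitted."""
    index = BANDS[band]

    def stump(pixel):
        above = pixel[index] > threshold
        return "diseased" if above == diseased_if_above else "healthy"

    return stump

def forest_predict(trees, pixel):
    """Each tree votes on the class; the most frequent answer wins."""
    votes = Counter(tree(pixel) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [
    make_stump("red", 0.08, diseased_if_above=True),
    make_stump("nir", 0.35, diseased_if_above=False),
    make_stump("red_edge", 0.20, diseased_if_above=False),
]
pixel = (0.03, 0.07, 0.10, 0.22, 0.30)  # made-up reflectance values
print(forest_predict(trees, pixel))  # diseased (two votes to one)
```

An odd number of trees is used here so that a two-class vote cannot tie.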
Two parameters are required for the RF model, namely, the number of predictors that are considered at each fork of the tree (mtry) and the number of random trees assembled during model building (ntree) [33].
Theoretical and empirical research has demonstrated that the classification accuracy is less sensitive to ntree than to the mtry parameter [26,48]. It was reported that an increase in the values of mtry resulted in a higher predictive performance of the model and the attribution of higher importance to fewer variables [49]. Therefore, it is necessary to optimize the parameters mtry and ntree to maximize the model accuracy [50].
The advantages of RF are its low computational burden and the ease of determining which parameters to use [51]. Over-fitting is less of an issue than in a single DT, and there is no need to prune the trees, which is a tedious task [52]. Although RF has shown high accuracy and the ability to model complex interactions among variables, it is a "black box" because the individual trees cannot be examined separately [53].

Artificial neural networks (ANN)
An ANN classifier can be described as a parallel computing system consisting of an extremely large number of simple processors with interconnections [33]. It is a mathematical model that is inspired by the structure and functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. The ANN was originally designed as a pattern-recognition and data analysis tool that mimics the neural storage and analytical operations of the brain. It has a distinct advantage in that it is non-parametric and requires little or no a priori knowledge of the distribution model of input data [40]. Moreover, the ANN fits an arbitrary decision boundary to separate the data points and produces high classification accuracy [40,54]. ANNs have successfully been applied to remote sensing in many fields [33,55-57].
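As an illustration of what "interconnected artificial neurons" means in practice, the sketch below passes one five-band pixel through a single hidden layer. The network size, the sigmoid activation, and all weights are hypothetical and untrained; the text does not specify the architecture used in the study.

```python
import math

def sigmoid(z):
    """Logistic activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(pixel, hidden_w, hidden_b, out_w, out_b):
    """Propagate one five-band pixel through a hidden layer to a single output score."""
    hidden = dense(pixel, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]

# Hypothetical, untrained weights: 5 inputs -> 3 hidden neurons -> 1 output
hidden_w = [[0.2, -0.1, 0.9, -0.4, -0.6],
            [-0.3, 0.5, 0.1, 0.7, -0.2],
            [0.4, 0.4, -0.8, 0.3, 0.6]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[1.2, -0.7, 0.5]]
out_b = [0.05]

pixel = (0.03, 0.07, 0.10, 0.22, 0.30)  # made-up reflectance values
score = forward(pixel, hidden_w, hidden_b, out_w, out_b)
print(f"score = {score:.3f}")
```

In a trained network the weights would be learned from the calibration samples, and the output score would be thresholded (for example at 0.5) to assign the healthy or diseased class.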

Data processing and accuracy assessment
The classifications of the banana plants infested or not infested with Fusarium wilt were performed using the SVM, RF, and ANN classifiers with the calibration samples. In order to evaluate the contribution of the inclusion of the red-edge band to the identification accuracy of banana Fusarium wilt, two classification schemes using the input data with or without the red-edge band were used. In order to assess the classification accuracy of images with different spatial resolution, the original 8 cm resolution UAV imagery was resampled to generate images with 0.5 m, 1 m, 2 m, and 5 m resolution. Image resolution is closely related to acquisition costs and these resolutions were selected because they were similar to those of mainstream and easily accessible satellite imagery products (i.e., 0.5 m resolution WorldView series imagery, 1 m resolution GF-2 imagery, 2 m resolution GF-6 imagery, 5 m resolution RapidEye imagery) for agricultural applications.
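One simple way to simulate coarser imagery is to average non-overlapping blocks of fine pixels. The sketch below shows this for a single band; it is an assumption made for illustration, since the text does not state which resampling method was applied. It also only handles integer aggregation factors (for example, 25 for 8 cm to 2 m), whereas a target such as 0.5 m is not an integer multiple of 8 cm and would need interpolation.

```python
def block_mean(band, factor):
    """Aggregate a 2-D band (list of rows) by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows = len(band) // factor
    cols = len(band[0]) // factor
    coarse = []
    for r in range(rows):
        coarse_row = []
        for c in range(cols):
            block = [
                band[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse

fine = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(block_mean(fine, 2))  # [[3.5, 5.5], [11.5, 13.5]]

factor_2m = round(2.0 / 0.08)  # 25 fine pixels per side for 8 cm -> 2 m
```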
After training, the validation samples were used for an accuracy assessment; a confusion matrix was developed and the overall accuracy and the Kappa coefficient were calculated [58,59]. The overall accuracy is the sum of the correctly classified plots divided by the total number of plots. A Kappa value of 1 represents perfect agreement, whereas a value of 0 represents agreement no better than that expected by chance.
The data processing and classifications were conducted in ENVI 5.3 (Exelis Visual Information Solutions, Inc., Broomfield, CO, USA) and the "RandomForest" package [60]. The distribution maps of banana Fusarium wilt were created in ArcGIS 10.2 (ESRI, Inc., Redlands, CA, USA).
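Both measures follow directly from the confusion matrix, as the sketch below shows. The matrix values are hypothetical and are not taken from Table 2; rows are assumed to hold the reference classes and columns the predicted classes.

```python
def overall_accuracy(matrix):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def kappa(matrix):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    observed = overall_accuracy(matrix)
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts: rows = reference (healthy, diseased), columns = predicted
confusion = [[45, 5],
             [4, 46]]
print(round(overall_accuracy(confusion), 2), round(kappa(confusion), 2))  # 0.91 0.82
```

In this example the chance agreement is 0.5, so an overall accuracy of 0.91 corresponds to a Kappa of 0.82, which is why Kappa values are always lower than the overall accuracy for an imperfect classifier.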

Accuracy assessment of different classifiers for extracting Fusarium wilt disease
In this study, the validation samples were used to determine the classification accuracy of banana Fusarium wilt with different classifiers (Table 2). The results showed that the SVM, RF, and ANN classifiers exhibited good performance for identifying Fusarium wilt disease in UAV-based multi-spectral imagery. All classifiers achieved comparable overall accuracies of at least 90%. The SVM had the highest accuracy (overall accuracy = 91.4% and Kappa coefficient = 0.80), followed by the ANN (overall accuracy = 91.1% and Kappa coefficient = 0.79) and RF (overall accuracy = 90.0% and Kappa coefficient = 0.77). However, the RF algorithm required much less training time. The training times of the SVM and ANN were 11.6 and 2.0 times longer, respectively, than that of the RF (Table 3). Overall, the comprehensive performance of the RF was superior to those of the SVM and ANN for identifying banana Fusarium wilt disease with acceptable accuracy. Furthermore, the performances of the input data with and without the red-edge band were compared. The results showed that the inclusion of the red-edge band increased the identification accuracy of banana Fusarium wilt in this study. The increases in the overall accuracy were 2.9%, 2.9%, and 3.0% for the SVM, RF, and ANN algorithms, respectively.

Simulation of the resolution of satellite-based imagery
To assess the accuracy of imagery with different spatial resolutions for identifying banana Fusarium wilt, the original UAV imagery was resampled to generate images with 0.5 m, 1 m, 2 m, and 5 m resolution. The RF classifier was used because its overall performance was superior to that of the SVM and ANN classifiers for banana Fusarium wilt identification. Table 4 lists the results of the identification accuracy of locations of infested or non-infested plants using the RF algorithm with different resolution imagery. The results showed that the overall accuracy and Kappa coefficient decreased as the resolution became coarser. When the imagery with the red-edge band was used, the overall accuracies for the 0.5 m, 1 m, and 2 m resolution were 87.0%, 84.7%, and 84.6%, respectively, and the Kappa coefficients were 0.71, 0.67, and 0.66. The overall accuracy and Kappa coefficient were lowest for the 5 m resolution imagery, with an overall accuracy of 70.6% and a Kappa coefficient of 0.41. The results also showed that the inclusion of the red-edge band increased the identification accuracy of banana Fusarium wilt for the different resolutions. The increases in the overall accuracy were 1.9%, 1.7%, 1.8%, and 4.7% for the 0.5 m, 1 m, 2 m, and 5 m resolution imagery, respectively.

Mapping the distribution of banana Fusarium wilt
Based on the discriminant models of banana Fusarium wilt established by the SVM, RF, and ANN algorithms at different resolutions, the spatial distributions of banana Fusarium wilt infected regions in the study area were mapped. Figure 2 shows the maps of the spatial distribution of banana Fusarium wilt infected regions using the SVM, RF, and ANN classifiers with the red-edge band. As can be seen in Figure 2, all the maps presented a similar distribution trend with regard to the occurrence of banana Fusarium wilt disease. Table 5 shows the areas of the healthy regions and Fusarium wilt infected regions for the input data with the red-edge band. The areas of Fusarium wilt disease were in the range of 5.21-5.75 hm², accounting for 36.3%-40.1% of the total planting area of bananas in the study area. Figure 3 shows the maps of banana Fusarium wilt infected regions obtained from imagery with 0.5 m, 1 m, 2 m, and 5 m resolution. The maps exhibit the same overall distribution of banana plants infected with Fusarium wilt. However, the maps with 0.5, 1, and 2 m resolution (Figures 3a to 3c) show local details better than the map with 5 m resolution (Figure 3d).
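Areas such as these are obtained from a classified map by counting pixels and multiplying by the ground area of one pixel. The sketch below shows the conversion for the 8 cm imagery; the pixel counts are hypothetical (the first was chosen so that it reproduces 5.21 hm²), and the function names are illustrative.

```python
SQUARE_METRES_PER_HM2 = 10_000.0  # 1 hm² (one hectare) = 10 000 m²

def area_hm2(n_pixels: int, gsd_m: float = 0.08) -> float:
    """Ground area in hm² covered by n_pixels square pixels of side gsd_m."""
    return n_pixels * gsd_m ** 2 / SQUARE_METRES_PER_HM2

def infected_share(infected_pixels: int, healthy_pixels: int) -> float:
    """Fraction of the banana planting area classified as infected."""
    return infected_pixels / (infected_pixels + healthy_pixels)

# Hypothetical count: about 8.14 million infected pixels at 8 cm correspond to 5.21 hm²
print(f"{area_hm2(8_140_625):.2f} hm²")
print(f"{infected_share(400, 600):.1%}")  # toy counts, not study data
```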

Discussion
The results of this study indicated that the SVM, RF, and ANN classifiers used on the UAV-based multi-spectral imagery have good potential to identify and map banana Fusarium wilt disease. All classifiers yielded similar classification results with an overall accuracy of at least 90%, but the RF algorithm required less training time. The training times of the SVM and ANN were 11.6 and 2.0 times longer, respectively, than that of the RF. The high computational cost is a bottleneck when the SVM is applied to large-scale problems [61]. Some studies have shown that ANNs have a reputation for being hard to use and to optimize, which is true of most implementations that require the user to set all parameters [26,33]. Therefore, the comprehensive performance of the RF was superior to those of the SVM and ANN for identifying banana Fusarium wilt disease with acceptable accuracy. Some researchers have also demonstrated that the RF requires the setting of fewer parameters and is faster to implement than the SVM and ANN classifiers [26,33,62,63].
We also simulated the most common resolutions of satellite-based imagery (such as 0.5 m resolution WorldView series imagery, 1 m resolution GF-2 imagery, 2 m resolution GF-6 imagery, and 5 m resolution RapidEye imagery) to assess the effects of imagery with different spatial resolution on the identification accuracy of banana Fusarium wilt disease. The results showed that imagery with a spatial resolution of 2 m or finer had better classification accuracy (overall accuracy >80%). When the resolution was coarsened to 5 m, the overall accuracy decreased to 70.6%. This might be related to the plant spacing of bananas (2.0 m by 2.6 m) in the area. When the resolution is 2 m or finer, the image pixels can be considered pure or nearly pure pixels, i.e., a single pixel corresponds to a single spectral signature. When the resolution is 5 m, the image pixels are mixed pixels, i.e., a single pixel contains several banana plants and has a mixture of spectral features. Therefore, satellite-based imagery with a resolution of 2 m or finer has good potential for identifying and mapping banana Fusarium wilt disease.
The results showed that the inclusion of the red-edge band resulted in 2.9%-3.0% increases in overall accuracy for the 0.08 m resolution imagery. This is attributed to the fact that the leaf chlorophyll content of the banana plants decreases significantly as the infection of Fusarium wilt progresses [64], and the red-edge region is highly sensitive to changes in chlorophyll [16,65]. The changes in the red-edge band have been used as an indicator of vegetation stress [9]. However, the imagery obtained by the MicaSense RedEdge M™ sensor only has five spectral bands, which cannot fully reveal the differences in spectral characteristics between healthy and diseased plants. Hyperspectral data should be used for further studies on the sensitivity of certain bands to Fusarium wilt of bananas. In addition, differences in the spectral characteristics between Fusarium wilt and other yellowing phenomena caused by other stresses (e.g., drought stress and nutrient deficiency) should also be examined.
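The link between pixel size and plant spacing can be made concrete with a short calculation. The sketch below assumes that each plant occupies its full 2.0 m by 2.6 m planting cell, which is a simplification introduced here; the function name is illustrative.

```python
PLANT_SPACING_M = 2.0  # in-row planting distance from the text
ROW_SPACING_M = 2.6    # between-row planting distance from the text

def plants_per_pixel(resolution_m: float) -> float:
    """Average number of planting cells covered by one square pixel."""
    cell_area = PLANT_SPACING_M * ROW_SPACING_M  # 5.2 m² per plant
    return resolution_m ** 2 / cell_area

for resolution in (0.08, 0.5, 1.0, 2.0, 5.0):
    print(f"{resolution:>4} m pixel covers {plants_per_pixel(resolution):.3f} plants")
```

Under this assumption a 2 m pixel covers about 0.77 of a planting cell, so it can fall largely within one plant, whereas a 5 m pixel covers about 4.8 cells and necessarily mixes the signal of several plants.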

Conclusions
This study evaluated the performance of SVM, RF, and ANN classifiers used with UAV-based multi-spectral imagery to identify the locations that were infested or not infested with banana Fusarium wilt. The results showed that the SVM, RF, and ANN classifiers were well suited to identify and map banana Fusarium wilt with UAV-based multi-spectral imagery.
The overall accuracies of the SVM, RF, and ANN were 91.4%, 90.0%, and 91.1%, respectively, for the pixel-based approach. The RF algorithm required far less training time than the SVM and ANN algorithms. The maps generated by the SVM, RF, and ANN had a similar distribution trend with regard to the occurrence of Fusarium wilt disease. The areas of occurrence of Fusarium wilt disease were in the range of 5.21-5.75 hm², accounting for 36.3%-40.1% of the total planting area of bananas in the study area. The results also showed that the inclusion of the red-edge band resulted in increases in the overall accuracy of 2.9%-3.0% for the 0.08 m resolution imagery. A simulation of the resolutions of satellite-based imagery (i.e., 0.5 m, 1 m, 2 m, and 5 m resolutions) showed that imagery with a spatial resolution of 2 m or finer resulted in good identification accuracy of Fusarium wilt. The results of this study indicate that the RF has good potential for identifying and mapping banana Fusarium wilt disease from UAV-based remote sensing imagery; this provides guidance for disease treatment and crop planting adjustment.