Spectral difference analysis and identification of different maturity blueberry fruit based on hyperspectral imaging using spectral index

Hyperspectral imaging, with its many narrow spectral bands, is highly capable of detecting and classifying objects, and it has become a research hotspot in near-ground remote sensing. However, high computational demands and complex instrument operation remain the bottleneck for applying hyperspectral imaging in the field. Band selection is a common way to reduce the dimensionality of the hyperspectral image cube and to simplify the design of spectral imaging instruments. In this research, hyperspectral images of blueberry fruit were collected both in the laboratory and in the field. A set of spectral bands was selected by analyzing the spectral differences among blueberry fruit at different growth stages and the backgrounds. Furthermore, normalized spectral indexes were constructed from the selected bands to identify blueberry fruit at three growth stages, aiming to eliminate the impact of the background (leaves, branches, soil, illumination variation and so on). Three classifiers, spectral angle mapping (SAM), multinomial logistic regression (MLR) and a classification (decision) tree, were used to verify the identification results. The detection accuracy was 82.1% for the SAM classifier using all spectral bands, 88.5% for the MLR classifier using the selected bands and 89.8% for the decision tree using the spectral indexes. The results indicated that normalized spectral indexes can both lower the computational complexity and reduce the impact of the noisy background in the field.


Introduction 
Blueberry, which offers abundant health benefits, has been hailed as the most promising fruit of the 21st century. Blueberries can provide about 25% of the daily human requirement for vitamin C, vitamin A and some other micronutrients, more than other fruits such as apples, oranges and grapes [1]. Harvesting is very labor intensive in blueberry production because the fruit is small, delicate and grows in clusters. Fresh blueberries are mostly hand-harvested, since fruit on the same branch does not ripen at the same time during the harvest season [2]. Because the harvest window is short and yields are high, farmers must invest heavily in labor, and harvesting usually accounts for more than 50% of the total production cost. As labor costs rise rapidly, identifying blueberry fruit and estimating yield before the harvest season can help farmers plan harvesting better and reduce its cost [3].
Near-ground remote sensing is a common method of detecting or classifying objects without touching or damaging them. Many methods based on digital image processing have been proposed to detect fruit and estimate yield [4][5][6][7][8]. For example, Zaman et al. [9] collected RGB images of blueberry in the field using a camera fixed one meter above the ground, and their yield estimation method achieved a detection accuracy of about 99%. However, this high accuracy applied only to mature fruit, whose color contrasts strongly with the background. For detecting green fruit, an "eigenfruit" feature combining color features with Gabor texture features was proposed to identify immature citrus fruit, reaching an accuracy of 75% [10]. Sengupta and Lee used the Canny operator and the Hough transform to detect citrus edges, which performs better for identifying occluded fruit [11].
Digital images are easy to collect, but in them it is difficult to distinguish green fruit from a noisy background. Hyperspectral images, which carry more spectral and spatial information, offer another approach to target detection. Kane et al. [12,13], Safren [14] and Okamoto [15] successively conducted several studies on detecting green citrus or apple in the field using hyperspectral imaging. However, redundant information remains the bottleneck for using hyperspectral imaging in the field. Because of their high spectral resolution, hyperspectral images usually contain hundreds of wavebands, some of which are useless. Band selection reduces the dimension of the image cube, and the selected original bands can also be used to build a multispectral camera system that simplifies image collection. For blueberry fruit detection, Yang et al. [16] built a hyperspectral imaging system and used a feature extraction method to select a set of optimum bands for identifying blueberry fruit at different growth stages; the detection accuracy based on the selected bands exceeded 88%.
Spectral indexes are another way to reduce the dimension of a hyperspectral image cube. In remote sensing images, classification accuracy is affected by the complex background, such as sunlight, soil and atmospheric moisture. Although target pixels can be extracted by image segmentation, the spectral information used in modeling is still a mixture of target and background [17,18]. One study indicated that, in the near-infrared band, the spectra of pure crop pixels differed greatly from those of crop targets with a soil background [19]. Shuaibu et al. [20] used the anthocyanin reflectance index and an improved triangle vegetation index to identify Marssonina blotch disease, yielding an accuracy of 99.2%. Recently, many other researchers [21][22][23] have also studied spectral indexes for classification and recognition. Spectral purification based on spectral indexes is therefore important for target detection in agricultural remote sensing.
The objectives of this study were to analyze the spectral differences between blueberry fruit at different growth stages and the background (such as soil, sky, leaves and branches), to select useful bands suitable for a multispectral imaging system, and to design spectral indexes that improve the results of classifiers.

Hyperspectral image acquisition
Hyperspectral images were collected using a near-ground hyperspectral imaging system at the University of Florida, USA. The system consisted of a hyperspectral camera (Imspector V10E, Specim, Finland), a central processing unit (laptop, E6500, DELL, TX, USA) and an image acquisition unit (NI-PCIe 6430 & NI-6036E, National Instruments Inc., Austin, TX, USA). The camera was a line-scan sensor with pixels along the horizontal direction, and the incoming light was dispersed into 388 spectral bands from 396 nm to 1010 nm. In the field, the imaging platform was built with a tilting head (PT785S, ServoCity, Winfield, KS, USA) and an encoder (Omron-E6B2, Omron Corporation, Kyoto, Japan) so that a 2D image was accumulated line by line from top to bottom, as shown in Figure 1.
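A line-scan camera records one spatial line (with a full spectrum per pixel) at a time, and the tilting head sweeps those lines into a 2D image. The NumPy sketch below illustrates that accumulation under stated assumptions: the 388 bands are treated as evenly spaced over 396-1010 nm (a measured calibration table should replace this if available), and the function names are hypothetical.

```python
import numpy as np

# Assumption: the 388 bands are evenly spaced over 396-1010 nm. A measured
# wavelength calibration table, if available, should replace this array.
WAVELENGTHS = np.linspace(396.0, 1010.0, 388)


def assemble_cube(line_scans):
    """Stack line-scan frames into a (rows, columns, bands) image cube.

    Each frame has shape (columns, bands): one horizontal line of pixels,
    each with a full spectrum. Frames are assumed to arrive in scan order,
    i.e. top to bottom as the tilting head sweeps.
    """
    frames = [np.asarray(f, dtype=float) for f in line_scans]
    if not frames:
        raise ValueError("no line scans supplied")
    if any(f.shape != frames[0].shape for f in frames):
        raise ValueError("all frames must share the same (columns, bands) shape")
    return np.stack(frames, axis=0)


def band_index(wavelength_nm, wavelengths=WAVELENGTHS):
    """Index of the band whose centre is closest to wavelength_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths) - wavelength_nm)))
```

Under the even-spacing assumption the band interval is (1010 - 396)/387, about 1.59 nm, and `band_index(970)` would locate the water absorption band discussed later.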

Figure 1 Hyperspectral imaging system
A total of 55 images of blueberry plants were acquired under a variety of illumination and weather conditions, such as sunny, cloudy and windy conditions. Digital color images were also taken at the same locations with a digital camera (EOS T2i, Canon, Japan) as references. The images mostly consisted of targets (mature, intermediate and young fruit) and backgrounds (leaves, branches, sky and soil). The images were corrected to reflectance using a universal white reference standard (Labsphere Inc., North Sutton, NH, USA). In addition, seven images of fruit and leaves were taken in the laboratory and used to extract "ideal" standard spectra of fruit and leaves for creating classifier training data. The RGB bands of one image are shown in Figure 2.
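Correction against a white standard is commonly done as a flat-field ratio. The sketch below assumes that form, R = rho_white * (raw - dark) / (white - dark); the dark-current frame and the panel's nominal reflectance `white_reflectance` are assumptions not described in the text (dark defaults to zero, panel reflectance to 1.0), and `to_reflectance` is a hypothetical helper.

```python
import numpy as np


def to_reflectance(raw, white, dark=None, white_reflectance=1.0, eps=1e-9):
    """Convert raw digital numbers to relative reflectance (flat-field ratio).

    raw:   (rows, cols, bands) image cube
    white: white-reference spectrum or frame, broadcastable to raw
    dark:  optional dark-current reference (assumption: not described in the
           source; treated as zero when omitted)
    white_reflectance: nominal reflectance of the white panel (hypothetical default)
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.zeros_like(white) if dark is None else np.asarray(dark, dtype=float)
    denom = np.maximum(white - dark, eps)  # guard against dead or saturated bands
    return white_reflectance * (raw - dark) / denom
```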

Spectral analysis method
2.3.1 First derivative
The first derivative of the original spectrum effectively reveals spectral changes and is widely used in spectral difference analysis. It is calculated as in Equation (1):

$$R'_{\lambda_i} = \frac{R_{\lambda_{i+1}} - R_{\lambda_{i-1}}}{2\Delta\lambda} \tag{1}$$

where $R'_{\lambda_i}$ is the first derivative at wavelength $\lambda_i$, $R_{\lambda_i}$ is the original reflectance at $\lambda_i$, and $\Delta\lambda$ is the spectral resolution (the interval between adjacent bands).
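A minimal NumPy sketch of the derivative step, assuming a central difference for interior bands (which reduces to dividing by twice the band interval when bands are evenly spaced) and one-sided differences at the two ends. It also includes a helper that restricts the spectrum to the 415-1000 nm range used in the analysis; both function names are hypothetical.

```python
import numpy as np


def crop_range(reflectance, wavelengths, lo=415.0, hi=1000.0):
    """Keep only bands within [lo, hi] nm (415-1000 nm is the range used in the text)."""
    w = np.asarray(wavelengths, dtype=float)
    keep = (w >= lo) & (w <= hi)
    return np.asarray(reflectance)[..., keep], w[keep]


def first_derivative(reflectance, wavelengths):
    """First derivative of reflectance along the last (spectral) axis.

    Interior bands use the central difference
    (R[i+1] - R[i-1]) / (lambda[i+1] - lambda[i-1]), which equals
    (R[i+1] - R[i-1]) / (2 * delta_lambda) for evenly spaced bands.
    The first and last bands fall back to one-sided differences.
    """
    r = np.asarray(reflectance, dtype=float)
    w = np.asarray(wavelengths, dtype=float)
    if w.size < 3 or r.shape[-1] != w.size:
        raise ValueError("need >= 3 bands and one wavelength per band")
    d = np.empty_like(r)
    d[..., 1:-1] = (r[..., 2:] - r[..., :-2]) / (w[2:] - w[:-2])
    d[..., 0] = (r[..., 1] - r[..., 0]) / (w[1] - w[0])
    d[..., -1] = (r[..., -1] - r[..., -2]) / (w[-1] - w[-2])
    return d
```

Dividing by actual wavelength differences rather than a fixed constant keeps the result correct even if the band spacing is not perfectly uniform.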

Discriminability
To reduce the dimensionality of hyperspectral data, a few characteristic bands are usually used in place of the whole spectrum to lower computational complexity. In this study, the discriminant value was adopted to measure how well two groups of spectra can be distinguished at each band.
For two classes of data that do not share the same distribution (unequal variances), the discriminant value is calculated as follows:

$$d' = \frac{\left|\mu_1 - \mu_2\right|}{\sqrt{\left(\sigma_1^2 + \sigma_2^2\right)/2}} \tag{2}$$

where $d'$ is the discriminant value, $\mu_1$ and $\mu_2$ are the means of the two classes, and $\sigma_1$ and $\sigma_2$ are their standard deviations. The larger the discriminant value, the greater the difference between the two classes.
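The per-band discriminant value can be computed for every wavelength at once. This sketch assumes d' = |mu1 - mu2| / sqrt((sigma1^2 + sigma2^2)/2) with sigma as the per-band sample standard deviation; `discriminability` and `best_bands` are hypothetical helper names.

```python
import numpy as np


def discriminability(class1, class2):
    """Per-band discriminant value d' between two sets of spectra.

    class1, class2: arrays of shape (n_samples, n_bands), n_samples >= 2.
    Assumes d' = |mu1 - mu2| / sqrt((sigma1^2 + sigma2^2) / 2).
    """
    a = np.asarray(class1, dtype=float)
    b = np.asarray(class2, dtype=float)
    mu1, mu2 = a.mean(axis=0), b.mean(axis=0)
    s1, s2 = a.std(axis=0, ddof=1), b.std(axis=0, ddof=1)
    pooled = np.sqrt((s1 ** 2 + s2 ** 2) / 2.0)
    return np.abs(mu1 - mu2) / np.maximum(pooled, 1e-12)


def best_bands(class1, class2, wavelengths, k=2):
    """Wavelengths of the k bands with the largest d', best first."""
    d = discriminability(class1, class2)
    order = np.argsort(d)[::-1][:k]
    return [float(wavelengths[i]) for i in order]
```

Adjacent bands are strongly correlated, so the top-k bands by d' may sit next to each other; selecting distinct local maxima of the d' curve is a reasonable alternative when choosing bands for a multispectral camera.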
2.4 Classifiers
2.4.1 Spectral angle mapping
Spectral angle mapping (SAM) identifies targets by comparing the spectral similarity between a test spectrum and a reference spectrum. Similarity is measured by the spectral angle between the two spectra [24]; a smaller angle indicates a closer match to the reference spectrum. In this study, the reference spectra, also called ideal spectra, were collected in the laboratory, which can be regarded as ideal conditions.
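A minimal NumPy sketch of SAM, assuming the standard definition of the spectral angle between a test spectrum t and a reference r, theta = arccos(t·r / (|t| |r|)). Each pixel is assigned to the reference with the smallest angle and rejected (label -1, a hypothetical convention) if that angle exceeds the class threshold. The thresholds reported later in the text (0.17, 0.10 and 0.08) are assumed here to be in radians.

```python
import numpy as np


def spectral_angle(pixels, reference):
    """Angle in radians between each pixel spectrum (last axis) and a reference."""
    p = np.asarray(pixels, dtype=float)
    r = np.asarray(reference, dtype=float)
    cos = (p @ r) / np.maximum(np.linalg.norm(p, axis=-1) * np.linalg.norm(r), 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def sam_classify(pixels, references, thresholds):
    """Label each pixel with the index of its closest reference spectrum,
    or -1 if that smallest angle exceeds the reference's threshold."""
    angles = np.stack([spectral_angle(pixels, ref) for ref in references], axis=-1)
    best = np.argmin(angles, axis=-1)
    best_angle = np.min(angles, axis=-1)
    limits = np.asarray(thresholds, dtype=float)[best]
    return np.where(best_angle <= limits, best, -1)
```

Because the angle ignores vector length, a uniform brightness change scales a spectrum without changing its angle to the reference, which is one reason SAM is attractive under variable field illumination.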

Multinomial logistic regression
Owing to its good performance on high-dimensional data, multinomial logistic regression (MLR) has been widely used to classify hyperspectral remote sensing images [25]. The MLR classifier is a discriminative model that calculates the probability of an event from the logit function of several independent variables. MLR extends binomial logistic regression, which can be described as follows:

$$P(y) = \frac{1}{1 + e^{-y}} \tag{3}$$

where $y$ is a linear combination of $n$ predictor variables, as defined in Equation (4):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{4}$$

where $x_i$ is the value of the $i$-th variable, $\beta_0$ is the intercept, and $\beta_i$ is the coefficient of the $i$-th variable.
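A compact NumPy sketch of multinomial (softmax) logistic regression trained by plain gradient descent. Each class gets its own intercept and coefficient vector, i.e. one linear predictor of the form in Equation (4) per class; with two classes the softmax reduces to the logistic function of Equation (3) applied to the difference of the two predictors. The learning rate, epoch count, small L2 penalty and per-variable standardisation are illustrative assumptions, not the settings used in the study.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_mlr(X, y, n_classes, lr=0.5, epochs=2000, l2=1e-4):
    """Fit softmax regression by batch gradient descent on mean cross-entropy."""
    X = np.asarray(X, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd                      # standardise each band / variable
    n, d = Z.shape
    W = np.zeros((d, n_classes))           # one coefficient vector per class
    b = np.zeros(n_classes)                # one intercept per class
    Y = np.eye(n_classes)[np.asarray(y)]   # one-hot targets
    for _ in range(epochs):
        P = softmax(Z @ W + b)
        G = (P - Y) / n                    # gradient of cross-entropy w.r.t. logits
        W -= lr * (Z.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return {"W": W, "b": b, "mu": mu, "sd": sd}


def predict_proba(model, X):
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sd"]
    return softmax(Z @ model["W"] + model["b"])


def predict_mlr(model, X):
    return np.argmax(predict_proba(model, X), axis=1)
```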

Spectral analysis and spectral index
3.1 Spectral difference analysis
The regions of interest (ROIs) of the reflectance images were extracted using ENVI software (Version 5.0, Exelis, Boulder, CO, USA). For each ROI, 30 pixels were randomly selected and averaged to obtain the reflectance spectrum of one object class. The average reflectance spectra of the eight object classes are shown in Figure 4. As Figure 4 shows, soil and sky, as abiotic matter, differ markedly in spectrum from the other classes. Living vegetation strongly absorbs sunlight at 670 nm and barely absorbs at 915 nm. The biggest contrast between blueberry fruit and the biological background (mainly leaves and branches) is that the fruit has a strong water absorption band at 970 nm, whereas the background absorption there is much weaker. The main reason is that the fruit accumulates a large amount of moisture to support its expansion in volume during the harvest season. Among fruit of different maturity, the difference lies mainly in color, so the R (622-770 nm), G (492-557 nm) and B (455-492 nm) bands are the main basis for distinguishing them.
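The per-class reference spectrum described above (30 randomly chosen pixels averaged within an ROI) can be sketched as follows. The ROI is assumed to be available as a boolean mask over the image (for example exported from ENVI); `roi_mean_spectrum` is a hypothetical helper.

```python
import numpy as np


def roi_mean_spectrum(cube, roi_mask, n_pixels=30, rng=None):
    """Average reflectance spectrum of n randomly chosen pixels inside an ROI.

    cube:     (rows, cols, bands) reflectance image
    roi_mask: (rows, cols) boolean mask marking the ROI (assumed input format)
    """
    rng = np.random.default_rng() if rng is None else rng
    spectra = cube[np.asarray(roi_mask, dtype=bool)]  # (n_roi_pixels, bands)
    if spectra.shape[0] < n_pixels:
        raise ValueError("ROI has fewer pixels than requested")
    idx = rng.choice(spectra.shape[0], size=n_pixels, replace=False)
    return spectra[idx].mean(axis=0)
```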
The spectral differences are confirmed by the first derivative of the blueberry reflectance spectra, shown in Figure 5. The spectral range was limited to 415-1000 nm to reduce noise caused by the low sensitivity of the camera at the beginning of its wavelength range. As Figure 5 shows, the regions of intense spectral change are 480-550 nm, 650-750 nm and 880-970 nm for immature fruit; 550-750 nm and 880-970 nm for intermediate fruit; and 650-750 nm and 880-970 nm for mature fruit. These spectral signatures were the main basis for detecting targets against the background.

Figure 5 First derivative of the blueberry spectra
To identify the spectral bands that differ most between the three growth stages, the discriminability value was calculated for each spectral band, as shown in Figure 6. The optimum bands for distinguishing mature from intermediate fruit are 738 nm and 627 nm; for mature versus immature fruit, 743 nm and 557 nm; and for intermediate versus immature fruit, 546 nm and 693 nm. Therefore, seven spectral bands were selected in this study to identify blueberry fruit with the MLR classifier, as listed in Table 1.

Spectral index
The Normalized Difference Vegetation Index (NDVI) is commonly used in remote sensing to eliminate most of the variation caused by changes in irradiance. It is calculated as follows:

$$\mathrm{NDVI} = \frac{R_{NIR} - R_{Red}}{R_{NIR} + R_{Red}} \tag{5}$$

where $R_{NIR}$ and $R_{Red}$ are the reflectances in the near-infrared and red bands. NDVI enhances the contrast between the reflectances of the two bands through nonlinear stretching. Following the same principle, a set of spectral indexes was constructed in this study from the selected spectral bands, as given in Equation (6):

$$\mathrm{SI} = \frac{R_i - R_j}{R_i + R_j} \tag{6}$$

where $R_i$ and $R_j$ are the reflectances of the selected bands $i$ and $j$.
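The normalized-difference form can be applied pixel by pixel to an image cube. In the sketch below, `normalized_difference` and `index_image` are hypothetical helpers; the band pair in the usage note is illustrative only, since the exact pairs behind Var8 to Var12 are given in Table 2, not in the text. The form is unchanged when both bands are scaled by a common factor, which is why it suppresses uniform changes in irradiance.

```python
import numpy as np


def normalized_difference(r_i, r_j, eps=1e-12):
    """(R_i - R_j) / (R_i + R_j), element-wise; reflectances assumed non-negative."""
    r_i = np.asarray(r_i, dtype=float)
    r_j = np.asarray(r_j, dtype=float)
    return (r_i - r_j) / np.maximum(r_i + r_j, eps)


def index_image(cube, wavelengths, wl_i, wl_j):
    """Spectral index image from the bands nearest to wl_i and wl_j (nm)."""
    w = np.asarray(wavelengths, dtype=float)
    i = int(np.argmin(np.abs(w - wl_i)))
    j = int(np.argmin(np.abs(w - wl_j)))
    return normalized_difference(cube[..., i], cube[..., j])
```

For example, `index_image(cube, wavelengths, 915, 670)` would contrast the weak-absorption 915 nm band with the 670 nm vegetation absorption band discussed in the spectral analysis; whether this matches any index in Table 2 is an assumption.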
A total of five spectral indexes were designed, as listed in Table 2. Var8 was designed because the spectral curves of soil and sky differ clearly from those of the other classes between 670 nm and 915 nm. The spectral curves of leaves and branches are mostly similar to those of immature fruit except between 915 nm and 970 nm, which motivated Var9. As mentioned above, the optimum bands for distinguishing the three growth stages of blueberry fruit were 551 nm, 627 nm and 740 nm; the last three variables were therefore designed for the classifier.

SAM classifier
Before SAM can be used as a classifier, a threshold must be chosen for the spectral angle between the test image spectra and the reference spectra. In this study, the spectra collected in the laboratory were used as reference spectra, and the data library was used to determine the spectral angle thresholds, as shown in Figure 7. For optimum classification, the thresholds were set to 0.17, 0.10 and 0.08 for mature, intermediate and immature fruit, respectively.
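The text does not state the criterion used to read the thresholds off Figure 7. One common choice, swapped in here purely as an assumption, is to pick the angle that maximises Youden's J = TPR - FPR on labelled training pixels, where a pixel counts as "target" when its angle is at or below the threshold. `select_angle_threshold` is a hypothetical helper.

```python
import numpy as np


def select_angle_threshold(target_angles, other_angles, candidates=None):
    """Pick a spectral-angle threshold separating target pixels from the rest.

    Assumption: maximise Youden's J = TPR - FPR. The first candidate reaching
    the maximum is kept. Returns (threshold, J).
    """
    t = np.asarray(target_angles, dtype=float)
    o = np.asarray(other_angles, dtype=float)
    if candidates is None:
        candidates = np.unique(np.concatenate([t, o]))  # sorted, de-duplicated
    best_thr, best_j = None, -np.inf
    for thr in candidates:
        j = np.mean(t <= thr) - np.mean(o <= thr)  # TPR - FPR
        if j > best_j:
            best_thr, best_j = float(thr), j
    return best_thr, best_j
```

Other criteria (for example, capping the false-recognition rate) would give different thresholds; this sketch only shows the general mechanics of choosing a cut-off from a labelled data library.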

Figure 7 Selection of threshold of spectral angle between training data and reference
With these thresholds, identification was much better for mature fruit than for intermediate and immature fruit, as confirmed by the classification results on the validation images. An example of identifying the three growth stages of blueberry fruit is shown in Figure 8 and Table 3. In the example image, a total of 13973 pixels were blueberry fruit: 10410 pixels of mature fruit, 2626 pixels of intermediate fruit and 937 pixels of immature fruit.

Table 3 Result of classification using the classifier of SAM
Figure 8a shows the RGB representation of the example image, and Figure 8b shows the corresponding digital image taken at the same location. Figure 8c is the image labelled by visual inspection, which was used as ground truth to calculate the correct identification ratios. Figure 8d is the detection result of the SAM classifier, Figure 8e shows the image segmented using the noise removal method, and Figure 8f gives the final detection result. In the result image, correct identifications are labelled in red, missed detections in green and false recognitions in blue. The identification accuracy was 85.2% for mature fruit, 77.8% for intermediate fruit and 59.3% for immature fruit. The accuracy for immature fruit was poor mainly because the spectra of immature fruit are more similar to those of leaves and green branches. The false recognition ratios, i.e. the proportions of pixels wrongly recognized as targets, were 10.3% for mature fruit, 27.8% for intermediate fruit and 73.2% for immature fruit. The false recognition ratio for immature fruit was probably so high because two regions of immature fruit were missed in the hand-labelled mask.
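The colour-coded evaluation can be expressed with boolean masks: correct = detected and labelled, missed = labelled but not detected, false = detected but not labelled. The sketch below also computes a pixel-weighted overall accuracy. Function names are hypothetical, and taking the false recognition ratio relative to the hand-labelled pixel count is an assumption, since the text does not give its denominator.

```python
import numpy as np


def detection_maps(pred, truth):
    """Split a binary detection mask into correct, missed and false pixels.

    Matches the colour coding of Figure 8f: correct (red), missed (green),
    false recognition (blue).
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return pred & truth, truth & ~pred, pred & ~truth


def detection_rates(pred, truth):
    """Return (correct identification ratio, false recognition ratio).

    Assumption: both ratios are taken relative to the number of hand-labelled
    target pixels; the text does not give the denominator explicitly.
    """
    correct, _missed, false = detection_maps(pred, truth)
    n_truth = max(int(np.count_nonzero(truth)), 1)
    return correct.sum() / n_truth, false.sum() / n_truth


def overall_accuracy(per_class_accuracy, pixel_counts):
    """Pixel-weighted mean of per-class identification ratios."""
    acc = np.asarray(per_class_accuracy, dtype=float)
    n = np.asarray(pixel_counts, dtype=float)
    return float((acc * n).sum() / n.sum())
```

With the example image's counts (10410, 2626 and 937 pixels) and the SAM per-class ratios above, `overall_accuracy` gives about 0.821; the MLR and decision-tree ratios reported later give about 0.885 and 0.898. These match the overall figures quoted in the abstract, which suggests they are pixel-weighted averages over this image.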

MLR classifier
The bands selected in Table 1 were used in the MLR classifier. A set of variables computed from the data library with the nine bands was selected as inputs to construct the multinomial logistic regression model, as shown in Equations (7)-(9), where Var_i are the selected variables, x denotes the detected target and b denotes the comparison spectra in the data library. The recognition results using MLR are shown in Figure 9 and Table 4. Figure 9a shows the gray-scale images calculated with MLR. The figure shows a significant difference between the target and the other classes for mature and intermediate fruit, but recognition of immature fruit was still poor. MLR yielded identification accuracies of 92.4% for mature fruit, 82.9% for intermediate fruit and 60.4% for immature fruit, with false recognition ratios of 9.1%, 8.9% and 70.3%, respectively. Compared with SAM based on the whole spectral range, MLR yielded higher identification accuracy and lower false recognition ratios. This indicates that it is feasible to detect targets using a few selected bands instead of the whole spectrum from hyperspectral imaging, which reduces computational complexity and makes multispectral imaging feasible in the field.

Decision tree
The spectral indexes listed in Table 2 were designed to enhance the contrast between target and background and thereby improve recognition. All five spectral indexes were used as input variables to train a classification model, which built a decision tree based on the principle of a binary classification tree. Figure 10 shows the images of the four spectral indexes. Var8 clearly separates the soil & sky class from the other classes. Var9 is useful for distinguishing fruit targets from the background other than soil and sky. In the image of the sum of Var10 and Var11, mature fruit stands out in the dark region, which also contains the soil & sky class. Finally, Var12 shows the difference between intermediate and immature fruit (Figure 10d). Based on these variables, a decision tree was constructed as the classifier (Figure 11). Its parameters, mainly the variable thresholds, were trained on training data from the data library. At the first node of the tree, soil and sky were removed using Var8. At the second node, leaves and branches were removed and the fruit regions were picked out. Finally, mature, intermediate and immature fruit were identified based on Var10, Var11 and Var12.
Figure 12 and Table 5 show the recognition results for the three growth stages of blueberry fruit. Figure 12a gives the fruit regions recognized with a threshold, and Figure 12b shows the final detection results. The correct identification ratio was 93.3% for mature fruit, 85.1% for intermediate fruit and 63.8% for immature fruit, slightly higher than with the other two classifiers, although the result for immature fruit was not significantly improved. Constructing spectral indexes can therefore tap more of the potential of the data. However, to recognize two very similar targets against a complex background, information from multiple sensors should be combined to solve this problem in agricultural remote sensing.
Figure 12 Recognition result using the decision tree (a. Regions of target pixels; b. Result of recognition)
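The node order described above (Var8, then Var9, then Var10 + Var11, then Var12) can be sketched as a fixed cascade. The thresholds and the direction of each comparison are hypothetical placeholders, since the trained values in Figure 11 are not given in the text; only the "mature fruit is dark in Var10 + Var11" direction follows the description.

```python
import numpy as np

# Hypothetical thresholds and comparison directions: the trained values from
# Figure 11 are not given in the text, so zeros are placeholders.
THRESHOLDS = {"var8": 0.0, "var9": 0.0, "var10_11": 0.0, "var12": 0.0}
CLASS_NAMES = ["soil/sky", "leaf/branch", "mature", "intermediate", "immature"]


def classify_pixels(var8, var9, var10, var11, var12, th=THRESHOLDS):
    """Four-node cascade mirroring the tree described in the text.

    Returns integer labels indexing CLASS_NAMES.
    """
    var8, var9, var10, var11, var12 = (
        np.asarray(v, dtype=float) for v in (var8, var9, var10, var11, var12)
    )
    labels = np.full(var8.shape, 1, dtype=int)          # default: leaf/branch
    plant = var8 > th["var8"]                           # node 1: remove soil & sky
    labels[~plant] = 0
    fruit = plant & (var9 > th["var9"])                 # node 2: fruit vs leaf/branch
    mature = fruit & (var10 + var11 < th["var10_11"])   # node 3: mature fruit is dark here
    labels[mature] = 2
    unripe = fruit & ~mature
    labels[unripe & (var12 > th["var12"])] = 3          # node 4: intermediate
    labels[unripe & (var12 <= th["var12"])] = 4         # node 4: immature
    return labels
```

In practice the thresholds would be learned from the training pixels of the data library; scikit-learn's `DecisionTreeClassifier`, for example, learns such splits automatically, although it would not necessarily reproduce the fixed node order described here.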

Conclusions
Identifying the fruit of shrubs has always been a challenge in remote sensing because of the small size of the fruit and the complex background. In this study, methods of blueberry fruit recognition were studied with a hyperspectral imaging system. The major findings can be summarized as follows.
(1) The study described how the hyperspectral images were collected and preprocessed, and on that basis the spectral differences of blueberry were analyzed in detail. Based on the spectral differences of blueberry components at different growth stages, a set of optimum wavelengths was selected for discrimination: 551 nm, 627 nm, 670 nm, 715 nm, 740 nm, 915 nm and 970 nm.
(2) Spectral indexes were designed for use in classifiers. Five spectral indexes built from the selected bands were used to separate fruit from background step by step in a decision tree model. The results show that spectral indexes can enhance the potential of the original data.

Figure 2 RGB bands of example hyperspectral image with three blueberry fruit growth stages

2.2 Building data library
Figure 3 shows a digital image of blueberry with fruit at three different growth stages and five kinds of background. The eight classes found in most of the field images are: mature fruit (blue or purple), intermediate fruit (red or pink), immature fruit (green), leaf, green branch, brown branch, soil and sky. Of the 55 images taken in the field, 30 were randomly selected as training images and the other 25 were used as validation images. Image editing software (Corel Painter Photo Essentials 4, Ottawa, Ontario, Canada) was used to crop the training images and extract pixels of the eight object classes. A data library of 2400 pixels was built, with 300 pixels for each of the eight classes. Of these, 1200 pixels were randomly selected to train the classifiers and the remaining pixels were used to validate the models.
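The 1200/1200 split of the 2400-pixel library can be sketched in two ways. `split_library` draws 1200 pixel indices at random, which is how the text describes it; `stratified_split` is a hypothetical variant that draws 150 pixels per class so every class is equally represented in training. Whether the original split was stratified is not stated, and the seed is an arbitrary assumption.

```python
import numpy as np


def split_library(n_total=2400, n_train=1200, seed=0):
    """Simple random split of pixel indices into (train, validation)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])


def stratified_split(labels, n_train_per_class, seed=0):
    """Random split that takes the same number of training pixels per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.extend(rng.choice(idx, size=n_train_per_class, replace=False))
    train = np.sort(np.array(train))
    validation = np.setdiff1d(np.arange(labels.size), train)
    return train, validation
```

With eight classes of 300 pixels each, `stratified_split(labels, 150)` yields the same 1200/1200 sizes as the simple split while guaranteeing 150 training pixels per class.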

Figure 3 Example digital image of blueberry with different growth stages fruit and background objects

Figure 4 Reflectance spectra of different fruit maturity stages and background

Figure 6 Discriminability values between the three growth stages of blueberry fruit

Figure 8 Result of classification for three growth stages of blueberry fruit

Figure 9 Recognition results of blueberry fruit based on the method of MLR using the bands selected

Figure 10 Figure 11
Figure 10 Gray images of the four designed spectral indexes

Table 2 Spectral index designed and description