Automatic sweet pepper detection based on point cloud images using subtractive clustering

Abstract: Automatic identification and detection of fruit on trees by machine vision is the basis for developing automatic harvesting robots in agriculture. Occlusion by branches, leaves and other fruits in canopy images reduces the accuracy of fruit detection. To provide scientific and reliable technical guidance for fruit harvesting robots, a method using point cloud images is proposed in this study to detect red fruits and to overcome the impact of occlusion on detection. First, the fruit regions were segmented from a tree's point cloud by applying a color threshold on the difference between red and green. Then, the noise in the fruit point clouds was removed with sparse outlier removal. Finally, the point cloud of each fruit was detected and counted based on the subtractive clustering algorithm. For the sweet pepper dataset, the true positive rate (TPR) is 90.69% and the false positive rate (FPR) is 6.97% for all fruits that are at least partially visible in the scene.

and diversity of the natural environment, such as light changes, canopy structure, fruit color, and occlusion, bring difficulties to the accurate detection of fruits [3]. To improve the accuracy of fruit detection, various image sensors and image analysis methods have gradually been adopted.
Black/White (B/W) cameras were used as image sensors for fruit detection in earlier studies. Edan et al. [4] applied two B/W cameras to acquire melon images under controlled illumination, and a detection accuracy of 82.0%-88.0% was achieved by using geometric and texture features extracted from the B/W images. Color is one of the most prominent features for distinguishing fruit from the background. Because a B/W camera cannot capture the color information of an image, it is difficult to ensure the accuracy of fruit detection in natural scenes. Lu et al. [5] developed a machine vision system consisting of a color CCD camera and a computer to segment fruits and branches and achieved high precision.
Currently, the RGB camera is the most widely used image sensor in fruit harvesting robots. The RGB camera simultaneously obtains images in three channels (R/G/B); hence, color, geometry and texture features can all be extracted from RGB images. Tabb et al. [6] proposed a background modeling method to detect apples in RGB images. Kurtulmus et al. [7] used color cameras to obtain peach datasets, in which the fruits were detected by statistical classifiers and neural networks. Khoshroo et al. [8] developed an algorithm to distinguish red tomatoes using image processing techniques based on color information. Kuang et al. [9] proposed a novel approach for multi-class fruit detection using effective image region selection, in which histogram of oriented gradient features, texture features, and color features were utilized to improve the detection accuracy. Sa et al. [10] applied deep convolutional neural networks to build an accurate, efficient, and stable fruit detection system. However, training such a model requires a large number of images, and labeling these images takes considerable time. In addition, the information provided by an RGB image is limited, and color cameras usually cannot achieve good results when the color of the fruit is close to that of the background.
As the demand for fruit detection continues to increase, detection methods based on RGB images alone struggle to meet the requirements. Many scholars have therefore integrated the results of multiple sensors to improve fruit detection. Shamshiri et al. [11] reviewed configurable, modular prototype robot systems developed in simulated workspaces within a virtual environment. Through simple testing and debugging of control algorithms, such systems can be adapted to various field conditions, which accelerates the commercialization of real robots.
Wang et al. [12] used a stereovision camera to locate targets in global coordinates so that fruits appearing in multiple images were not counted repeatedly. Xiang et al. [13] proposed a method for recognizing clustered tomatoes based on depth images obtained by stereo matching. However, because stereovision requires a large amount of computation, it is often unsuitable for real-time systems. Maturity grading is important for fruit quality, and Wei et al. [14] built a system of multispectral indexes to identify maturity with the hyperspectral technique. Because of the influence of light conditions, this sensor is unsuitable for fruit detection in outdoor environments. Underwood et al. [15] developed a mobile terrestrial scanning system for almond orchards, based on lidar and camera sensors, that can efficiently estimate yield and fruit distribution for individual trees. Stein et al. [16] presented a novel multi-sensor framework that can efficiently identify and localize every piece of fruit in a mango orchard. A lidar component automatically generated the image for each canopy, and a multiple-viewpoint approach was used to address occlusion. To detect immature green citrus, Gan et al. [17] created a new color-thermal combined probability algorithm that fuses information from color and thermal images to distinguish fruit from the background. However, sensors such as lidar and thermal imaging cameras are expensive, so applying them to fruit detection is usually uneconomical.
In recent years, consumer-grade depth cameras have been favored by more and more scholars. This kind of camera can simultaneously acquire color images and depth images of the target scene. Compared with color cameras, depth cameras also provide the position of the target, from which the three-dimensional geometric features of the object can be calculated, and these features are significantly helpful for fruit detection. Gongal et al. [18] used a depth camera in conjunction with an RGB camera to identify duplicate apples visible in images captured from two opposite sides of the tree canopy. The accuracy of this method was 87.0% in identifying duplicate apples. Nguyen et al. [1] developed an algorithm for detecting apples on trees using an RGB-D camera. The Euclidean clustering algorithm was applied to the point cloud to extract a cluster for each apple. Tao et al. [19] proposed an apple recognition method based on point cloud data and compared it with different 3D descriptors and other classical classifiers. The results showed that the method performed better, but improvements are needed to accurately determine the condition of fruit occlusion. Qureshi et al. [20] proposed two new automatic counting methods for accurately segmenting and detecting fruits in mango tree canopy images, one using texture-based dense segmentation and the other using shape-based fruit detection. For images collected under the same conditions as the calibration image, the estimated number of fruits was within 16% of the actual number. However, the results were poor when the model was used to estimate the number of fruits for different canopy shapes or under different imaging conditions.
Since there are usually occlusions and clustering between fruits in natural environments and the shape of many fruits is irregular, methods based on fruit shape detection in previous literature are not effective in some situations. In order to overcome these problems, a novel method was proposed to detect and count sweet peppers based on subtractive clustering, which stays robust despite the influence of fruit shape and occlusion.
The ultimate goal of this research is to achieve fruit recognition under natural conditions. Specifically, the proposed method is utilized to detect sweet pepper fruits in three-dimensional point cloud scenes.

Datasets
An open dataset, called the sweet pepper dataset [21], was used to evaluate the performance of the proposed method. An RGB-D camera (Intel RealSense F200, Intel Corporation, USA) mounted on a robotic arm was used to collect RGB-D data of sweet peppers. The data collection was conducted over 10 days within a protected farming system. After data collection, a dense sweet pepper point cloud was reconstructed from multiple views using Kinect fusion [22], and the statistical outlier remover and voxel grid downsampler provided by the Point Cloud Library (PCL) [23] were used for data de-noising and filtering. Figure 1 shows the 3D models of the point cloud scene. More detailed dataset information can be found in Reference [22].

Methods
The color and geometric information of the 3D sweet pepper point cloud were used. According to the algorithm, the fruit regions were segmented by applying the color threshold of R-G first. Then, the noise was removed by an outlier filtering algorithm. Finally, the point cloud of each sweet pepper was processed based on the subtractive clustering algorithm. The automatic method was compared to the visual manual counting, where all the images were counted by human inspectors.

Color Filtering
In order to achieve accurate recognition and localization of fruits, it is necessary to segment the fruit regions from the background first. The color feature is an important attribute to distinguish fruits from the background.
Segmentation methods based on color thresholds are widely used. In this paper, the R-G color difference segmentation method was chosen to separate the background in the 3D point cloud.
The fruit segmentation rule is as follows:

$$ s \in \begin{cases} \text{fruit}, & R_s - G_s > \delta \\ \text{background}, & \text{otherwise} \end{cases} \qquad (1) $$

where $R_s$ and $G_s$ are the red and green values of point $s$ in the point cloud, respectively, and δ is the segmentation threshold. When $R_s - G_s$ is greater than δ, the point is recognized as fruit point cloud data; otherwise, it is recognized as background point cloud data. After segmentation, only the fruit point cloud data is retained and the background point cloud data is removed, which supports the subsequent three-dimensional recognition of fruits.
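The R-G threshold rule described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `segment_fruit_points`, the array layout (one row per point, colors in the 0-255 range) and the value `delta=30` are assumptions chosen for the example.

```python
import numpy as np

def segment_fruit_points(xyz, rgb, delta=30):
    """Keep the points whose red-minus-green difference exceeds delta.

    xyz: (n, 3) coordinates; rgb: (n, 3) colors in 0-255 (assumed layout).
    delta: hypothetical threshold value used only for this example.
    """
    # Cast to a signed type so that R - G cannot wrap around for uint8 input.
    diff = rgb[:, 0].astype(np.int32) - rgb[:, 1].astype(np.int32)
    mask = diff > delta
    return xyz[mask], rgb[mask], mask

# Three toy points: a red one, a green one, and a greyish one.
xyz = np.array([[0.0, 0.0, 0.5], [0.1, 0.0, 0.5], [0.2, 0.0, 0.5]])
rgb = np.array([[200, 40, 30], [60, 150, 50], [120, 100, 90]], dtype=np.uint8)
fruit_xyz, fruit_rgb, mask = segment_fruit_points(xyz, rgb, delta=30)
print(mask)  # only the first (red) point passes the threshold
```

The signed cast matters in practice: subtracting two unsigned 8-bit channels directly would wrap around for green-dominant points and let them pass the threshold.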

Sparse outlier removal
In the fruit region point cloud data after color filtering, there are often some outlier noise points that are far from the target object. Additionally, measurement errors lead to sparse outliers which corrupt the results even more. Therefore, the sparse outlier removal module is used to correct these irregularities [24] .
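First, for the fruit point cloud data, the average distance $d_l$ from each point $p_l(x_l, y_l, z_l)$ ($l = 1, 2, \ldots, n$) to its neighbors $p_m(x_m, y_m, z_m)$ ($m = 1, 2, \ldots, k$) is calculated as:

$$ d_l = \frac{1}{k} \sum_{m=1}^{k} \sqrt{(x_l - x_m)^2 + (y_l - y_m)^2 + (z_l - z_m)^2} \qquad (2) $$

Then, the mean value $\mu$ and standard deviation $\sigma$ of $d_l$ are calculated as:

$$ \mu = \frac{1}{n} \sum_{l=1}^{n} d_l \qquad (3) $$

$$ \sigma = \sqrt{\frac{1}{n} \sum_{l=1}^{n} (d_l - \mu)^2} \qquad (4) $$

Points whose average distance $d_l$ lies beyond the threshold $\mu + \alpha\sigma$ are treated as outliers and removed, where the value of $\alpha$ depends on the size of the analyzed neighborhood and is 0.6 in this paper.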
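The filtering step described above can be sketched as follows. The value α = 0.6 comes from the text; the neighborhood size `k=8`, the function name, and the removal rule (drop a point when its mean neighbor distance exceeds μ + ασ, as in common statistical outlier removal filters) are assumptions made for this illustration. Distances are computed by brute force, which is adequate for a small example but not for a full scene.

```python
import numpy as np

def sparse_outlier_removal(points, k=8, alpha=0.6):
    """Drop points whose mean distance to their k nearest neighbors
    exceeds mu + alpha * sigma (assumed removal rule)."""
    # Brute-force pairwise distances; a k-d tree would scale better.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Sort each row and skip column 0, the zero distance of a point to itself.
    d = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    mu, sigma = d.mean(), d.std()
    keep = d <= mu + alpha * sigma
    return points[keep], keep

# A dense 4 x 4 x 4 grid (1 cm spacing) plus one stray point far away.
g = np.arange(4) * 0.01
grid = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
cloud = np.vstack([grid, [[1.0, 1.0, 1.0]]])
filtered, keep = sparse_outlier_removal(cloud, k=8, alpha=0.6)
print(len(cloud), len(filtered))
```

In this toy cloud the stray point has a mean neighbor distance far above the threshold, so it is the only point removed.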

Subtractive clustering
Because the fruit point cloud is affected by occlusion and aggregation, shape-fitting methods such as the RANSAC algorithm are not suitable: they easily recognize two adjacent, occluded fruits as one. For such groups of points in 3D space, a clustering algorithm can work well.
Data clustering divides a data set into several groups such that the similarity within a group is greater than the similarity between groups. Algorithms such as K-means generally require the number of clusters to be known in advance. However, estimating the number of fruits in a natural scene in advance is unrealistic. For this reason, a subtractive clustering algorithm [25] is utilized in this research to detect every fruit by finding high-density regions in 3D space. The flow chart is shown in Figure 2. Fruit point clouds in 3D space are roughly modeled by a Gaussian function. The 3D coordinates of all points are then considered as candidate cluster centers. Thus, each point $p_i$ with coordinates $(x_i, y_i, z_i)$ is potentially a cluster center whose density $D_i$ is given by the following equation, written here in the standard form of subtractive clustering:

$$ D_i = \sum_{j=1}^{N} \exp\left(-\left[\frac{(x_i - x_j)^2}{(R_{ax}/2)^2} + \frac{(y_i - y_j)^2}{(R_{ay}/2)^2} + \frac{(z_i - z_j)^2}{(R_{az}/2)^2}\right]\right) \qquad (5) $$

where $N$ represents the number of 3D points within the neighborhood defined by the radius $R_a = (R_{ax}, R_{ay}, R_{az})$; in this paper, $R_{ax} = R_{ay} = R_{az} = 0.02$ m. The shape of the cluster can be adjusted through the selection of the parameters $R_{ax}$, $R_{ay}$, $R_{az}$, which are related to the actual dimensions of the object in 3D. Candidates $p_i$ surrounded by more points within the defined neighborhood show a high value of $D_i$, whereas points at a distance well above the radius defined by $R_a$ hardly affect the value of $D_i$. Equation (5) is computed for all 3D points that remain after sparse outlier filtering. Let $p_{cl}(x_{cl}, y_{cl}, z_{cl})$ denote the point with the maximum density $D_{cl}$. This point is selected as the cluster center for the current iteration of the algorithm. The density $D_i$ of all points is then corrected based on $p_{cl}$ and $D_{cl}$ by applying the subtraction in Equation (6) to all points:

$$ D_i \leftarrow D_i - D_{cl} \exp\left(-\left[\frac{(x_i - x_{cl})^2}{(R_{bx}/2)^2} + \frac{(y_i - y_{cl})^2}{(R_{by}/2)^2} + \frac{(z_i - z_{cl})^2}{(R_{bz}/2)^2}\right]\right) \qquad (6) $$
where the parameters $R_b = (R_{bx}, R_{by}, R_{bz})$ define the neighborhood in which the density is reduced. The density of the data points close to the selected cluster center is strongly reduced, so these points are unlikely to become the next cluster center.
To prevent cluster centers from lying too close to each other, the parameters $(R_{bx}, R_{by}, R_{bz})$ are usually larger than $(R_{ax}, R_{ay}, R_{az})$.
After the subtraction process, the density of each point near $p_{cl}$ is reduced according to its distance from $p_{cl}$. After the densities are corrected, a new cluster center $p_{cl,new}$ is generated, corresponding to the new density maximum $D_{cl,new}$, and the above process iterates until Equation (7) is no longer satisfied:

$$ D_{cl,new} > T_{min} \, D_{cl} \qquad (7) $$
where $T_{min}$ is an experimentally tuned parameter that defines the termination condition based on the relationship between the previous cluster density and the new one. A candidate is accepted as a cluster center when two requirements are met: its density is greater than $T_{min}$ times the density of the previous cluster center, and its distance to the previous cluster center is greater than $r$, which prevents secondary recognition of the same fruit. The size of $r$ therefore depends on the size of the fruit; for the sweet pepper test, $r$ is taken as 0.04 m. If the density of the candidate is less than $T_{min}$ times the density of the previous cluster center, the candidate is not considered a cluster center.
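The clustering procedure described in this section can be sketched as below. It is an illustration under stated assumptions, not the authors' code. The values `ra=0.02` and `r=0.04` come from the text, while `rb=0.03` and `t_min=0.15` are hypothetical choices, and the Gaussian kernels use the standard subtractive clustering scaling by half the radius. The sketch interprets the duplicate-prevention rule as accepting a candidate only when it lies farther than `r` from every accepted center; a candidate that is dense enough but too close is discarded and the next densest point is tried.

```python
import numpy as np

def subtractive_clustering(points, ra=0.02, rb=0.03, t_min=0.15, r=0.04):
    """Return cluster centers found by subtractive clustering (sketch).

    ra, rb: scalar or per-axis radii for density and subtraction.
    t_min, rb: assumed values; ra and r follow the text.
    """
    points = np.asarray(points, dtype=float)
    ra = np.broadcast_to(np.asarray(ra, dtype=float), (3,))
    rb = np.broadcast_to(np.asarray(rb, dtype=float), (3,))

    # Density of every point: sum of Gaussian contributions of all points.
    diff = (points[:, None, :] - points[None, :, :]) / (ra / 2.0)
    density = np.exp(-(diff ** 2).sum(axis=2)).sum(axis=1)

    centers = []
    idx = int(np.argmax(density))
    d_prev = density[idx]
    while True:
        centers.append(points[idx])
        # Subtract the influence of the accepted center from all densities.
        offset = (points - points[idx]) / (rb / 2.0)
        density = density - d_prev * np.exp(-(offset ** 2).sum(axis=1))
        # Look for the next acceptable candidate.
        while True:
            idx = int(np.argmax(density))
            d_new = density[idx]
            if d_new <= t_min * d_prev:
                return np.array(centers)  # termination condition reached
            gaps = np.linalg.norm(np.array(centers) - points[idx], axis=1)
            if gaps.min() > r:
                break  # dense enough and far enough: accept
            density[idx] = 0.0  # too close to an existing center: discard
        d_prev = d_new

# Two small blobs of 27 points each, 10 cm apart along x.
g = np.arange(3) * 0.005
blob = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
cloud = np.vstack([blob, blob + [0.1, 0.0, 0.0]])
centers = subtractive_clustering(cloud)
print(centers)
```

For the toy cloud, the sketch returns one center per blob, located at the middle point of each grid. The pairwise density computation needs memory proportional to the square of the number of points, which is one reason to downsample a scene before clustering.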

Color filtering and sparse outlier removal results
The detection method was tested on the sweet pepper dataset. Because the amount of data in the point cloud scenes of this dataset was too large, each scene was downsampled before it was processed by the algorithm. The original point cloud scene and the downsampled result are shown in Figure 3.
After obtaining the sparse data of the fruit trees, the R-G color threshold segmentation method was used to separate the fruit from the background in the point cloud. Figure 4 shows the segmentation results of the fruit tree point cloud at different thresholds.
Figure 4 Segmentation results of fruit tree point clouds under different thresholds.
As can be seen from Figure 4, after color threshold segmentation the fruit and background were well separated, yielding the point cloud data of the fruit region. In the part of the point cloud circled in yellow, the segmentation result changed slightly with δ. When δ increased from 20 to 50, the segmentation result was essentially the same. When δ exceeded 50 and approached 60 and 70, the number of points in the yellow circle decreased markedly, indicating that the threshold was too large and that valid fruit points were being filtered out. Therefore, in this experiment the segmentation threshold between the red bell pepper and the green background was set between 20 and 50; for other fruits, the threshold should be adjusted empirically.
In order to improve the efficiency and accuracy of clustering, the point cloud of the fruit region was first filtered and denoised. Figure 5 shows the point cloud processed by sparse outlier filtering. By filtering out the unstable noise points deviating from the main body, more accurate and effective fruit point cloud data were obtained.
Figure 5 Filtering results after sparse outlier removal

Subtractive clustering results
The subtractive clustering algorithm was applied to the processed data to verify its effectiveness and accuracy. The results of the qualitative analysis are shown in Figure 6, where each green point cloud label marks a recognized sweet pepper. The recognition results based on the filtered point cloud data are shown in the front view in Figure 6a and in the top view in Figure 6b. Figure 6c shows the green labels in the original scene, where some labels are occluded by the point cloud and therefore invisible.
It can be seen from Figure 6 that the sweet peppers in the marked area are occluded to different degrees: of the 12 sweet peppers in total, 8 had more than 50% of their area occluded. Of the 12 fruits, 11 were identified, and each cluster center was marked with a green label.
Figure 6 Results of the qualitative analysis based on the proposed method
To quantify the performance of our fruit recognition method, the true positive rate (TPR), false positive rate (FPR) and false negative rate (FNR) were used to describe the detection accuracy for sweet peppers in the scene. If a green clustering label was at the location of a sweet pepper, the detection was considered a true positive; otherwise, the detection was a false positive. Each detection of a fruit was treated as unique: if multiple detections located the same fruit, one of them was regarded as the true positive and the others were considered false positives. A false negative represented a ground truth position that had no corresponding detection. The specific recognition results are shown in Table 1. The point plot of the recognition results is shown in Figure 7.
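The counting rules above (one true positive per fruit, duplicates and off-target labels as false positives, unmatched fruits as false negatives) can be sketched as follows. Several details are assumptions because the text does not specify them: the function name, the distance tolerance `match_radius` used to decide whether a label lies at a fruit location, and the normalization of all three rates by the number of ground truth fruits.

```python
import numpy as np

def score_detections(detections, ground_truth, match_radius=0.04):
    """Count TP, FP and FN for detected cluster centers (sketch).

    match_radius is a hypothetical tolerance; rates are divided by the
    number of ground truth fruits, which is an assumed convention.
    """
    detections = np.asarray(detections, dtype=float).reshape(-1, 3)
    ground_truth = np.asarray(ground_truth, dtype=float).reshape(-1, 3)
    matched = np.zeros(len(ground_truth), dtype=bool)
    tp = fp = 0
    for det in detections:
        if len(ground_truth) == 0:
            fp += 1
            continue
        dist = np.linalg.norm(ground_truth - det, axis=1)
        nearest = int(np.argmin(dist))
        if dist[nearest] <= match_radius and not matched[nearest]:
            matched[nearest] = True
            tp += 1
        else:
            fp += 1  # off-target label or repeated detection of a fruit
    fn = int((~matched).sum())
    n = max(len(ground_truth), 1)
    return {"TP": tp, "FP": fp, "FN": fn,
            "TPR": tp / n, "FPR": fp / n, "FNR": fn / n}

# Three fruits; two labels on the first fruit, one on the second, one stray.
truth = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.4, 0.0, 0.0]]
labels = [[0.01, 0.0, 0.0], [-0.01, 0.0, 0.0], [0.2, 0.01, 0.0], [1.0, 1.0, 1.0]]
result = score_detections(labels, truth)
print(result)
```

In the toy example the second label on the first fruit and the stray label are both counted as false positives, and the third fruit, which has no label, is the single false negative.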
Table 1 summarizes the recognition results of the fruit detection algorithm for the sweet pepper dataset analyzed in Section 3.2. For this dataset, the TPR was 90.69% of all sweet peppers that were at least partially visible in the scene, and the FPR was 6.97%. Analysis of the undetected fruits showed that these were highly occluded fruits or fruits at the edges of the image with only a small portion of the surface visible.
The RANSAC [26] algorithm was also applied to the dataset in the experiment, and its detection results were compared with those of the proposed algorithm. RANSAC estimates the parameters of a mathematical model from a sample data set that contains outliers and thereby identifies the valid sample data. The RANSAC algorithm is often used in computer vision, for example to solve the point matching problem for a pair of cameras and to calculate the fundamental matrix at the same time in stereo vision. The detection result is shown in Figure 8, where the original point cloud image is the same as Figure a.
The recognition result can be seen in Figure 8: 10 of the 12 sweet peppers were detected. The main reason for the two missed fruits was that adjacent, occluded fruit point clouds were erroneously fitted as a single fruit. Table 2 summarizes the recognition results of the RANSAC algorithm.
Figure 8 Qualitative results based on RANSAC algorithm from different perspectives
Based on the RANSAC algorithm, a TPR of 86.05% was obtained, which was worse than the TPR of 90.69% of the proposed algorithm. The FPR of 3.49% for RANSAC was lower than the 6.97% of subtractive clustering, whose additional false positives were due to repeated detection of the same fruit. However, a repeated detection still locates a real fruit and has no negative impact on fruit picking, so the slightly higher FPR had little effect on the final identification of fruit. In summary, the proposed algorithm performed better than the RANSAC algorithm in a complex environment.
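For readers unfamiliar with the RANSAC baseline compared above, the following sketch shows the general idea on a single shape. The text does not state which geometric model was fitted, so the sphere model here is purely an assumption, as are the function names, the inlier tolerance `tol`, and the iteration count. Each iteration fits a sphere to four random points with a linear least-squares solve and keeps the model with the most inliers.

```python
import numpy as np

def fit_sphere(sample):
    """Fit a sphere via |p|^2 = 2 c.p + (rad^2 - |c|^2), linear in c."""
    A = np.hstack([2.0 * sample, np.ones((len(sample), 1))])
    b = (sample ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    rad_sq = sol[3] + center @ center
    if rad_sq <= 0:
        return None  # degenerate sample, no valid sphere
    return center, np.sqrt(rad_sq)

def ransac_sphere(points, n_iter=200, tol=0.002, seed=0):
    """Return (center, radius, inlier_mask) of the best sphere found."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(points), dtype=bool))
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), size=4, replace=False)]
        model = fit_sphere(sample)
        if model is None:
            continue
        center, radius = model
        # Inliers lie within tol of the sphere surface.
        resid = np.abs(np.linalg.norm(points - center, axis=1) - radius)
        inliers = resid < tol
        if inliers.sum() > best[2].sum():
            best = (center, radius, inliers)
    return best

# Synthetic test: 200 points on a 4 cm sphere plus 40 scattered outliers.
rng = np.random.default_rng(1)
true_center = np.array([0.1, 0.2, 0.3])
dirs = rng.normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
cloud = np.vstack([true_center + 0.04 * dirs,
                   rng.uniform(-0.2, 0.5, size=(40, 3))])
center, radius, inliers = ransac_sphere(cloud)
print(center, radius, inliers.sum())
```

A single-shape fit like this illustrates why two touching, partly occluded fruits can be absorbed into one model, which is the failure mode reported for the baseline.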

Discussion
Color filtering, sparse outlier removal and subtractive clustering were used to detect fruits in complex orchard environments. A TPR of 90.69% and an FPR of 6.97% were obtained for the sweet pepper dataset. The comparison with the RANSAC algorithm supports the superiority of the proposed algorithm.
To recognize severely occluded fruits, the radius parameters were set small so as to fit the size of the small visible fruit point clouds. As a result, the algorithm may produce two cluster centers for large fruits, which increases the FPR, so future research will focus on making the parameters of the algorithm adaptive. The FPR of 6.97% is slightly high, but repeated detection of a fruit does not actually affect fruit picking; it only interferes with yield estimation. Therefore, the final test results are generally satisfactory.
Due to the use of the color threshold segmentation, the algorithm proposed is limited to detecting red and bicolored fruits. For detecting green apples, the filtering strategy should be improved. For example, the method of machine learning can classify color regions in a point cloud, and the method of spectral images can identify the target by extracting spectral features.
Because the algorithm was developed on the sweet pepper point cloud dataset, it can be applied to other relatively large fruits. However, natural scenes contain many fruit clusters, especially bunched fruits such as grapes, which are difficult for the proposed algorithm to detect.

Conclusions
In this study, an algorithm for sweet pepper detection was proposed and verified, which provides scientific and reliable technical guidance for sweet pepper harvesting robots. First, the fruit regions were segmented from the original point cloud by applying the color threshold. Then, the noise in the sweet pepper point clouds was removed with a sparse outlier removal method. Finally, the point cloud of each sweet pepper was detected and counted based on the subtractive clustering algorithm. For the sweet pepper dataset, a TPR of 90.69% and an FPR of 6.97% were obtained for all fruits that are at least partially visible in the scene. Analysis of the undetected sweet peppers showed that they were highly occluded fruits, which could be detected more accurately from another viewing angle.