Autonomous detection of crop rows based on adaptive multi-ROI in maize fields

Crop row detection in maize fields remains a challenging problem due to variation in illumination and weed interference under field conditions. This study proposed an algorithm for detecting crop rows based on adaptive multi-region of interest (multi-ROI). First, the image was segmented into crop and soil and divided into several horizontally labeled strips. Feature points were located in the first image strip and the initial ROI was determined. Then, the ROI window was shifted upward. For the next image strip, the operations for the previous strip were repeated until multiple ROIs were obtained. Finally, the least squares method was used to extract navigation lines and detection lines in the multi-ROI. The detection accuracy of the method was 95.3%. The average computation time was 240.8 ms. The results suggest that the proposed method has generally favorable performance and can meet the real-time and accuracy requirements for field navigation.


Introduction
The progress of agricultural science is the most important indicator to measure the productivity of modern agriculture [1] . Intelligent agricultural equipment can reduce farmers' work intensity and improve their work comfort, while increasing efficiency and quality. In recent years, scholars have conducted various studies on intelligent operating systems for agricultural machinery.
In different areas, a number of studies have been conducted to develop intelligent agriculture [2][3][4] . Among them, agricultural navigation technology usually relies on a global positioning system (GPS) for autonomous driving of agricultural machinery in the field [5][6][7] . However, in maize fields, GPS-based navigation systems can hardly ensure that vehicles travel between crop ridges, resulting in a high rate of seedling damage from wheels. More recently, machine vision has become increasingly popular [8,9] . As most crops are planted in rows, the navigation path of agricultural vehicles can be planned by crop rows detection.
Therefore, the field navigation of agricultural vehicles based on machine vision relies mainly on the development of crop rows detection algorithms [10] . With the crop rows detection based on machine vision, it is possible to ensure that the vehicles travel between the crop ridges to avoid crushing seedlings.
Though machine vision offers advantages such as low cost and large information capacity, crop row detection usually relies on complex algorithms. Therefore, field navigation based on machine vision is unstable at times due to long computation time and susceptibility to environmental changes. Scholars have conducted much in-depth research on crop row detection, and many effective methods have been proposed in recent years. The popular methods are mainly classified into the following categories:
1) Methods based on Hough transformation: Hough transform [11] is one of the most popular methods of straight-line detection. It is typically used to separate geometric shapes with the same characteristics from others. Rovira-Más et al. [12] used Hough transform to detect crop rows by extracting the region of interest. After that, the most suitable travelling path was found by connectivity analysis. Bakker et al. [13] first corrected the image, which was subsequently segmented into crop and soil based on the grayscale image. The grayscale image was then divided into three parts, on each of which the Hough transform was computed. Finally, the three parts were combined into one image. This method solves the problem of difficulty in establishing the region of interest due to the non-parallelism of crop rows. However, its adaptability is limited because it needs to be based on a fixed camera angle. The disadvantage of Hough transform is the long computation time due to the high computational volume. Ji and Qi [14] presented a crop row detection method based on the randomized Hough transform (RHT) [15] and tested it on a color image. Compared with the classical Hough transform, the computation time was significantly reduced. Randomized Hough transform reduces the computation time to a certain extent, but its adaptability is still limited in maize fields with high weed pressure.
2) Methods based on horizontal strips: One of the difficulties in crop row detection is that the crop rows in the image are not parallel. Okamoto et al. [16] divided the image, corrected by perspective transformation, into five horizontal image strips. Feature pixel values were projected vertically to locate the center points for crop row detection. This method is difficult to adapt to field images obtained with different camera angles. To address this problem, Sogaard and Olsen [17] divided the image into several horizontal strips and positioned crop rows in each grayscale image strip without using a perspective transformation. This method solves the problem that crop rows are difficult to position. Si et al. [18] divided the image into strips and positioned ascending and descending points in each image strip for least squares-based crop row detection. Ospina and Noguchi [19] detected the contours of crops in each image strip. The least squares method was used to extract the navigation line based on the geometric center points of the contours. The methods above may be affected by weeds, partially missing crop rows and camera shake.
3) Methods based on vanishing point: Pla et al. [20] proposed a crop row detection method based on a vanishing point. After the image was segmented into crop and soil, the skeleton features of each crop row were extracted by vanishing point to serve as a baseline for linear fitting. After the detection lines were computed, the vanishing points were detected to recover crop rows. Jiang et al. [21] searched for the feature points through a moving window. Hough transform was used to detect all possible crop rows. Finally, the vanishing point was computed by k-means clustering. The method based on the vanishing point can adapt to different field conditions. However, complex maize field conditions make skeleton extraction hard. Moreover, the method requires high computing power, and its computation time still needs to be optimized.
4) Other methods: Zhang et al. [22] used a vertical projection method to position the feature points, which were then clustered for linear fitting. Jiang et al. [23] used a linear regression method to detect crop rows based on the region of interest, which depended on the estimation of central points. García-Santillán et al. [24] proposed a new method of curved and straight crop row detection. Extraction of starting points, location of the micro-region of interest (micro-ROI) and regression analysis are the cores of the method. Li et al. [25] determined the center position by the boundary features of crops, then clustered candidate points, and finally extracted the navigation path by Hough transform. The methods above combine the advantages of previous methods in terms of accuracy, but all candidate feature points are computed simultaneously when detecting crop rows. In the images taken by the camera, the crop rows located at the edges (non-travelling area) are not very meaningful for navigation. Therefore, these methods are susceptible to interference in scenarios with dense crop distribution, high weed pressure, and partially missing crop rows, which leads to longer computation time and lower accuracy in determining feature points.
From the above review, we found few crop row detection methods that select the travelling area as the ROI. Montalvo et al. [26] successfully detected crop rows under high weed pressure by determining the ROI of images and applying the least squares method, but the number and distribution of crop rows had to be known beforehand from previous work. Establishing an adaptive ROI is essential for accurate and real-time navigation. To address this problem, and building on the algorithms proposed by previous scholars, this paper proposed a crop row detection method based on multi-ROI. The objectives of the study are as follows: the method should achieve strong adaptability to different field conditions, including different illumination, weed pressure and camera position; the accuracy of the navigation line should meet the allowable requirement of agricultural vehicle navigation; and the computation time of the algorithm should be within a reasonable range with good real-time performance.

Materials and methods
The key insight of the method presented in this study is to extract the travelling area as the region of interest (ROI). The crop rows are detected in the ROI as shown in Figure 1, and the flow chart is presented in Figure 2. First, an initial midpoint was set, and the image was divided into several horizontally labeled image strips. The feature points were positioned in the first strip to determine the initial ROI. Then, the ROI window was shifted upward and the initial midpoint was renewed as the center of the ROI. In the second image strip, the operations in the first strip were repeated based on the renewed ROI and midpoint. These operations were repeated across all image strips until the multi-ROI was determined. Finally, the least squares method was used to detect crop rows in the multi-ROI.

Image acquisition and processing equipment
The images were taken at the Wanbei Comprehensive Experimental Station of Anhui Agricultural University (116°97′E, 33°63′N), Huigu Town, Yongqiao District, Suzhou City, Anhui Province, China.
For image collection, a complementary metal-oxide semiconductor (CMOS) camera was used. The camera was installed on the agricultural vehicle at 1.5 m above ground, with a 30° downward vertical angle. The image size was 1920×1080 pixels. The frame rate was 12 frames/s. The video was saved in AVI format. The video was collected on July 8, 2019 (illumination intensity: 102 300 lx) and July 14, 2019 (illumination intensity: 149 200 lx) under natural illumination conditions. The distance between rows was 60 cm, and the velocity of the vehicle was 0.5 m/s. The video contained 2 890 frames, and the growth period of maize was the three-leaf stage. The images were taken from the video of crop rows in front of the vehicle. The chassis of the vehicle traveled along the crop rows. The images are shown in Figure 3.
The algorithm was implemented in Python (version 3.6.6). A laptop (LAPTOP-K8UQ8410, Lenovo, Beijing, China) was used with an Intel(R) Core(TM) i5-8300H processor at 2.30 GHz and 8.00 GB of random access memory (RAM). The camera communicated with the laptop through a network port.

Image preprocessing
Because of the complex environment in maize fields, the images collected by the CMOS camera contain a fair amount of interference information. In order to reduce the noise, the edges of the images were appropriately cropped, and the images were then segmented into crop and soil.

Image segmentation
It is critical to properly distinguish crop and soil for subsequent image processing. To do so, choosing a suitable color space is the primary problem. The Red (R), Green (G), Blue (B) color model is commonly used to extract green plants like crops [27][28][29]. As presented in Figure 3, there is a substantial difference between the green component of crop and soil. The Excess Green algorithm (ExG) [30] can effectively suppress weeds and shadows and weaken the influence of illumination on subsequent image processing. Before that, the R, G, and B channels need to be normalized (Equation (1)), where B, G, and R are the image color components, and b, g, and r are the normalized values of B, G, and R.
Based on extensive experiments and data analysis, an improved ExG was used for each pixel. The gray-scale images under different illumination and weed pressure are shown in Figure 4.
After the original images were converted to gray-scale images, binary images were obtained with Otsu's method [31]. Results are shown in Figure 5.
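The segmentation step can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it uses the standard ExG index (2g − r − b on channels normalized by R + G + B) because the improved ExG coefficients are not reproduced in this text, and the pixel layout `(B, G, R)`, the clipping of ExG to [0, 1] before scaling to 0..255, and the function names are all assumptions made for illustration.

```python
def excess_green(pixel):
    """Standard ExG = 2g - r - b on normalized channels (assumed form)."""
    b, g, r = pixel  # assumed (B, G, R) channel order
    total = b + g + r
    if total == 0:
        return 0.0
    return (2 * g - r - b) / total


def to_gray(image):
    """Clip ExG to [0, 1] and scale to 0..255 (scaling is an assumption)."""
    return [[round(255 * min(1.0, max(0.0, excess_green(p)))) for p in row]
            for row in image]


def otsu_threshold(gray):
    """Otsu's method: pick the level that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def segment(image):
    """Return a binary image: 255 for crop pixels, 0 for soil."""
    gray = to_gray(image)
    t = otsu_threshold(gray)
    return [[255 if v > t else 0 for v in row] for row in gray]
```

On full-resolution frames a vectorized library routine would normally replace these loops; the nested lists are used here only to keep the sketch dependency-free.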

Morphological processing
After the binary images were obtained, obvious impulse noise appeared in Figure 5b due to weeds. Therefore, the binary images were further de-noised by a mathematical morphological opening operation with a structuring kernel. The results are shown in Figure 6.
Open Access at https://www.ijabe.org Vol. 14 No. 4
Figure 6 Binary images after morphological processing (a. Type-1 field, illumination intensity: 149 200 lx; b. Type-2 field, illumination intensity: 102 300 lx)
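A morphological opening is an erosion followed by a dilation with the same kernel. The sketch below shows the idea on a nested-list binary image. The kernel used in the study is not reproduced in this text, so the square `k` × `k` kernel (default 3 × 3) and the truncation of the window at image borders are assumptions.

```python
def _apply(binary, k, pick):
    """Slide a k x k square window; pick=min erodes, pick=max dilates."""
    h, w = len(binary), len(binary[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = pick(window)
    return out


def opening(binary, k=3):
    """Erosion then dilation: removes blobs smaller than the kernel."""
    return _apply(_apply(binary, k, min), k, max)
```

An isolated weed pixel disappears in the erosion pass and is not restored by the dilation, while a blob at least as large as the kernel keeps its shape.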

Region of interest selection
When the agricultural vehicle is traveling in maize fields, only the crop rows in the traveling area need to be selected for detection, while the crop rows at the edge of the image are less useful. In this study, travelling area is selected for crop rows detection to improve the accuracy and computational efficiency.

Image division
Because crop rows in the images are not parallel, ROI cannot be extracted by delimiting a simple geometric area. For ease of ROI selection, the image was divided into N horizontal strips with Δh height interval along the vertical direction. N is calculated as follows:
$$N = \frac{H}{\Delta h}$$
where N is 10 in this study; H is the height of the image, pixel; Δh is the height of an image strip, pixel. The divided image is shown in Figure 7a.
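The strip division can be sketched as follows. The strips are numbered from the top so that the last one is the bottom strip, matching the later reference to "Horizontal strip 10". The function name and the choice to let the last strip absorb any remainder rows are assumptions.

```python
def strip_bounds(height, n_strips):
    """Return (top, bottom) row ranges per strip, top to bottom.

    `bottom` is exclusive. Any remainder rows are added to the last strip.
    """
    dh = height // n_strips  # strip height, i.e. delta-h = H / N
    bounds = []
    for k in range(n_strips):
        top = k * dh
        bottom = height if k == n_strips - 1 else top + dh
        bounds.append((top, bottom))
    return bounds
```

For example, with an image height of 1080 pixels (the uncropped frame height, used here only as an illustration) and N = 10, each strip is 108 pixels high and the bottom strip covers rows 972 to 1079.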

Initial ROI determination
The bottom image strip (Horizontal strip 10) was picked to determine the initial ROI. The initial midpoint M_O is set to (M_x0, M_y0), where M_x0 = W/2; M_y0 = H; W is the width of the image, pixel. This initial midpoint is only a rough estimate. To convert the initial midpoint coordinate to the actual midpoint coordinate, the pixel values of each column need to be projected vertically. Afterwards, the initial ROI was determined based on the projected image strip, as follows.
The image strip was scanned, and the pixel sum Z(i) of each column was calculated, where i is the column coordinate; j is the row coordinate; p(i, j) is the pixel value at coordinate (i, j); h(i) is the number of pixels with a value of 255; a is a positive real number, a = 1 in this study. The projected image is shown in Figure 7b. The projected image may be affected by the residual noise in the binary image, thus a threshold T is set for filtration. If the value of Z(i) is less than T, it is set to 0 as described in Equation (6). T is calculated as described in Equations (7) and (8), where M is the average value of column pixels and E is the standard deviation of column pixels.
Distances between feature points (Z(i) ≠ 0) in the projected image are recorded by scanning along the pixel column axis. The feature points are marked as candidate clustering points. A value L is set as the distance threshold for clustering. Neighboring candidate clustering points with a distance less than L are grouped together.
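The projection, filtering and clustering steps can be sketched as below. This is an illustration under stated assumptions, not the paper's code: the projection is taken as Z(i) = a · h(i), the threshold `t` is passed in by the caller because the exact combination of M and E in Equations (7) and (8) is not reproduced in this text, and the function names are hypothetical.

```python
def column_projection(strip, a=1.0):
    """Z(i) = a * h(i), with h(i) the count of 255-valued pixels in column i
    (assumed form of the projection)."""
    width = len(strip[0])
    return [a * sum(1 for row in strip if row[i] == 255) for i in range(width)]


def suppress(z, t):
    """Set Z(i) to 0 wherever it falls below the threshold t."""
    return [v if v >= t else 0 for v in z]


def cluster_columns(z, max_gap):
    """Group non-zero columns whose spacing is less than max_gap (the L value)."""
    clusters = []
    for i, v in enumerate(z):
        if v == 0:
            continue
        if clusters and i - clusters[-1][-1] < max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters
```

Each returned cluster is a list of column indices; a single noisy column that survives thresholding would form its own small cluster, which is why the choice of T matters.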
After the clusters are obtained, they are scanned from M_O along the column axis. The column coordinates of the two point classes nearest to M_O are then obtained and stored in clustering sets C_Left and C_Right, respectively, as shown in Figure 8a. The width of the ROI is finally determined based on the clustering sets and calculated from Equation (9). The result is shown in Figure 8b.
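Selecting the two clusters nearest to the midpoint can be sketched as follows. Equation (9) is not reproduced in this text, so taking the ROI to span from the center of the nearest left cluster to the center of the nearest right cluster is an assumption, as are the function name and the return layout.

```python
def initial_roi(clusters, mid_x):
    """Return (x_left, x_right, new_mid_x), or None if a side has no cluster.

    clusters: lists of column indices, e.g. from a clustering step.
    mid_x: current midpoint column (the M_O estimate).
    """
    centers = [sum(c) / len(c) for c in clusters]
    left = [c for c in centers if c < mid_x]
    right = [c for c in centers if c >= mid_x]
    if not left or not right:
        return None
    x_left, x_right = max(left), min(right)  # nearest on each side
    return x_left, x_right, (x_left + x_right) / 2
```

The third value is the renewed midpoint, i.e. the center of the ROI, which would be carried to the next strip.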

Multiple ROI (Multi-ROI) determination
The crop rows in the images gradually converge as the distance from the camera increases. According to this characteristic, after the initial ROI is obtained, the ROI window is shifted along the height direction with a step size of Δh. That is, the ROI window obtained in the previous steps is applied to the next image strip. The vertical projection and clustering algorithm in Section 2.3.2 are repeated based on the renewed ROI and M_O. In this way, the ROI window for each image strip is determined in turn to extract the multi-ROI. Furthermore, the field condition is usually complex and variable. If feature points cannot be detected due to partially missing crop rows, the previous ROI window is returned as the new ROI window. In particular, the pixel range of the first image strip is designated as the initial ROI when no feature points are detected. The multi-ROI of the original two sets of images with different illuminations and weed pressures are shown in Figure 9. The details of the results are shown in Figure 10.
Figure 9 Multi-ROI extraction results of the original images (a. Type-1 field, illumination intensity: 149 200 lx; b. Type-2 field, illumination intensity: 102 300 lx)
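The strip-by-strip update with its two fallback rules can be sketched as a short loop. The per-strip detector is abstracted as a hypothetical callback `detect(strip, previous_roi, mid_x)` that returns `(x_left, x_right)` or `None`; that signature, and representing an ROI by its left and right column bounds, are assumptions for illustration.

```python
def track_rois(strips, mid_x, detect):
    """Return one ROI (x_left, x_right) per strip.

    strips: image strips ordered from the bottom strip upward.
    mid_x: initial midpoint column estimate.
    detect: hypothetical callback, returns (x_left, x_right) or None.
    """
    rois = []
    previous = None
    for strip in strips:
        found = detect(strip, previous, mid_x)
        if found is not None:
            roi = found
        elif previous is not None:
            roi = previous  # rows partly missing: reuse the previous window
        else:
            roi = (0, len(strip[0]) - 1)  # first strip, nothing found: full width
        rois.append(roi)
        mid_x = (roi[0] + roi[1]) / 2  # renew the midpoint as the ROI center
        previous = roi
    return rois
```

Because each strip only searches near the window inherited from the strip below it, the rows at the image edges never enter the computation.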

Navigation line extraction
After multi-ROI is obtained, the navigation line is extracted through the point set Q. It should be noted that the initial midpoint is an estimated point, which has no practical significance for the extraction of navigation line. Therefore, point set Q does not include the initial estimated midpoint.
The least squares method (linear regression) finds the best-fitting function for the data by minimizing the sum of squared errors. With fewer feature points, the linear fitting is faster. A set of N feature points, shown in Equation (11), is obtained by the operations above.
$$Q = \{(M_{x\mu 1}, M_{y\mu 1}), (M_{x\mu 2}, M_{y\mu 2}), \ldots, (M_{x\mu N}, M_{y\mu N})\} \tag{11}$$
The problem is now to find its linear function y = f(x). An exact linear function is unlikely to exist because the feature points are not exactly collinear, but a line can be found that minimizes the sum of the squared deviations between f(x) and the points in set Q. Suppose the equation of the regression line is
$$y = \omega x + b \tag{12}$$
To obtain Equation (12), the optimal combination of ω and b needs to be found, calculated as in Equations (13) and (14). Results are shown in Figure 11.
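The fit of y = ωx + b can be sketched with the standard closed-form least-squares solution, ω = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − ωx̄. Equations (13) and (14) are not reproduced in this text, so their notation may differ from this textbook form; the function name is hypothetical.

```python
def fit_line(points):
    """Least-squares fit of y = omega * x + b to a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All midpoints share one column: the line is vertical in the image.
        raise ValueError("vertical line: slope is not finite")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    omega = sxy / sxx
    return omega, mean_y - omega * mean_x
```

A navigation line that is nearly vertical in the image gives a very large slope in this parameterization; fitting x as a function of y is a common alternative in that case, though whether the study does so is not stated.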

Results and discussion
After the navigation line was extracted, detection lines of crop rows were again extracted in multi-ROI by the least squares method. In order to ensure that the algorithm performs well under various conditions, the images of maize plants at the jointing stage (the height of the plant was about 70 cm) were used for testing. The results are shown in Figure 12. Compared with the three-leaf stage, the maize plant at the jointing stage is denser and taller and the leaves are severely obscured, making it difficult to distinguish the crop rows at the edge of the image. By determining traveling area as ROI, this method limits the detection in the effective range, and still has a good performance.
Figure 11 Results of linear regression based on least squares method (a. Linear regression of Type-1 field; b. Linear regression of Type-2 field; c. Navigation line of Type-1 field; d. Navigation line of Type-2 field)
Figure 12 Results of crop rows detection (a. Type-1 field; b. Type-2 field; c. Type-3 field; d. Type-4 field; e. Type-5 field; f. Type-6 field; Type-1 and Type-2 are original input images, Type-3 to Type-6 are jointing stage images). Note: Blue line is the navigation line, dark blue line is the detection line.
To verify the navigation accuracy, drawn lines were selected for evaluation. Drawn lines were marked in strict accordance with agronomic requirements during the preparation process, choosing the most reasonable navigation path to exclude weed interference and making drawn lines and crop rows as parallel as possible. Drawn lines were marked in red as shown in Figure 14.
Error angle ∆θ is defined as the angle difference between the drawn line and navigation line. Error angle is calculated as follows: where, θ is the angle between the navigation line and the middle axle of the agricultural vehicle chassis, (°); k is the slope of the navigation line; ∆θ a is the angle of the drawn line, (°).
To further verify the reliability and real-time performance of the proposed algorithm, 100 frames (maize plants at the three-leaf stage) were randomly selected from the video captured by the agricultural vehicle to compare the algorithm with HT, HS [18], and the methods proposed by Jiang et al. [21] and Zhang et al. [22].
The error angle between the drawn line and the navigation line is not the only variable that determines accuracy. In practice, the accuracy is also related to the offset distance between the two lines. Even if the error angle is small, the detection result can still be considered poor when the offset distance is large, as shown in Figure 13. Here l_1 is a navigation line with an error angle of ∆θ computed by one method, and l_2 is a navigation line computed by another method. The error angle of l_2 is 0°, but this does not mean that the detection result is excellent. On the contrary, because of its large distance from l_a, its accuracy can still be considered poor. Therefore, this paper uses both the error angle and the offset distance to define the accuracy.
Figure 13 Error analysis between navigation line and drawn line made by different algorithms. Note: l_a is the drawn line; l_1 and l_2 are the navigation lines extracted by different algorithms.
As the error angle gets closer to 90°, the accuracy is considered to be lower. The width of the bottom of the ROI (W_bottom) is regarded as the error range of the offset. The greater the offset distance of the navigation line, the lower the accuracy. The maximum distance between the endpoints of the two lines at the edges of the ROI (max{d_1, d_2}) is used as an index to measure the offset distance. The accuracy (A) is then calculated as a percentage from the error angle ∆θ, scaled by 90°, and the offset distance max{d_1, d_2}, scaled by W_bottom.
The result of the comparison is shown in Table 1. The average detection accuracy of HT was 78.2%. When the weed pressure was high and crop leaves were dense, the error angles were larger; the average error angle was 5.76°. HT also suffered from long computation time, so its real-time performance was poor. Compared with HT, the average error angle of HS was smaller, but the accuracy was lower. The major reason was that the distribution of crop rows was not symmetric due to camera shake. Though HS accurately computed the navigation angle and detection lines by computing the feature points, the navigation line shifted toward the side with more crop rows, which led to a decrease in accuracy. The methods proposed by Jiang et al. [21] and Zhang et al. [22] had better accuracy compared to HT and HS. The method proposed by Jiang et al. [21] is an improved method based on Hough transform, which detects vanishing points through k-means clustering to exclude wrong crop rows. However, it needs the value of k to be set in advance. When the vehicle travels in the maize field, crop rows at the image edges affect the accuracy of the whole algorithm, and its real-time performance is weak because all the candidate feature points and the Hough transform are computed.
When determining the crop rows, the core of the method proposed by Zhang et al. [22] is to accurately classify the candidate points and determine which crop rows they belong to. Although the accuracy and real-time performance of the method were generally good, the range of feature points in the image (bottom two-thirds of the image) and the number of rows need to be determined in advance, which requires active parameter adjustment for different field conditions. When more crop rows were shown in the image, computing all feature points affected the real-time performance. The method proposed in this paper could effectively solve the above problems. By determining the adaptive ROI and detecting crop rows within it, the method achieved better adaptability, stability and real-time performance than the other methods in terms of accuracy, standard deviation (S.D.) of accuracy and computation time. The average accuracy was 95.3% and the standard deviation was 0.023. Both the highest and lowest accuracy could meet the requirements of field navigation. The computation time for one frame was 240.8 ms (standard deviation was 11 ms), which could meet the real-time requirements of agricultural machinery travelling in the field.
Note: ∆θ is the average value of error angle; HT is the method based on Hough transform; HS is the method based on horizontal strip; S.D. is the standard deviation of accuracy; FPS is the frames per second of video.
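As a rough check on the real-time claim, the reported figures can be combined directly. The calculation uses only the 240.8 ms computation time and the 0.5 m/s vehicle velocity stated in this paper; treating frames as processed one after another with no other delay is an assumption.

$$
f_{\text{proc}} = \frac{1}{0.2408\ \text{s}} \approx 4.15\ \text{frames/s}, \qquad d = 0.5\ \text{m/s} \times 0.2408\ \text{s} \approx 0.12\ \text{m}
$$

Under that assumption, the vehicle advances about 12 cm between successive navigation updates.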
The differences in accuracy between the methods were analyzed with the Wilcoxon rank-sum test [32]. Compared with the method in this paper, the z-values of HT, HS, Jiang et al. [21] and Zhang et al. [22] were −12.218, −12.092, −12.016 and −11.180, respectively, all with p-values < 0.01. The differences in accuracy are therefore significant (p-value < 0.01), which suggests that the method in this paper is clearly superior in accuracy.
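For readers who want to reproduce this kind of comparison, the rank-sum z statistic can be sketched with the normal approximation. This is a generic textbook form with average ranks for ties and no tie correction of the variance; whether the study used exactly this variant is not stated. The function name and the sample values in the usage line are made up for illustration and are not the paper's data.

```python
import math


def rank_sum_z(sample_a, sample_b):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approx.)."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


# Illustrative accuracies only, not values from Table 1.
z, p = rank_sum_z([0.76, 0.80, 0.78], [0.95, 0.96, 0.94])
```

A negative z means the first sample tends to rank below the second. For two fully separated samples of 100 values each, this statistic is about −12.22, the same magnitude as the values reported above. `scipy.stats.ranksums` provides a library implementation of the same test.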

Conclusions
Based on machine vision, a crop row detection method for field navigation was presented. Due to the variable and complex environment of maize fields, crop row detection based on machine vision generally relies on complex algorithms, which leads to poor real-time performance and adaptability. To address the problem, a crop row detection method based on adaptive multi-ROI was proposed. The image was segmented and divided into 10 horizontal strips. Subsequently, the initial ROI and midpoint were determined in the bottom strip. Then, the initial ROI window was shifted upward along the height direction and applied to the next image strip. The ROI and midpoint were renewed repeatedly until the multi-ROI was determined. Crop rows were finally detected in the multi-ROI. The detection accuracy on 1920×1080 pixel images was 95.3% (standard deviation was 0.023), and the average computation time was 240.8 ms (standard deviation was 11 ms). The method in this paper was compared with four existing methods. The Wilcoxon rank-sum test suggests that the method has strong robustness and real-time performance.