YOLOv8np-RCW: A multi-task deep learning model for comprehensive visual information in tomato harvesting robot

Authors

  • Xinyi Ai, College of Engineering, China Agricultural University, Beijing 100083, China
  • Tianxue Zhang, School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China; Institute of Medical Robotics, Shanghai Jiaotong University, Shanghai 200240, China
  • Ting Yuan, College of Engineering, China Agricultural University, Beijing 100083, China
  • Xiajun Zheng, College of Engineering, China Agricultural University, Beijing 100083, China
  • Ziming Xiong, College of Engineering, China Agricultural University, Beijing 100083, China
  • Jiace Yuan, College of Engineering, China Agricultural University, Beijing 100083, China

DOI:

https://doi.org/10.25165/j.ijabe.20251805.9719

Keywords:

tomato bunch detection, maturity detection, keypoint detection, harvesting robots

Abstract

In greenhouse environments, automated tomato harvesting is a key development trend for reducing labor demand, and accurate, efficient visual recognition is essential to accomplish harvesting tasks. However, most current studies obtain harvesting information in multiple steps with separate models, resulting in heavy computational cost, poor real-time performance, and low recognition precision. In this study, an improved end-to-end model, YOLOv8np-RCW, based on YOLOv8n-pose is proposed to simultaneously detect tomato bunches, maturity, and keypoints using a decoupled-head structure. The model integrates a ResNet-enhanced RepVGG architecture to balance accuracy and speed, employs the CARAFE upsampling operator to enlarge the receptive field while remaining lightweight, and optimizes the loss function with WIoU loss to improve bounding box prediction, maturity detection, and keypoint extraction. Experimental results indicate that the mAP50 of the YOLOv8np-RCW model for bounding boxes and keypoints is 87.3% and 86.8%, respectively, 6.2% and 5.5% higher than those of the YOLOv8n-pose model. Completing the tasks of bunch detection, maturity assessment, and keypoint localization requires only 9.8 ms, and the Euclidean distance error in keypoint detection is less than 20 pixels. Based on this model, a method is proposed to quickly determine the orientation of tomato bunches using geometric cross-product calculations on 2D keypoint information, providing guidance for the motion planning of the end-effector. In field experiments, the robot achieved a harvesting success rate of 68%, with an average time of 10.8366 seconds per tomato bunch.

Citation: Ai X Y, Zhang T X, Yuan T, Zheng X J, Xiong Z M, Yuan J C. YOLOv8np-RCW: A multi-task deep learning model for comprehensive visual information in tomato harvesting robot. Int J Agric & Biol Eng, 2025; 18(5): 246–258. DOI: 10.25165/j.ijabe.20251805.9719.
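The orientation method described in the abstract reduces to the sign of a 2D cross product over detected keypoints. A minimal sketch of that idea, assuming three hypothetical keypoints (the stem attachment point and the top/bottom endpoints of the bunch axis) in image pixel coordinates; the paper's actual keypoint definitions and side convention may differ:

```python
def bunch_orientation(stem, top, bottom):
    """Classify which side of the bunch axis the stem attaches to,
    using the sign of the 2D cross product (image coordinates, y down).

    stem, top, bottom: (x, y) pixel coordinates of hypothetical keypoints.
    Returns "left" or "right" relative to the top->bottom bunch axis.
    """
    ax, ay = bottom[0] - top[0], bottom[1] - top[1]  # bunch axis vector
    sx, sy = stem[0] - top[0], stem[1] - top[1]      # vector from top to stem
    cross = ax * sy - ay * sx                        # z-component of the 2D cross product
    return "left" if cross > 0 else "right"
```

A side label of this kind is enough for the end-effector to choose an approach direction without a full 3D pose estimate, which is what makes a 2D-only computation attractive for real-time planning.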

References

Liu J Z, Peng Y, Faheem M. Experimental and theoretical analysis of fruit plucking patterns for robotic tomato harvesting. Comput Electron Agric, 2020; 173: 105330.

Zhang F, Gao J, Zhou H, Zhang J X, Zou K L, Yuan T. Three-dimensional pose detection method based on keypoints detection network for tomato bunch. Comput Electron Agric, 2022; 195: 106824.

Maureira F, Rajagopalan K, Stöckle C O. Evaluating tomato production in open-field and high-tech greenhouse systems. J Clean Prod, 2022; 337: 130459.

Zhou H Y, Wang X, Au W, Kang H W, Chen C. Intelligent robots for fruit harvesting: recent developments and future challenges. Precis Agric, 2022; 23: 1856–1907.

Zheng X J, Rong J C, Zhang Z Q, Yang Y, Li W, Yuan T. Fruit growing direction recognition and nesting grasping strategies for tomato harvesting robots. J Field Robot, 2024; 41: 300–313.

Wu J Q, Fan S Z, Gong L, Yuan J, Zhou Q, Liu C L. Research status and development direction of design and control technology of fruit and vegetable picking robot system. Smart Agric, 2020; 2(4): 17–40.

Gao J, Zhang J X, Zhang F, Gao J F. LACTA: A lightweight and accurate algorithm for cherry tomato detection in unstructured environments. Expert Syst Appl, 2024; 238: 122073.

Rapado-Rincón D, van Henten E J, Kootstra G. Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking. Biosyst Eng, 2023; 231: 78–91.

Xiong Y, Ge Y, From P J. An obstacle separation method for robotic picking of fruits in clusters. Comput Electron Agric, 2020; 175: 105397.

Kim J, Pyo H, Jang I, Kang J, Ju B, Ko K. Tomato harvesting robotic system based on Deep-ToMaToS: Deep learning network using transformation loss for 6D pose estimation of maturity classified tomatoes with side-stem. Comput Electron Agric, 2022; 201: 107300.

Li H P, Li C Y, Li G B, Chen L X. A real-time table grape detection method based on improved YOLOv4-tiny network in complex background. Biosyst Eng, 2021; 212: 347–359.

Li T H, Sun M, He Q H, Zhang G S, Shi G Y, Ding X M, et al. Tomato recognition and location algorithm based on improved YOLOv5. Comput Electron Agric, 2023; 208: 107759.

Zhang J X, Xie J Y, Zhang F, Gao J, Yang C, Song C Y, et al. Greenhouse tomato detection and pose classification algorithm based on improved YOLOv5. Comput Electron Agric, 2024; 216: 108519.

Yoshida T, Fukao T, Hasegawa T. Cutting point detection using a robot with point clouds for tomato harvesting. J Robot Mechatron, 2020; 32(2): 437–444.

Qi J T, Liu X N, Liu K, Xu F R, Guo H, Tian X L, et al. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput Electron Agric, 2022; 194: 106780.

Rong Q J, Hu C H, Hu X D, Xu M X. Picking point recognition for ripe tomatoes using semantic segmentation and morphological processing. Comput Electron Agric, 2023; 210: 107923.

Fu L H, Wu F Y, Zou X J, Jiang Y L, Lin J Q, Yang Z, et al. Fast detection of banana bunches and stalks in the natural environment based on deep learning. Comput Electron Agric, 2022; 194: 106800.

Zhu Y J, Li S S, Du W S, Du Y P, Liu P, Li X. Identification of table grapes in the natural environment based on an improved YOLOv5 and localization of picking points. Precis Agric, 2023; 24: 1333–1354.

Chen J Q, Ma A Q, Huang L X, Li H W, Zhang H Y, Huang Y, et al. Efficient and lightweight grape and picking point synchronous detection model based on key point detection. Comput Electron Agric, 2024; 217: 108612.

Ukwuoma C C, Zhiguang Q, Bin Heyat M B, Ali L, Almaspoor Z, Monday H N. Recent advancements in fruit detection and classification using deep learning techniques. Math Probl Eng, 2022; 2022(1): 9210947.

Koirala A, Walsh K B, Wang Z, McCarthy C. Deep learning – Method overview and review of use for fruit detection and yield estimation. Comput Electron Agric, 2019; 162: 219–234.

Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. 2016; Available: https://doi.org/10.48550/arXiv.1612.08242. Accessed on [2024-11-17].

Bochkovskiy A, Wang C-Y, Liao H-Y M. YOLOv4: Optimal speed and accuracy of object detection. 2020; Available: https://doi.org/10.48550/arXiv.2004.10934. Accessed on [2024-11-17].

Wang C-Y, Bochkovskiy A, Liao H-Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 2022; Available: https://doi.org/10.48550/arXiv.2207.02696. Accessed on [2024-11-17].

Ding X H, Zhang X Y, Ma N N, Han J G, Ding G G, Sun J. RepVGG: Making VGG-style ConvNets great again. 2021; Available: https://doi.org/10.48550/arXiv.2101.03697. Accessed on [2024-09-23].

He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016; pp.770–778. doi: 10.1109/CVPR.2016.90.

Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015; Available: https://doi.org/10.48550/arXiv.1502.03167. Accessed on [2025-01-23].

Wang J Q, Chen K, Xu R, Liu Z W, Loy C C, Lin D H. CARAFE: Content-Aware ReAssembly of FEatures. 2019; Available: https://doi.org/10.48550/arXiv.1905.02188. Accessed on [2024-11-03].

Tong Z J, Chen Y H, Xu Z W, Yu R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. 2023; Available: https://doi.org/10.48550/arXiv.2301.10051. Accessed on [2024-12-12].

Published

2025-10-27

How to Cite

Ai, X., Zhang, T., Yuan, T., Zheng, X., Xiong, Z., & Yuan, J. (2025). YOLOv8np-RCW: A multi-task deep learning model for comprehensive visual information in tomato harvesting robot. International Journal of Agricultural and Biological Engineering, 18(5), 246–258. https://doi.org/10.25165/j.ijabe.20251805.9719

Section

Information Technology, Sensors and Control Systems