Improved YOLOv8 network using multi-scale feature fusion for detecting small tea shoots in complex environments
DOI: https://doi.org/10.25165/ijabe.v18i5.9475

Keywords: tea shoot segmentation, multi-scale fusion, attention mechanism, reparameterization technique, YOLOv8-seg

Abstract
Tea shoot segmentation is crucial for automating high-quality tea plucking. However, accurately segmenting tea shoots in unstructured, complex environments is challenging because the targets are small and their color closely resembles the background. To address these challenges and achieve accurate recognition of tea shoots in complex settings, an advanced tea shoot segmentation network is proposed based on the You Only Look Once version 8 segmentation (YOLOv8-seg) network model. First, to enhance the model's segmentation of small targets, this study designed a feature fusion network that incorporates the shallow, large-scale features extracted by the backbone network. Next, the features extracted at different scales by the backbone are fused to obtain both global and local information, enhancing the overall representational capability of the features. Furthermore, the Efficient Channel Attention mechanism was integrated into the feature fusion process and combined with a reparameterization technique to refine the fusion and improve its efficiency. Finally, Wise-IoU, with its dynamic non-monotonic focusing mechanism, was employed to assign different gradient gains to anchor boxes of differing quality. Experimental results show that the improved model increases box and mask AP50 by 4.33% and 4.55%, respectively, while maintaining a smaller parameter count and lower computational cost. Compared with other classical segmentation models, the proposed model excels at tea shoot segmentation. Overall, the improvements proposed in this study effectively segment tea shoots in complex environments, offering significant theoretical and practical contributions to the automated plucking of high-quality tea.
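The Wise-IoU loss mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the IoU helper is standard axis-aligned box overlap, and the non-monotonic focusing coefficient r = β / (δ·α^(β−δ)) follows the Wise-IoU v3 paper, where β is an anchor's "outlier degree" (its IoU loss relative to the running mean loss). The hyper-parameter values α = 1.9 and δ = 3 are defaults reported in that paper and are assumptions here; the article may tune them differently.

```python
def iou(box_a, box_b):
    """Axis-aligned IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


def wiou_gradient_gain(l_iou, l_iou_mean, alpha=1.9, delta=3.0):
    """Non-monotonic focusing coefficient r of Wise-IoU v3 (sketch).

    beta = l_iou / l_iou_mean measures how much worse this anchor's
    IoU loss is than the batch average.  r = beta / (delta * alpha
    ** (beta - delta)) peaks for moderate-quality anchors and
    down-weights both very easy (small beta) and very hard (large
    beta) ones, which is how differing gradient gains are assigned
    to anchor boxes of differing quality.
    """
    beta = l_iou / l_iou_mean
    return beta / (delta * alpha ** (beta - delta))
```

Note that r(β) = 1 exactly when β = δ, i.e., an anchor whose loss is δ times the mean receives an unscaled gradient, while easier and harder anchors are both attenuated.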
Citation: Li Y T, Tan L H, Zhong Z H, He L Y, Chen J N, Wu C Y, et al. Improved YOLOv8 network using multi-scale feature fusion for detecting small tea shoots in complex environments. Int J Agric & Biol Eng, 2025; 18(5): 223–233. DOI: 10.25165/j.ijabe.20251805.9475.
License
IJABE is an international peer-reviewed open-access journal that adopts the following Creative Commons copyright notice.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).