
      YOLOv8-MPEB small target detection algorithm based on UAV images


          Abstract

Target detection in Unmanned Aerial Vehicle (UAV) aerial images has gained significance within UAV application scenarios. However, UAV aerial images present challenges, including large scale variations, small target sizes, complex scenes, and variable external factors, which lead to missed or false detections. This study proposes a small target detection algorithm for UAV images based on an enhanced YOLOv8 model, termed YOLOv8-MPEB. First, the Cross Stage Partial Darknet53 (CSPDarknet53) backbone network is replaced with the lightweight MobileNetV3 backbone, reducing model parameters and computational complexity while improving inference speed. Second, a dedicated small target detection layer is added to improve feature extraction for multi-scale targets. Third, the Efficient Multi-Scale Attention (EMA) mechanism is integrated into the C2f module to strengthen the extraction of salient features and suppress redundant ones. Finally, a bidirectional feature pyramid network (BiFPN) is adopted in the neck to reduce detection errors caused by scale variation and scene complexity, improving model generalization. Ablation experiments and comparisons with alternative algorithms substantiate the effectiveness of the proposed algorithm, with a particular focus on detection performance. With 7.39 M parameters and a model size of 14.5 MB, the algorithm attains a mean Average Precision (mAP) of 91.9% on a custom-built helmet and reflective-clothing dataset. Compared with the standard YOLOv8 model, it raises average precision by 2.2 percentage points, reduces model parameters by 34%, and shrinks model size by 32%, outperforming other prevalent detection algorithms in both accuracy and speed.
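Of the four modifications, the BiFPN fusion rule in the neck is the simplest to illustrate in code. The sketch below implements BiFPN's fast normalized weighted fusion as introduced in the EfficientDet paper, which BiFPN uses to combine features from different pyramid levels; it is a generic illustration under stated assumptions, not the authors' implementation. The channel width, the number of fused inputs, and the depthwise-separable convolution after fusion are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """Fuse N same-shape feature maps with learnable non-negative weights:
    out = sum_i(w_i * x_i) / (sum_i(w_i) + eps), where w_i = ReLU(learnable).
    This is BiFPN's "fast normalized fusion" rule; layer sizes are illustrative."""

    def __init__(self, num_inputs: int, channels: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one weight per input
        self.eps = eps
        # Depthwise-separable conv after fusion, as in BiFPN (assumed configuration).
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, inputs):
        w = F.relu(self.weights)          # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalization (no softmax needed)
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)

# Usage: fuse a top-down feature with a same-level lateral feature (shapes assumed).
fuse = FastNormalizedFusion(num_inputs=2, channels=256)
out = fuse([torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40)])
print(out.shape)  # torch.Size([1, 256, 40, 40])

Because the weights are normalized by their sum rather than a softmax, the fusion stays cheap at inference time while still letting the network learn how much each pyramid level should contribute at a given scale.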


                Author and article information

Journal: Heliyon (Elsevier)
ISSN: 2405-8440
Published online: 15 April 2024
Issue date: 30 April 2024
Volume: 10
Issue: 8
Article number: e29501

Affiliations
[1] School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China

Author notes
[*] Corresponding author. yongchengji@126.com

Article identifiers
PII: S2405-8440(24)05532-4
DOI: 10.1016/j.heliyon.2024.e29501
PMC: 11046113
                © 2024 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 25 January 2024
Revised: 8 April 2024
Accepted: 9 April 2024
                Categories
                Research Article

Keywords: YOLOv8, MobileNetV3, attention mechanism, BiFPN, small target detection
