
      YOLOv8-MPEB small target detection algorithm based on UAV images


          Abstract

Target detection in Unmanned Aerial Vehicle (UAV) aerial images has gained significance within UAV application scenarios. However, UAV aerial images present challenges, including large scale variations, small target sizes, complex scenes, and variable external factors, which lead to missed or false detections. This study proposes a small target detection algorithm for UAV images based on an enhanced YOLOv8 model, termed YOLOv8-MPEB. First, the Cross Stage Partial Darknet53 (CSPDarknet53) backbone network is replaced with the lightweight MobileNetV3 backbone, reducing model parameters and computational complexity while improving inference speed. Second, a dedicated small target detection layer is added to improve feature extraction for multi-scale targets. Third, the Efficient Multi-Scale Attention (EMA) mechanism is integrated into the C2f module to strengthen the extraction of salient features and suppress redundant ones. Finally, a bidirectional feature pyramid network (BiFPN) is adopted in the neck to reduce detection errors caused by scale variation and scene complexity, improving model generalization. Ablation experiments and comparisons with alternative algorithms substantiate the effectiveness of the proposed algorithm, with a particular focus on detection performance. With 7.39 M parameters and a model size of 14.5 MB, the algorithm attains a mean Average Precision (mAP) of 91.9% on a custom-built helmet and reflective-clothing dataset. Compared with the standard YOLOv8 model, it raises average precision by 2.2 percentage points, reduces model parameters by 34%, and shrinks model size by 32%, outperforming other prevalent detection algorithms in both accuracy and speed.
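Of the four modifications, the BiFPN fusion rule in the neck is the simplest to illustrate in code. The sketch below implements BiFPN's fast normalized weighted fusion as introduced in the EfficientDet paper, which BiFPN uses to combine features from different pyramid levels; it is a generic illustration under stated assumptions, not the authors' implementation. The channel width, the number of fused inputs, and the depthwise-separable convolution after fusion are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FastNormalizedFusion(nn.Module):
    """Fuse N same-shape feature maps with learnable non-negative weights:
    out = sum_i(w_i * x_i) / (sum_i(w_i) + eps), where w_i = ReLU(learnable).
    This is BiFPN's "fast normalized fusion" rule; layer sizes are illustrative."""

    def __init__(self, num_inputs: int, channels: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # one weight per input
        self.eps = eps
        # Depthwise-separable conv after fusion, as in BiFPN (assumed configuration).
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, inputs):
        w = F.relu(self.weights)          # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalization (no softmax needed)
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)

# Usage: fuse a top-down feature with a same-level lateral feature (shapes assumed).
fuse = FastNormalizedFusion(num_inputs=2, channels=256)
out = fuse([torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40)])
print(out.shape)  # torch.Size([1, 256, 40, 40])

Because the weights are normalized by their sum rather than a softmax, the fusion stays cheap at inference time while still letting the network learn how much each pyramid level should contribute at a given scale.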


                Author and article information

Journal: Heliyon (Elsevier)
ISSN: 2405-8440
Published online: 15 April 2024
Issue date: 30 April 2024
Volume: 10
Issue: 8
Article number: e29501

Affiliations
[1] School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China

Author notes
[*] Corresponding author. yongchengji@126.com

Article identifiers
PII: S2405-8440(24)05532-4
DOI: 10.1016/j.heliyon.2024.e29501
PMC: 11046113
                © 2024 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 25 January 2024
Revised: 8 April 2024
Accepted: 9 April 2024
                Categories
                Research Article

Keywords: YOLOv8, MobileNetV3, attention mechanism, BiFPN, small target detection
