
      An object detection algorithm combining self-attention and YOLOv4 in traffic scene

      research-article
      PLOS ONE
      Public Library of Science


          Abstract

Automobile intelligence is the trend for modern automobiles, and environment perception is the key technology of intelligent automobile research. For autonomous vehicles, detecting objects such as vehicles and pedestrians in traffic scenes is crucial to improving driving safety. However, actual traffic scenes involve many special conditions, such as object occlusion, small objects, and bad weather, which degrade the accuracy of object detection. In this research, the SwinT-YOLOv4 algorithm, based on YOLOv4, is proposed for detecting objects in traffic scenes. Compared with a convolutional neural network (CNN), the vision transformer is more powerful at extracting visual features of objects in an image. In the proposed algorithm, the CNN-based backbone of YOLOv4 is replaced by the Swin Transformer, while the feature-fusing neck and prediction head of YOLOv4 are retained. The proposed model was trained and evaluated on the COCO dataset. Experiments show that our method can significantly improve the accuracy of object detection under special conditions. Equipped with our method, the object detection precision for cars and persons is improved by 1.75%, reaching 89.04% and 94.16%, respectively.
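The architectural change described in the abstract is straightforward to express in code. Below is a minimal PyTorch sketch (not the authors' implementation) of the wiring: a hierarchical Swin-style backbone produces feature maps at strides 8, 16, and 32, which are passed through YOLOv4's unchanged neck to its three prediction heads. ToyBackbone, ToyNeck, and toy_head are illustrative stand-ins so the wiring can be smoke-tested; a real model would plug in a Swin Transformer backbone and the original YOLOv4 SPP/PANet neck and heads.

    import torch
    import torch.nn as nn

    class SwinTYOLOv4(nn.Module):
        # Keep YOLOv4's neck and heads unchanged; swap in a hierarchical
        # (Swin-style) backbone that yields feature maps at strides 8, 16 and 32.
        def __init__(self, backbone, neck, heads):
            super().__init__()
            self.backbone = backbone
            self.neck = neck
            self.heads = nn.ModuleList(heads)

        def forward(self, x):
            c3, c4, c5 = self.backbone(x)        # three hierarchical stages
            p3, p4, p5 = self.neck(c3, c4, c5)   # feature fusion, as in YOLOv4
            return [h(p) for h, p in zip(self.heads, (p3, p4, p5))]

    # Toy stand-ins so the wiring can be smoke-tested without real weights.
    class ToyBackbone(nn.Module):
        def __init__(self, dims=(96, 192, 384)):
            super().__init__()
            self.stages = nn.ModuleList(
                nn.Conv2d(3, d, kernel_size=3, stride=s, padding=1)
                for d, s in zip(dims, (8, 16, 32)))

        def forward(self, x):
            return [stage(x) for stage in self.stages]

    class ToyNeck(nn.Module):
        def forward(self, c3, c4, c5):           # identity "fusion" placeholder
            return c3, c4, c5

    def toy_head(in_ch, num_anchors=3, num_classes=80):
        # Each YOLO head predicts (x, y, w, h, objectness, class scores) per anchor.
        return nn.Conv2d(in_ch, num_anchors * (5 + num_classes), kernel_size=1)

    model = SwinTYOLOv4(ToyBackbone(), ToyNeck(),
                        [toy_head(c) for c in (96, 192, 384)])
    outputs = model(torch.randn(1, 3, 416, 416))
    print([o.shape for o in outputs])            # prediction maps at strides 8, 16, 32

Because the neck and heads are kept as in YOLOv4, only the channel widths of the backbone's three output stages need to match what the neck expects.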


Most cited references (27)


          Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
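The sliding-window head described in this abstract is compact; a minimal PyTorch sketch of an RPN head over a shared feature map (illustrative only, not the released code) could look as follows. It uses the common single-logit-per-anchor variant for objectness, whereas the original paper uses a two-way softmax, an equivalent parameterization.

    import torch
    import torch.nn as nn

    class RPNHead(nn.Module):
        # Slide a small network over the shared convolutional feature map; at each
        # spatial position, predict an objectness score and 4 box-regression
        # offsets for each of k reference anchors.
        def __init__(self, in_channels=512, num_anchors=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
            self.objectness = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
            self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

        def forward(self, feature_map):
            h = torch.relu(self.conv(feature_map))
            return self.objectness(h), self.bbox_deltas(h)

    # e.g. a VGG-16 conv5 feature map (stride 16) of a roughly 600x800 input image
    scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
    print(scores.shape, deltas.shape)   # (1, 9, 38, 50) and (1, 36, 38, 50)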

            Focal loss for dense object detection

            The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
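The reshaped loss described above is a small modification of binary cross entropy; a minimal PyTorch version (a sketch, not the Detectron reference implementation linked in the abstract), with the paper's default settings alpha = 0.25 and gamma = 2, could look like this:

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        # Standard binary cross entropy, kept per-element so it can be re-weighted.
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        # (1 - p_t)^gamma down-weights well-classified (easy) examples, so the huge
        # number of easy background anchors no longer dominates the total loss.
        return (alpha_t * (1 - p_t) ** gamma * ce).sum()

    # logits/targets are per-anchor, per-class predictions and 0/1 labels;
    # RetinaNet normalizes this sum by the number of foreground (object) anchors.
    loss = focal_loss(torch.randn(8, 100, 80), torch.zeros(8, 100, 80))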

              FCOS: Fully Convolutional One-Stage Object Detection


                Author and article information

                Contributors
Roles: Data curation, Formal analysis, Investigation, Resources, Software, Validation, Visualization, Writing – original draft
Roles: Conceptualization, Funding acquisition, Methodology, Supervision, Validation
Roles: Project administration, Supervision, Validation, Writing – review & editing
Roles: Funding acquisition, Project administration, Resources, Supervision
                Role: Editor
Journal
PLOS ONE (Public Library of Science, San Francisco, CA, USA)
ISSN: 1932-6203
Published: 18 May 2023; 18(5): e0285654
                Affiliations
                [001] College of Automobile and Traffic Engineering, Nanjing Forestry University, Nanjing, 210037, China
Menoufia University, Egypt
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0003-1150-6640
                https://orcid.org/0000-0001-7147-0905
                Article
                PONE-D-22-31357
DOI: 10.1371/journal.pone.0285654
                10194927
                37200376
                © 2023 Lu et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 14 November 2022
Accepted: 27 April 2023
                Page count
                Figures: 12, Tables: 4, Pages: 18
                Funding
                Funded by: Industrial Proactive and Key Technology Program of Jiangsu Province
                Award ID: BE2022053-2
                Award Recipient :
                Funded by: Modern Agriculture-Key and General Program of Jiangsu Province
                Award ID: BE2021339
                Award Recipient :
                Funded by: Philosophy and Social Science Program of the Higher Education Institutions of Jiangsu Province
                Award ID: 2021SJA0151
                Award Recipient :
                Funded by: Science and Technology Innovation Foundation for Young Scientists of Nanjing Forestry University
                Award ID: CX2019018
                Award Recipient :
                This research was funded by the Industrial Proactive and Key Technology Program of Jiangsu Province (grant number BE2022053-2), Modern Agriculture-Key and General Program of Jiangsu Province (grant number BE2021339), Philosophy and Social Science Program of the Higher Education Institutions of Jiangsu Province (grant number 2021SJA0151) and Science and Technology Innovation Foundation for Young Scientists of Nanjing Forestry University (grant number CX2019018). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
Research and Analysis Methods > Mathematical and Statistical Techniques > Mathematical Functions > Convolution
Physical Sciences > Mathematics > Applied Mathematics > Algorithms
Research and Analysis Methods > Simulation and Modeling > Algorithms
Computer and Information Sciences > Artificial Intelligence > Machine Learning > Deep Learning
Computer and Information Sciences > Computer Vision
Biology and Life Sciences > Anatomy > Head
Medicine and Health Sciences > Anatomy > Head
Computer and Information Sciences > Computer Vision > Target Detection
Computer and Information Sciences > Neural Networks
Biology and Life Sciences > Neuroscience > Neural Networks
Biology and Life Sciences > Anatomy > Neck
Medicine and Health Sciences > Anatomy > Neck
                Custom metadata
                The COCO2017 dataset was used in this study. The dataset can be downloaded from the Common Objects in Context website ( https://cocodataset.org/#download).
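For reference, once the COCO2017 images and annotation archives are downloaded and unpacked, they can be read with torchvision's built-in COCO wrapper (the paths below are placeholders for wherever the archives were extracted; pycocotools must be installed):

    from torchvision.datasets import CocoDetection

    # Placeholder paths; point these at the extracted COCO2017 files.
    train_set = CocoDetection(
        root="coco2017/train2017",
        annFile="coco2017/annotations/instances_train2017.json",
    )
    image, targets = train_set[0]   # PIL image and its list of annotation dicts
    print(len(train_set), len(targets))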

