Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers

Abstract

Car crashes rose 20% from 2020 to 2021, an increase attributed to greater driver distraction and drowsiness. Drowsy and distracted driving cause 45% of all car crashes. To reduce drowsy and distracted driving, detection methods based on computer vision can be designed to be low-cost, accurate, and minimally invasive. This work investigated whether the vision transformer can outperform the state-of-the-art accuracy of 3D-CNNs. Two separate transformers were trained, one for drowsiness and one for distraction. The drowsy video transformer was trained on the National Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD) with a Video Swin Transformer model for 10 epochs on two classes, drowsy and non-drowsy, simulated over 10.5 hours. The distracted video transformer was trained on the Driver Monitoring Dataset (DMD) with a Video Swin Transformer for 50 epochs over 9 distraction-related classes. The drowsiness model reached 44% accuracy with a high loss value on the test set, indicating overfitting and poor model performance. The overfitting suggests that the training data were limited and that the applied model architecture lacked quantifiable parameters to learn. The distracted model outperformed state-of-the-art models on DMD, reaching 97.5%, which indicates that with sufficient data and a strong architecture, transformers are suitable for unfit driving detection. Future research should use newer and stronger models such as TokenLearner to achieve higher accuracy and efficiency, merge existing datasets to extend detection to drunk driving and road rage and so create a comprehensive solution for preventing traffic crashes, and deploy a functioning prototype to revolutionize the automotive safety industry.
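
To put the two reported accuracies in context, each can be compared with the accuracy of uniform random guessing over the same number of classes. The following is a minimal sketch, not the authors' evaluation code: it assumes balanced classes (the abstract does not state the class balance of NTHU-DDD or DMD), and the function names `chance_accuracy` and `margin_over_chance` are hypothetical helpers introduced only for illustration.

```python
def chance_accuracy(num_classes: int) -> float:
    """Accuracy of uniform random guessing.

    Assumption: classes are balanced, which the abstract does not confirm.
    """
    if num_classes < 2:
        raise ValueError("a classification task needs at least two classes")
    return 1.0 / num_classes


def margin_over_chance(accuracy: float, num_classes: int) -> float:
    """Difference between a reported accuracy and the chance baseline."""
    return accuracy - chance_accuracy(num_classes)


# Accuracies and class counts as reported in the abstract.
reported = {
    "drowsiness (NTHU-DDD)": (0.44, 2),
    "distraction (DMD)": (0.975, 9),
}

for task, (accuracy, num_classes) in reported.items():
    baseline = chance_accuracy(num_classes)
    margin = margin_over_chance(accuracy, num_classes)
    print(f"{task}: accuracy={accuracy:.3f}, chance={baseline:.3f}, margin={margin:+.3f}")
```

Under the balanced-class assumption, the drowsiness model's 44% falls below the 50% two-class baseline, which is consistent with the abstract's reading of poor model performance, while the distraction model's 97.5% sits far above the roughly 11% nine-class baseline.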

Author and article information

Publication date Created: 22 July 2022

Article

arXiv ID: 2207.12148

SO-VID: 99d35e1a-cf1d-40e2-a4e9-83990193b924

License: http://creativecommons.org/licenses/by/4.0/

Custom metadata

Categories cs.CV cs.AI

ScienceOpen disciplines: Computer vision & Pattern recognition, Artificial intelligence

Related collections

Data-Driven Civil Engineering
