Predicting Future Driving Risk of Crash-Involved Drivers Based on a Systematic Machine Learning Framework


Abstract

The objective of this paper is to predict the future driving risk of crash-involved drivers in Kunshan, China. A systematic machine learning framework is proposed to address three critical technical issues: (1) defining driving risk; (2) developing risky driving factors; and (3) developing a reliable and explicable machine learning model. High-risk (HR) and low-risk (LR) drivers were defined under five different scenarios. A number of features were extracted from seven years of crash/violation records. Drivers' crash/violation information from the prior two years was used to predict their driving risk in the subsequent two years. Using a one-year rolling time window, prediction models were developed for four consecutive time periods: 2013–2014, 2014–2015, 2015–2016, and 2016–2017. Four tree-based ensemble learning techniques were tested: random forest (RF), AdaBoost with decision trees, gradient boosting decision tree (GBDT), and extreme gradient boosting decision tree (XGBoost). A temporal transferability test and a follow-up study were applied to validate the trained models. The best scenario for defining driving risk was multi-dimensional, encompassing crash recurrence, severity, and fault commitment. GBDT appeared to be the best model choice across all time periods, with an acceptable average precision (AP) of 0.68 on the most recent dataset (i.e., 2016–2017). Seven of the nine top features were related to risky driving behaviors, which showed non-linear relationships with driving risk. Model transferability held within relatively short time intervals (1–2 years). Appropriate risk definition, complex violation/crash features, and advanced machine learning techniques need to be considered for the risk prediction task. The proposed machine learning approach is promising and could help safety interventions be launched more effectively.
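The rolling-window setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `rolling_windows`, the `Window` type, and the 2011 start year are assumptions made here for illustration: 2011 is inferred from the seven-year record span and the first prediction period (2013–2014), and is not stated explicitly in the abstract.

```python
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    """One train/predict split: years used for features, years used for the risk label."""
    feature_years: Tuple[int, ...]
    target_years: Tuple[int, ...]


def rolling_windows(first_year: int, last_year: int,
                    lookback: int = 2, horizon: int = 2,
                    step: int = 1) -> List[Window]:
    """Enumerate (feature, target) year windows that fit inside the record span.

    lookback: years of prior crash/violation history used as features.
    horizon:  years over which future driving risk is defined.
    step:     how far the window rolls forward each time (one year here).
    """
    windows = []
    start = first_year
    # Keep rolling while the last target year is still covered by the records.
    while start + lookback + horizon - 1 <= last_year:
        feature_years = tuple(range(start, start + lookback))
        target_years = tuple(range(start + lookback, start + lookback + horizon))
        windows.append(Window(feature_years, target_years))
        start += step
    return windows


# Assumed record span: 2011-2017 (seven years, inferred rather than stated).
windows = rolling_windows(2011, 2017)
for w in windows:
    print(f"features from {w.feature_years} -> risk label over {w.target_years}")
```

Under these assumptions the sketch yields four windows whose target periods match the four prediction periods listed in the abstract. A separate model would then be trained per window.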

Author and article information

Journal

Journal ID (nlm-ta): Int J Environ Res Public Health

Journal ID (iso-abbrev): Int J Environ Res Public Health

Journal ID (publisher-id): ijerph

Title: International Journal of Environmental Research and Public Health

Publisher: MDPI

ISSN (Print): 1661-7827

ISSN (Electronic): 1660-4601

Publication date (Electronic): 25 January 2019

Publication date (Print): February 2019

Volume: 16

Issue: 3

Electronic Location Identifier: 334

Affiliations

[1] Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 210096, China; wkobec@hotmail.com

[2] Intelligent Transportation Research Center, Southeast University, Nanjing 210096, China

[3] Jiangsu Intelligent Transportation Systems Co., Ltd., Nanjing 210096, China; lin.liu@jiangsuits.com (L.L.); lwt@jiangsuits.com (W.L.)

Author notes

[*] Correspondence: xuchengcheng@seu.edu.cn; Tel.: +86-182-2114-8065

Article

Publisher ID: ijerph-16-00334

DOI: 10.3390/ijerph16030334

PMC ID: 6388263

PubMed ID: 30691063

SO-VID: 27acbcab-2dbf-410d-90d7-0458e5a99871

License:

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
