ADVersa: Abductive Driving Accident Video Understanding
ORCID ID
Hongkai Yu, https://orcid.org/0000-0001-5383-8913
Document Type
Article
Publication Date
2-11-2026
Publication Title
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract
Understanding traffic accident scenes is a long-standing research problem in vision-based safe driving. It seeks to answer why accidents occur, how near-crash scenes develop, and what the key elements of an accident are. This research is challenging due to the scarcity and fragmentation of accident data, as well as the complexity of accident environments. To study this problem, we present a framework for Abductive Driving accident Video understanding (ADVersa), which infers plausible visual and textual explanations for absent near-crash scenes. ADVersa addresses three groups of tasks: 1) visual past recovery of near-crash scenes, 2) visual prediction of near-crash scenes, and 3) accident-cause-involved video synthesis. To support the study, we first contribute MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild driving accident videos with temporally aligned text descriptions, 2.23 million well-annotated object boxes, and 58,650 pairs of video-based accident cause texts. We then propose an Abductive CLIP model and a Contrastive Graph Video Pre-training (CGVP) model, which exploit relation-aware cross-modal semantic learning to drive spatially abductive and temporally abductive accident video diffusion. Extensive experiments verify the superiority of ADVersa over state-of-the-art approaches on different tasks, i.e., historical near-crash video frame recovery, crash video frame prediction, textual accident cause and category reasoning, normal-to-accident video synthesis, and accident video editing. With these efforts, we hope this research can advance progress in multimodal accident video understanding.
Repository Citation
Li, Lei-Lei; Fang, Jianwu; Xiao, Junbin; Yu, Hongkai; and Lv, Chen, "ADVersa: Abductive Driving Accident Video Understanding" (2026). Electrical and Computer Engineering Faculty Publications. 533.
https://engagedscholarship.csuohio.edu/enece_facpub/533
DOI
10.1109/TPAMI.2026.3663545