Abstract: To address the challenges of incomplete geometric feature information in two-dimensional visual images of roadside facilities and traffic participants, as well as the lack of scene semantic information, which together lead to inaccurate perception and understanding of complex traffic scenarios in autonomous driving, a semantic understanding model for complex autonomous driving scenarios based on element information completion is proposed. First, a dense connection network (DenseNet) is used to extract multi-scale 2D features from visual images. Then, a feature line-of-sight projection (FLoSP) module back-projects these 2D features into 3D voxel space along camera lines of sight, and a 3D UNet built from dimension decomposition residual (DDR) modules extracts 3D features of scene objects, enabling the transformation of single-frame 2D visual image features into 3D features. Additionally, a contextual residual prior (3D CRP) layer is introduced between the 3D UNet encoder and decoder, and atrous spatial pyramid pooling (ASPP) and Softmax layers output the scene semantic completion results, thereby enhancing the spatial semantic understanding capability of the model. Meanwhile, image caption generation technology is used to build a context-aware semantic-embedding scene understanding language description model based on an improved VGG-16 encoder and a long short-term memory (LSTM) decoder. The improved VGG-16 encoder fuses and concatenates traffic scene features at different scales and feeds them into the LSTM decoder through a projection matrix, establishing a semantic mapping between scene object images and predicate relations and automatically generating natural language descriptions of object detection results together with autonomous driving decision-making suggestions. Finally, the proposed complex scene semantic understanding algorithm is validated on the SemanticKITTI dataset and in real-vehicle experiments. The results show that, compared with the JS3C-Net algorithm, the proposed algorithm achieves a relative improvement of 11.27% in mean intersection over union (mIoU), realizes accurate perception and semantic understanding of complex autonomous driving scenarios through semantic completion, and provides a reliable basis for autonomous driving decision-making and planning.
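To make the 2D-to-3D lifting step concrete, the sketch below shows a minimal FLoSP-style back-projection in PyTorch: voxel centers are projected onto the image plane with a pinhole camera model and the 2D feature map is sampled at the projected locations. The function name `flosp_lift`, the tensor shapes, and the KITTI-like intrinsics in the usage example are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flosp_lift(feat2d, voxel_centers, K, img_size):
    """Back-project 2D image features onto 3D voxels along lines of sight.

    feat2d:        (1, C, H, W) multi-scale 2D feature map (e.g. from DenseNet)
    voxel_centers: (N, 3) voxel centers in camera coordinates (z > 0 in front)
    K:             (3, 3) pinhole camera intrinsic matrix
    img_size:      (H_img, W_img) of the original image
    returns:       (N, C) per-voxel features (zeros for voxels outside the view)
    """
    # Project voxel centers to pixel coordinates: p = K @ x, then divide by depth
    uvw = voxel_centers @ K.T                      # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)   # (N, 2) pixel coordinates

    # Normalize pixel coordinates to [-1, 1] as required by grid_sample
    H_img, W_img = img_size
    grid = torch.stack(
        [uv[:, 0] / (W_img - 1) * 2 - 1,
         uv[:, 1] / (H_img - 1) * 2 - 1], dim=-1)  # (N, 2)

    # Bilinearly sample the 2D feature map at each voxel's projection
    sampled = F.grid_sample(
        feat2d, grid.view(1, 1, -1, 2),
        mode="bilinear", align_corners=True)       # (1, C, 1, N)
    feats = sampled.squeeze(0).squeeze(1).T        # (N, C)

    # Zero out voxels behind the camera or projecting outside the image
    valid = (voxel_centers[:, 2] > 0) & (grid.abs() <= 1).all(dim=-1)
    return feats * valid.unsqueeze(-1).to(feats.dtype)

# Toy usage with illustrative shapes and KITTI-like intrinsics
feat2d = torch.randn(1, 64, 94, 310)           # e.g. a 1/4-scale feature map
vox = torch.rand(60 * 36 * 60, 3) * 10         # toy voxel centers, z in [0, 10)
K = torch.tensor([[718.856, 0.0, 607.1928],
                  [0.0, 718.856, 185.2157],
                  [0.0, 0.0, 1.0]])
vox_feats = flosp_lift(feat2d, vox, K, (376, 1241))   # (N, 64)
```

In a multi-scale setting, the same sampling can be repeated on feature maps at several resolutions and the per-voxel features summed, so each voxel aggregates evidence from every scale along its line of sight.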