Semantic understanding of complex scenarios in autonomous driving based on element information completion
Authors: ZHAO Shu'en, YUAN Liang, ZHAO Dongyu

Affiliation: 1. College of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China; 2. College of Computer Science, Sichuan University, Chengdu 610065, China
CLC number: TH89; TP751

Fund project: Supported by the National Natural Science Foundation of China (52072054), the Chongqing Natural Science Foundation Innovation and Development Joint Fund (CSTB2024NSCQ-LZX0105), and the Chongqing Jiaotong University Open Competition Project in Natural Sciences (XJ2023)


Abstract:

To address the problems of incomplete geometric feature information in two-dimensional visual images of roadside facilities and traffic participants, and the lack of scene semantic information, in the accurate perception and understanding of complex traffic scenarios for autonomous driving, a semantic understanding model for complex autonomous driving scenarios based on element information completion is proposed. First, a densely connected network (DenseNet) is used to extract multi-scale 2D features from visual images, and the feature line-of-sight projection (FLoSP) module back-projects them onto voxels in 3D space. A dimension decomposition residual (DDR) module is used to construct a 3D UNet that extracts 3D features of scene objects, converting the 2D features of a single-frame visual image into 3D features. A 3D context prior (3D CRP) layer is then introduced between the 3D UNet encoder and decoder, and atrous spatial pyramid pooling (ASPP) and a Softmax layer output the scene semantic completion results, enhancing the spatial semantic understanding capability of the semantic completion model. Meanwhile, image caption generation is used to build a contextual semantic embedding scene understanding language description model based on an improved VGG-16 encoder and a long short-term memory (LSTM) decoder. The improved VGG-16 encoder fuses and concatenates traffic scene features at different scales and feeds them into the LSTM decoder through a projection matrix, establishing a semantic representation between scene object images and predicate relations and automatically generating natural language descriptions of object detection results and decision-making and planning suggestions for autonomous driving. Finally, the proposed complex scene semantic understanding algorithm is validated on the Semantic KITTI dataset and in real-vehicle experiments. The results show that, compared with the JS3C-Net algorithm, the proposed algorithm achieves a relative improvement of 11.27% in mean intersection over union (mIoU); through semantic completion it realizes accurate perception and semantic understanding of complex autonomous driving scenarios and provides a reliable basis for autonomous driving decision-making and planning.
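For illustration, the 2D-to-3D lifting step described above can be sketched as follows: each voxel center is projected onto the image plane through the camera intrinsics, and the multi-scale 2D features are bilinearly sampled along that line of sight. This is a minimal FLoSP-style sketch in PyTorch with assumed shapes and a pinhole camera model, not the paper's exact implementation.

```python
# Minimal FLoSP-style sketch: lift a 2D feature map onto 3D voxels
# along camera lines of sight (assumed shapes and camera model).
import torch
import torch.nn.functional as F

def flosp(feat_2d: torch.Tensor, voxel_centers: torch.Tensor,
          K: torch.Tensor) -> torch.Tensor:
    """feat_2d:       (C, H, W) 2D feature map, e.g. from DenseNet
    voxel_centers: (N, 3) voxel centers in the camera frame, z > 0
    K:             (3, 3) pinhole camera intrinsics
    returns:       (N, C) one feature vector per voxel
    """
    # Project voxel centers to pixels: [u, v, 1] ~ K @ [x, y, z]
    uvw = voxel_centers @ K.T                        # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)    # (N, 2)

    # Normalize pixel coordinates to [-1, 1] for grid_sample
    _, H, W = feat_2d.shape
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                    # (1, 1, N, 2)

    # Bilinear sampling; voxels projecting outside the image get zeros
    sampled = F.grid_sample(feat_2d.unsqueeze(0), grid,
                            align_corners=True)      # (1, C, 1, N)
    return sampled[0, :, 0, :].T                     # (N, C)
```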

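The dimension decomposition residual (DDR) module mentioned above replaces a dense 3×3×3 convolution with three 1-D convolutions, one per spatial axis, preserving the receptive field at a fraction of the parameters. A minimal sketch, with assumed channel sizes and layer ordering:

```python
# Minimal DDR-block sketch: a 3x3x3 convolution decomposed into three
# 1-D convolutions with a residual connection (assumed structure).
import torch
import torch.nn as nn

class DDRBlock(nn.Module):
    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        d = dilation  # matching padding keeps the spatial size
        self.conv_z = nn.Conv3d(channels, channels, (3, 1, 1),
                                padding=(d, 0, 0), dilation=(d, 1, 1))
        self.conv_y = nn.Conv3d(channels, channels, (1, 3, 1),
                                padding=(0, d, 0), dilation=(1, d, 1))
        self.conv_x = nn.Conv3d(channels, channels, (1, 1, 3),
                                padding=(0, 0, d), dilation=(1, 1, d))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv_z(x))
        out = self.relu(self.conv_y(out))
        out = self.conv_x(out)
        return self.relu(out + x)  # residual connection

# A (batch, C, D, H, W) voxel feature volume keeps its shape:
vol = torch.randn(1, 32, 16, 64, 64)
assert DDRBlock(32)(vol).shape == vol.shape
```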
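The language description branch follows a standard encoder-decoder captioning pattern: pooled VGG-16 features from several stages are concatenated, mapped through a projection matrix into the word-embedding space, and fed to the LSTM as the first token of the sequence. The stage split, dimensions, and vocabulary below are illustrative assumptions, not the paper's exact architecture:

```python
# Encoder-decoder captioning sketch: multi-scale VGG-16 features fused
# and projected into an LSTM decoder (hypothetical dimensions).
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CaptionModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256,
                 hidden_dim: int = 512):
        super().__init__()
        vgg = vgg16(weights=None).features
        # Split the backbone so features of three scales can be fused
        self.stage1, self.stage2, self.stage3 = vgg[:16], vgg[16:24], vgg[24:]
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.project = nn.Linear(256 + 512 + 512, embed_dim)  # projection matrix
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image: torch.Tensor, captions: torch.Tensor):
        f1 = self.stage1(image)                        # (B, 256, ., .)
        f2 = self.stage2(f1)                           # (B, 512, ., .)
        f3 = self.stage3(f2)                           # (B, 512, ., .)
        fused = torch.cat([self.pool(f).flatten(1)
                           for f in (f1, f2, f3)], dim=1)  # multi-scale fusion
        img_token = self.project(fused).unsqueeze(1)   # (B, 1, E)
        words = self.embed(captions)                   # (B, T, E)
        # The image acts as the first "word"; the LSTM then predicts
        # the next word at every step.
        out, _ = self.lstm(torch.cat([img_token, words], dim=1))
        return self.head(out)                          # (B, T+1, vocab) logits
```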

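The reported metric, mean intersection over union (mIoU), averages per-class IoU over the predicted voxel labels, and the 11.27% gain over JS3C-Net is a relative (not absolute) improvement. A minimal sketch with illustrative numbers:

```python
# mIoU sketch: per-class intersection over union averaged across
# classes; the baseline value below is illustrative, not from the paper.
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
         ignore_index: int = 255) -> float:
    valid = gt != ignore_index
    ious = []
    for c in range(num_classes):
        p, g = pred[valid] == c, gt[valid] == c
        union = np.logical_or(p, g).sum()
        if union > 0:  # skip classes absent from both prediction and GT
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 20, size=(256, 256, 32))
gt = np.random.randint(0, 20, size=(256, 256, 32))
print(f"mIoU: {miou(pred, gt, num_classes=20):.3f}")

# "Relative improvement" as used in the abstract:
miou_base = 0.24                      # hypothetical baseline mIoU
miou_ours = miou_base * (1 + 0.1127)  # 11.27% relative gain
print(f"gain: {(miou_ours - miou_base) / miou_base:.2%}")
```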
Cite this article:

ZHAO Shu'en, YUAN Liang, ZHAO Dongyu. Semantic understanding of complex scenarios in autonomous driving based on element information completion [J]. Chinese Journal of Scientific Instrument, 2025, 46(4): 295-305.

Online publication date: 2025-06-23