Bird's eye view generation based on recurrent cross-view transformation and multi-state feature fusion

CLC Number: TP391.4 TH865

    Abstract:

    Most bird's eye view (BEV) generation methods based on multiple perspective views suffer from semantic inconsistency when extracting multi-state associated features and struggle to balance model performance against complexity. To address both problems, a lightweight Transformer-based BEV generation model is proposed. The method uses an end-to-end, one-stage training strategy to associate dynamic vehicle information with static road information in traffic scenes, which effectively filters out noise in the generated BEV. A Transformer-based recurrent cross-view transformation module operating on multi-scale features performs image encoding and representation learning. By capturing location-dependent relationships in the perspective view (PV) feature sequence, this module improves the robustness of the extracted BEV features. In addition, a multi-state BEV feature fusion module is designed to resolve semantic inconsistencies by extracting correlated information between dynamic vehicles and static roads, which improves the quality of the generated BEVs. Experiments on the NuScenes dataset show that the method achieves strong BEV generation performance with low model complexity, reaching semantic segmentation accuracies of 43.2% for dynamic vehicles and 82.0% for static roads.
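    The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a recurrent cross-view transformation: a set of BEV queries attends to PV feature sequences at several scales, with positional encodings added to the keys so that attention depends on location. It is not the authors' model. All names (`cross_view_attention`, `recurrent_cross_view`), the tensor sizes, the random positional encodings, and the omission of learned projection layers are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(bev_queries, pv_features, pv_pos):
    """Single-head cross-attention from BEV queries to a PV feature sequence.

    bev_queries: (Nq, d) one query per BEV grid cell (hypothetical layout)
    pv_features: (Np, d) flattened perspective-view feature tokens
    pv_pos:      (Np, d) positional encodings for the PV tokens
    Learned Q/K/V projections are omitted to keep the sketch short.
    """
    d = bev_queries.shape[-1]
    keys = pv_features + pv_pos              # make attention location-dependent
    scores = bev_queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)       # each row sums to 1 over PV tokens
    return weights @ pv_features, weights

def recurrent_cross_view(bev_queries, pv_pyramid, pos_pyramid):
    # Reuse the same attention step at every scale, refining the BEV features
    # with a residual update each time (the "recurrent" part of this sketch).
    bev = bev_queries
    for feats, pos in zip(pv_pyramid, pos_pyramid):
        update, _ = cross_view_attention(bev, feats, pos)
        bev = bev + update
    return bev

rng = np.random.default_rng(0)
d = 16
bev_queries = rng.normal(size=(8 * 8, d))                   # 8x8 BEV grid
pv_pyramid = [rng.normal(size=(n, d)) for n in (48, 12)]    # two PV scales
pos_pyramid = [rng.normal(size=(n, d)) for n in (48, 12)]   # stand-in encodings

bev_out = recurrent_cross_view(bev_queries, pv_pyramid, pos_pyramid)
print(bev_out.shape)
```

    In a trained model, the queries, positional encodings, and projections would be learned, and the refined BEV features would feed separate dynamic-vehicle and static-road segmentation heads.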

History
  • Online: January 03, 2025