Abstract: In dynamic settings such as human-machine hybrid intelligent warehouses, Automated Guided Vehicles (AGVs) often struggle to accurately detect randomly appearing obstacles such as pedestrians and forklifts, jeopardizing both operational efficiency and safety. This study introduces a lightweight target detection method based on the fusion of LiDAR and image data, termed L-BEVFusion. First, a lightweight feature extraction network is designed to derive 2D image information for constructing bird's-eye-view (BEV) features; to reduce localization errors caused by relying on single-scale semantic information, multi-scale semantic features are incorporated. Second, an explicit supervision strategy using depth ground truth is applied to project image features into 3D space, and predictive features are extracted from both the image and point cloud branches. A BEV feature fusion network concatenates the image and point cloud BEV features along the channel dimension, enabling bounding-box regression and classification for dynamic obstacle detection in human-machine collaborative warehouses. The proposed algorithm is evaluated on both the KITTI dataset and data collected in a real warehouse. Experimental results show that, compared with common point cloud-image fusion methods, L-BEVFusion improves detection accuracy for workers and forklifts by 3.46% and 2.22%, respectively, on the warehouse dataset, with an overall average accuracy increase of 2.97%. It also achieves superior inference speed and size-estimation accuracy, with an average normal distance error of 4.02 mm and a tangential absolute error of 1.75 mm. These improvements enhance the real-time detection performance and reliability of AGVs, ensuring efficient and safe logistics operations in intelligent warehouses and demonstrating strong practical value.
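The channel-wise BEV fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the channel counts and grid size are assumed values chosen for demonstration, and the feature maps are filled with random data in place of the real image- and point-cloud-branch outputs.

```python
import numpy as np

# Hypothetical feature-map shapes; the abstract does not specify exact
# channel counts or BEV grid resolution, so these are assumptions.
C_IMG, C_PC, H, W = 80, 256, 180, 180

# Image-branch BEV features (in the paper, obtained by projecting
# multi-scale image features into 3D space with depth supervision).
img_bev = np.random.rand(C_IMG, H, W).astype(np.float32)

# Point-cloud-branch BEV features (e.g. from a LiDAR voxel encoder).
pc_bev = np.random.rand(C_PC, H, W).astype(np.float32)

# BEV feature fusion: concatenate the two BEV maps along the channel
# dimension; detection heads then regress bounding boxes and classify
# obstacles (workers, forklifts) from the fused map.
fused_bev = np.concatenate([img_bev, pc_bev], axis=0)

print(fused_bev.shape)  # (336, 180, 180)
```

Concatenation keeps both modalities' features intact and leaves it to the subsequent fusion network to learn cross-modal weighting, which is a common design in BEV-space fusion methods.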