Abstract: Defect samples for printed circuit boards (PCBs) are difficult to collect in real-world environments, which leads to long-tailed data distributions and low detection accuracy, and detection with a Vision Transformer (ViT) incurs high computational complexity. To address these problems, we propose an end-to-end PCB defect detection algorithm that combines multi-scale ViT feature extraction with attention-based feature fusion. First, a multi-scale feature extraction network is constructed by combining ViT and partial convolution. Hierarchical multi-head attention performs adaptive attention operations on feature maps of different scales, enabling the network to better capture both local and global information and thereby strengthening its feature extraction capability, while partial convolution reduces the computational cost. Second, a non-parametric attention mechanism based on energy-domain suppression fuses the multi-scale features, enhancing the expressive power of the network's fused feature maps. Finally, a classification function sensitive to class imbalance is introduced into the network's loss function, improving its ability to fit imbalanced data and its generalization. Experimental results on three different types of publicly available PCB datasets show that the proposed algorithm improves the mean Average Precision (mAP) on PCB surface defect datasets, reaching 99.13%, 98.67%, and 99.82%, respectively. On class-imbalanced PCB defect detection tasks, the mAP improves by 11.94% over the previous method, and the network reaches a detection speed of 25 FPS, providing a fast and effective approach to PCB defect detection.
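To make the idea of a non-parametric, energy-based attention more concrete, the following is a minimal sketch only. The abstract does not give the exact formulation, so the sketch assumes a SimAM-style energy function, in which positions that deviate strongly from the channel mean receive lower energy and therefore a higher weight. The function names `energy_weights` and `energy_attention` and the regularization constant `lam` are hypothetical, and the multi-scale fusion step itself is not shown.

```python
import math


def energy_weights(channel, lam=1e-4):
    """Return SimAM-style attention weights for one 2-D feature-map channel.

    Assumption: the weighting follows a SimAM-like energy function; this is
    not confirmed as the paper's exact mechanism. No parameters are learned.
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    # Variance estimate over the channel (n - 1 in the denominator).
    n = max(len(values) - 1, 1)
    variance = sum(d for row in sq_dev for d in row) / n
    weights = []
    for dev_row in sq_dev:
        w_row = []
        for d in dev_row:
            # Inverse energy: larger for positions that stand out.
            inv_energy = d / (4.0 * (variance + lam)) + 0.5
            # Sigmoid squashes the inverse energy into a weight.
            w_row.append(1.0 / (1.0 + math.exp(-inv_energy)))
        weights.append(w_row)
    return weights


def energy_attention(channel, lam=1e-4):
    """Reweight a channel element-wise by its energy-based attention weights."""
    weights = energy_weights(channel, lam)
    return [
        [v * w for v, w in zip(row, w_row)]
        for row, w_row in zip(channel, weights)
    ]


if __name__ == "__main__":
    # Toy channel: a flat background with one strongly activated position.
    feature_map = [
        [1.0, 1.0, 1.0],
        [1.0, 5.0, 1.0],
        [1.0, 1.0, 1.0],
    ]
    for row in energy_weights(feature_map):
        print(["%.3f" % w for w in row])
```

In this toy channel the single strongly activated position receives a larger weight than the flat background, which is the suppression behaviour such a mechanism relies on. Because the weights come from channel statistics alone, the step adds no trainable parameters.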