Abstract: Surface electromyography (sEMG)-based lower-limb motion intention recognition holds broad application potential in the field of human-machine interfaces (HMIs). However, owing to the inherent inter-subject variability of sEMG signals, feature distributions exhibit significant domain shifts across subjects, which severely limits the generalization capability of sEMG recognition systems in cross-subject scenarios. To address this issue, a novel method named cross-spatial-attention-based dual-stream time-frequency convolutional neural network with gate-controlled feature decoupling (CSACNN-GFD) is proposed in this study. The proposed method adopts a dual-branch time-frequency input structure and employs a multi-scale convolution module integrated with spatial attention to capture the spatial correlations and time-frequency dynamics of multi-channel sEMG signals, thereby enhancing the extraction of motion intention information. Furthermore, a gate-controlled feature decoupling module with a complementary mechanism is designed, together with a decoupling loss function that constrains both the feature extraction and gate learning processes. This design enables adaptive feature partitioning in the deep representation space, disentangling motion-related features from subject-related features, so that pattern recognition is performed on cross-subject-invariant motion features. In the experiments, sEMG data for five common continuous lower-limb movements were collected from ten subjects, and motion pattern recognition was compared against existing generalization strategies under leave-one-subject-out (LOSO) cross-validation. The proposed CSACNN-GFD achieves an average accuracy of 84.29% on unseen subjects.
Further validation on a public dataset with eight movement types yields an average accuracy of 73.83%. These results improve the average performance by 4.32% and 6.55%, respectively, over the baseline models, and outperform mainstream strategies including MIXUP, DANN, CORAL, and DIFEX. Meanwhile, the inference time of CSACNN-GFD is only 9.57 ms, demonstrating favorable real-time performance. The proposed method effectively enhances the generalization capability of cross-subject sEMG recognition systems, contributing to the broader adoption of human-machine interaction technologies.
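To make the complementary gating idea concrete, the following is a minimal NumPy sketch of how a learned sigmoid gate can split a deep feature vector into motion-related and subject-related parts with complementary masks g and 1 - g. The function names, the linear gate parameterization, and all shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_decouple(features, gate_weights, gate_bias):
    """Split deep features into motion- and subject-related parts
    using a per-dimension sigmoid gate and its complement (1 - g).
    Hypothetical parameterization: a single linear gate layer."""
    g = sigmoid(features @ gate_weights + gate_bias)  # gate values in (0, 1)
    motion_features = g * features           # intended cross-subject-invariant part
    subject_features = (1.0 - g) * features  # subject-specific residual
    return motion_features, subject_features

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16))          # batch of 4 deep feature vectors
W = rng.standard_normal((16, 16)) * 0.1       # illustrative gate weights
b = np.zeros(16)
motion, subject = gated_decouple(feats, W, b)
# complementary masks reconstruct the original features exactly
assert np.allclose(motion + subject, feats)
```

In such a design, a decoupling loss would additionally push the two branches apart (e.g., by penalizing correlation between them) while a classifier is trained only on the motion branch; the sketch above shows only the complementary split itself.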