Abstract: To improve the safety and flexibility of the feeding process for individuals with limited mobility who are unable to feed themselves, this study proposes a multimodal human-robot interaction framework for assisted feeding together with a whole-process intention recognition algorithm. First, based on user characteristics and the requirements for safety and flexibility, a multimodal interaction framework fusing vision, touch, force, position, and language is proposed, and an assisted feeding system is developed. Second, a vision-driven whole-process intention recognition method is introduced to cover the entire feeding process, including feeding intention, dish selection intention, dynamic feeding point estimation, delivery pose calculation, and chewing intention. Key facial feature points that effectively capture dynamic changes during feeding are selected, and an algorithm combining the aspect ratios of the mouth and the mandible is designed. The user's dish selection intention is analyzed through gaze vector estimation, and dynamic feeding points are determined from real-time facial pose tracking, enabling accurate recognition of dynamic intentions throughout the process. Furthermore, in the virtual mapping system for assisted feeding, a feedback mechanism is established by leveraging a large language model to clarify ambiguous intentions and adapt to temporary changes during the interaction, thereby enhancing safety. Finally, the proposed method is validated through simulations and comprehensive experiments. The results demonstrate that the multimodal interaction framework significantly improves the flexibility of the assisted feeding process, while the integration of the large language model provides effective feedback on ambiguous and changing intentions, ultimately enhancing the safety of the interaction. This approach offers a novel care solution for assisting feeding behaviors in the daily lives of individuals with limited mobility.
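To make the mouth-related cue concrete, the sketch below shows one way a mouth aspect ratio can be computed from facial landmarks and thresholded as a crude open/closed indicator. It is a minimal illustration only: the 68-point landmark indexing, the `threshold` value, and the function names are assumptions for this example and are not taken from the paper, which additionally combines the mouth cue with mandible features and temporal tracking.

```python
# Minimal sketch of a mouth-aspect-ratio cue, assuming the common 68-point
# facial landmark convention (0-indexed); the paper's actual landmark model,
# indices, and thresholds are not specified in the abstract.
import numpy as np


def mouth_aspect_ratio(landmarks: np.ndarray) -> float:
    """Ratio of vertical mouth opening to horizontal mouth width.

    landmarks: (68, 2) array of (x, y) facial landmark coordinates.
    """
    # Inner-lip points in the 68-point convention: 60-67.
    vertical = (
        np.linalg.norm(landmarks[61] - landmarks[67])
        + np.linalg.norm(landmarks[62] - landmarks[66])
        + np.linalg.norm(landmarks[63] - landmarks[65])
    ) / 3.0
    horizontal = np.linalg.norm(landmarks[60] - landmarks[64])
    return vertical / max(horizontal, 1e-6)


def mouth_open(landmarks: np.ndarray, threshold: float = 0.5) -> bool:
    """Crude open-mouth test; a full system would also track mandible
    motion over time, as described in the abstract."""
    return mouth_aspect_ratio(landmarks) > threshold
```

In practice such a ratio would be evaluated per video frame and smoothed over time before being used as a feeding or chewing cue; the threshold here is illustrative and would need calibration per user.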