A variable step size LMS algorithm based on deep reinforcement learning

DOI:
CSTR:

Authors: Xu Junyang, Zhang Hongmei, Zhang Kun

Affiliations:

1. School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China; 2. Survey Bureau of Hydrology and Water Resources of Changjiang Estuary, Bureau of Hydrology, Changjiang Water Resources Commission, Shanghai 200136, China

Author biography:

Corresponding author:

CLC number: TN911.7; TH701

Fund project: Supported by the National Natural Science Foundation of China (42176186)

Abstract:

To address the difficulty of balancing convergence speed against steady-state error in the fixed step size LMS algorithm, as well as the drawbacks of traditional variable step size algorithms (strong dependence on initial parameter selection, heavy tuning workload, and subjectivity), this paper proposes a variable step size LMS algorithm based on deep reinforcement learning. The algorithm depends only weakly on its initial parameters and avoids a tedious tuning procedure. First, a model integrating deep reinforcement learning with adaptive filtering is constructed: a deep reinforcement learning agent controls the evolution of the step size factor, replacing the nonlinear step-size-adjustment function of traditional variable step size algorithms. This removes the cumbersome experimental tuning process and makes the algorithm easier to use. Second, an error-based state reward and a step-size-based action reward are proposed, and dynamic-reward and negative-reward mechanisms are introduced, effectively accelerating convergence. In addition, a network structure based on an undercomplete encoder is designed to improve the inference capability of the reinforcement learning policy. Experiments show that, compared with other recent variable step size algorithms, the proposed algorithm converges faster and reaches a smaller steady-state error, adjusts quickly to a reasonable step size under different initial parameters, and reduces the tuning workload. The trained network is then applied to practical tasks, including system identification, signal denoising, and filtering of water level signals in the closure-gap (longkou) area of a river closure site, and performs well in all of them, demonstrating a degree of generalization ability and further confirming the algorithm's effectiveness.
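The abstract describes the method only at a high level. As a rough illustration of the core idea, the Python sketch below replaces the nonlinear step-size function of a conventional variable step size LMS filter with an external policy object that selects the step size at every iteration, and includes one plausible reward of the error-based/step-size-based kind the abstract mentions. All names (PolicyStub, drl_vss_lms, reward), the state definition, and the reward form are illustrative assumptions, not the authors' published implementation.

import numpy as np

class PolicyStub:
    """Stand-in for the trained DRL agent; here a bounded heuristic.

    The paper's agent maps an error-based state to a step-size action
    through an undercomplete-encoder network; this stub only mimics
    that interface (state in, step size out)."""

    def __init__(self, mu_min=1e-4, mu_max=0.1):
        self.mu_min, self.mu_max = mu_min, mu_max

    def act(self, state):
        # Large recent error -> large step (fast convergence);
        # small error -> small step (low steady-state misadjustment).
        e_mag = abs(state[-1])
        return self.mu_min + (self.mu_max - self.mu_min) * np.tanh(e_mag)

def reward(e_k, e_prev, mu):
    # Guessed reward of the kind the abstract describes (it would drive
    # agent training, not the filtering loop): a positive, dynamic state
    # reward when the error shrinks, a negative reward otherwise, plus a
    # step-size-dependent action term. The exact form is not given here.
    state_r = 1.0 if abs(e_k) < abs(e_prev) else -1.0
    action_r = -mu if abs(e_k) < 1e-3 else mu  # favor small steps near convergence
    return state_r + action_r

def drl_vss_lms(x, d, order=8, policy=None):
    """LMS adaptive filter whose step size is set per sample by `policy`."""
    x, d = np.asarray(x, float), np.asarray(d, float)
    policy = policy or PolicyStub()
    w = np.zeros(order)                  # filter weights
    e = np.zeros(len(d))                 # a priori error sequence
    for k in range(order, len(x)):
        u = x[k - order + 1:k + 1][::-1]         # tap-input vector [x_k, ..., x_{k-order+1}]
        e[k] = d[k] - w @ u                      # a priori error
        mu = policy.act(e[max(0, k - 4):k + 1])  # agent picks the step size
        w += mu * e[k] * u                       # standard LMS weight update
    return w, e

# Example: identify an unknown FIR system from noisy observations.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.6, -0.3, 0.1, 0.05, 0.02, 0.0, 0.0, 0.0])
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w, e = drl_vss_lms(x, d, order=8)        # w should approach h as e decays

In the published method, PolicyStub would be the trained deep reinforcement learning network and reward would drive its training; at deployment only the learned state-to-step-size mapping is needed, which is why the explicit tuning of a nonlinear step-size function disappears.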

Cite this article:

Xu Junyang, Zhang Hongmei, Zhang Kun. A variable step size LMS algorithm based on deep reinforcement learning[J]. Chinese Journal of Scientific Instrument, 2025, 46(2): 70-80.

History
  • Received:
  • Last revised:
  • Accepted:
  • Online publication date: 2025-04-28
  • Publication date: