A lightweight speech enhancement method with multi-feature aggregation on a dual-branch complex spectrum
CLC number:

TH701; TN912.35

Fund project:

Supported by the Natural Science Foundation of Chongqing (cstc2021jcyjmsxmX0836)


A lightweight speech enhancement method based on dual branch complex spectrum with multiple feature aggregation

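    Abstract:

    Many improved convolutional recurrent networks (CRNs) use an encoder-decoder structure that performs either masking or mapping alone, which leads to single-type feature extraction, weak capture of global features, and relatively large parameter counts. To address these problems, this paper proposes an efficient single-channel speech enhancement network that jointly masks and maps the complex spectrum, combining a multi-feature aggregation convolution module with an efficient Transformer fused with an attention mechanism. In the encoder and decoder layers, a dual-branch gated cooperative unit (DGCU) extracts multi-level features of the complex spectrum and then makes them interact and aggregate, compensating for single-type feature extraction. In the intermediate layer, a channel-time-frequency attention fusion module focuses on the local time-frequency and spatial detail features of speech. Finally, ablation and comparative experiments on the THCHS30 dataset show that the network achieves a lightweight design with the lowest parameter count and a relatively low computational cost, and that it improves PESQ by 10.5% to 50.6% under matched noise and by 16.3% to 94.5% under mismatched noise. Its objective and subjective metrics are better than those of the other compared network models, demonstrating high noise reduction performance and good generalization.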
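The abstract names joint masking and mapping on the complex spectrum but does not say how the two estimates are combined. The sketch below shows one common arrangement, under stated assumptions: a complex ratio mask `mask` multiplies the noisy STFT `noisy` (complex multiplication scales the magnitude and rotates the phase), a separate mapping branch outputs an estimate `mapped` directly, and the two are blended with a hypothetical weight `alpha`. The function name `joint_mask_map`, the convex-combination fusion rule, and `alpha` are illustrative assumptions, not the authors' design.

```python
from typing import List, Sequence

# A spectrum is stored as rows of frequency bins, each row a list of STFT frames.
Spectrum = List[List[complex]]


def joint_mask_map(
    noisy: Sequence[Sequence[complex]],
    mask: Sequence[Sequence[complex]],
    mapped: Sequence[Sequence[complex]],
    alpha: float = 0.5,
) -> Spectrum:
    """Blend a masking estimate and a mapping estimate of the clean spectrum.

    Hypothetical fusion rule (an assumption, not taken from the paper):
        S_hat = alpha * (M * Y) + (1 - alpha) * S_map
    where Y is the noisy complex spectrum, M a complex ratio mask and
    S_map a directly mapped complex spectrum.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (len(noisy) == len(mask) == len(mapped)):
        raise ValueError("inputs must have the same number of frequency bins")

    fused: Spectrum = []
    for y_row, m_row, s_row in zip(noisy, mask, mapped):
        if not (len(y_row) == len(m_row) == len(s_row)):
            raise ValueError("inputs must have the same number of frames")
        # m * y is a full complex product, (mr + j*mi)(yr + j*yi), so the mask
        # acts on both the real and imaginary parts of the noisy bin.
        fused.append(
            [alpha * (m * y) + (1.0 - alpha) * s for y, m, s in zip(y_row, m_row, s_row)]
        )
    return fused


if __name__ == "__main__":
    noisy = [[1 + 2j, 3 - 1j]]
    mask = [[0.5 + 0j, 1j]]  # 1j rotates the phase of that bin by 90 degrees
    mapped = [[2 + 0j, 0j]]
    print(joint_mask_map(noisy, mask, mapped, alpha=0.5))
```

Setting `alpha=1.0` reduces the sketch to pure masking and `alpha=0.0` to pure mapping, which makes it easy to see what each branch contributes; a learned, per-bin weight would be a natural extension.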

    Abstract:

    Many improved convolutional recurrent networks (CRNs) rely on an encoder-decoder structure that performs either masking or mapping alone; as a result, they extract a single type of feature, capture global characteristics poorly, and have large parameter counts. To address these issues, this paper proposes an efficient single-channel speech enhancement network that performs joint masking and mapping on the complex spectrum by combining a multi-feature aggregation convolution module with an efficient Transformer that incorporates an attention mechanism. In the encoder and decoder layers, a Dual-branch Gated Cooperative Unit (DGCU) extracts multi-level complex spectral features and then lets them interact and aggregate, overcoming single-type feature extraction. The intermediate layer uses a Channel-Time-Frequency Attention Fusion Module that focuses on the local time-frequency and spatial detail features of speech. Ablation and comparative experiments on the THCHS30 dataset show that the network is lightweight, with the lowest parameter count and a relatively low computational cost among the compared models. It improves PESQ by 10.5% to 50.6% under matched noise and by 16.3% to 94.5% under mismatched noise, and it outperforms the other compared models on both objective and subjective metrics, demonstrating strong noise reduction performance and generalization capability.
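The abstract reports PESQ gains as percentage ranges but does not state the reference point. Assuming each percentage is the relative gain of the proposed model's PESQ over one compared model's PESQ, a reported range such as 10.5% to 50.6% would be the minimum and maximum of those gains. The sketch below shows only that arithmetic; the function names `relative_gain` and `gain_range` and the scores in the example are hypothetical placeholders, not values from the paper.

```python
from typing import Dict, Tuple


def relative_gain(proposed: float, baseline: float) -> float:
    """Relative improvement of `proposed` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (proposed - baseline) / baseline * 100.0


def gain_range(proposed: float, baselines: Dict[str, float]) -> Tuple[float, float]:
    """Smallest and largest relative gain over a set of compared models."""
    if not baselines:
        raise ValueError("need at least one baseline")
    gains = [relative_gain(proposed, score) for score in baselines.values()]
    return min(gains), max(gains)


if __name__ == "__main__":
    # Hypothetical PESQ scores, for illustration only.
    compared = {"model_a": 2.0, "model_b": 2.5}
    low, high = gain_range(3.0, compared)
    print(f"PESQ gain: {low:.1f}% to {high:.1f}%")  # prints PESQ gain: 20.0% to 50.0%
```

Under this reading, the strongest compared model sets the lower end of the range and the weakest sets the upper end.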

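Cite this article

ZHANG Tianqi, SHEN Xiwen, TANG Juan, TAN Shuang. A lightweight speech enhancement method based on dual branch complex spectrum with multiple feature aggregation[J]. Chinese Journal of Scientific Instrument, 2024, 45(7): 279-291.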

History
  • Published online: 2024-10-24