Controllable speech enhancement model based on perceptual conditional network
Chinese Library Classification (CLC) number:

TN912.3 TH701

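Fund project:

Supported by the Natural Science Foundation of Shandong Province (ZR2022MF330, ZR2021MF017) and the National Natural Science Foundation of China (61701286).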


    Abstract:

    To give different listeners a better subjective listening experience of enhanced speech in different scenarios, a controllable speech enhancement model based on a perceptual conditional network is proposed. First, a quantile loss function is designed to trade off overestimation against underestimation of speech, and it is used to guide network training. By adjusting the levels of speech distortion and residual noise in the network output, this loss controls the output characteristics of the model. Then, to give a single network variable output characteristics, a conditional network is introduced: the quantile value in the quantile loss function, which is related to the listener's perception, generates conditional information that modulates the noisy speech features, yielding a controllable speech enhancement model. Experimental results show that the designed quantile loss function effectively adjusts the levels of residual noise and speech distortion in the enhanced speech, and that the proposed model provides enhanced-speech output characteristics that the listener can actively control, giving the listener a better speech enhancement experience.
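The abstract names two mechanisms without giving their formulas. The sketch below assumes the standard pinball (quantile) loss for the first and a FiLM-style (feature-wise linear modulation) layer for the second; both are stand-ins chosen for illustration, not the authors' definitions, and `quantile_loss`, `film_params`, and `modulate` are hypothetical names. Under the pinball loss with quantile value τ, underestimating the clean speech costs τ per unit of error and overestimating it costs 1 - τ, so a larger τ biases the estimate upward (less speech distortion, more residual noise) and a smaller τ biases it downward (less residual noise, more speech distortion). Plain Python lists stand in for spectrogram frames.

```python
"""Sketch (assumed, not the paper's code): pinball loss plus a
tau-conditioned FiLM-style modulation of noisy-speech features."""

from typing import List, Tuple


def quantile_loss(target: List[float], estimate: List[float], tau: float) -> float:
    """Mean pinball loss between clean and estimated speech magnitudes.

    Underestimation (estimate < target) is weighted by tau,
    overestimation (estimate > target) by 1 - tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    if not target or len(target) != len(estimate):
        raise ValueError("target and estimate must be non-empty and equal length")
    total = 0.0
    for y, y_hat in zip(target, estimate):
        err = y - y_hat  # positive when the speech is underestimated
        total += max(tau * err, (tau - 1.0) * err)
    return total / len(target)


def film_params(tau: float,
                w_gamma: List[float], b_gamma: List[float],
                w_beta: List[float], b_beta: List[float]
                ) -> Tuple[List[float], List[float]]:
    """Map the scalar condition tau to per-feature scale (gamma) and shift
    (beta) with one linear layer; a hypothetical stand-in for the
    paper's conditional network."""
    gamma = [w * tau + b for w, b in zip(w_gamma, b_gamma)]
    beta = [w * tau + b for w, b in zip(w_beta, b_beta)]
    return gamma, beta


def modulate(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """FiLM-style modulation applied feature by feature: gamma * x + beta."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]
```

For example, with τ = 0.8 an estimate that undershoots the clean magnitudes `[1.0, 0.5]` by half scores about 0.3, while one that overshoots them by half scores about 0.075; τ = 0.2 swaps the two. In the assumed FiLM layer, feeding a different τ at inference time changes γ and β, so one trained network could produce different output characteristics without retraining.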

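Cite this article:

YUAN Wenhao, QU Qingyang, LIANG Chunyan, XIA Bin. Controllable speech enhancement model based on perceptual conditional network[J]. Chinese Journal of Scientific Instrument, 2023, 44(5): 53-60.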

Published online: 2023-08-17