Abstract: To improve the subjective auditory perception of enhanced speech for different listeners in different environments, a controllable speech enhancement model based on a perceptual conditional network is proposed. First, a quantile loss function is designed to balance the overestimation and underestimation of speech, and it is used to guide the training of the network. The output characteristics of the model are thereby controlled by adjusting the levels of residual noise and speech distortion in the network output. Then, to give a single speech enhancement network variable output characteristics, a conditional network is introduced. Conditional information is generated from the quantile value in the quantile loss function, which is related to auditory perception, and is used to modulate the noisy speech features, yielding a controllable speech enhancement model. Experimental results show that the designed quantile loss function can effectively adjust the levels of residual noise and speech distortion in the enhanced speech, and that the proposed model provides enhanced speech whose characteristics can be actively controlled by the listener, giving the listener a better speech enhancement experience.
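The balancing role of a quantile loss can be illustrated with the standard pinball form, shown here as a minimal sketch. The abstract does not give the exact loss used in the paper, so the function name `quantile_loss`, the parameter `tau`, and the toy magnitude values are assumptions made only for illustration.

```python
def quantile_loss(target, estimate, tau):
    """Mean pinball (quantile) loss between two equal-length sequences.

    Assumed standard form, not necessarily the paper's exact loss:
    with err = target - estimate, underestimation (err > 0) is weighted
    by tau and overestimation (err < 0) by (1 - tau).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(target) != len(estimate) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, e in zip(target, estimate):
        err = t - e
        total += max(tau * err, (tau - 1.0) * err)
    return total / len(target)


# Toy spectral magnitudes (hypothetical values, not data from the paper).
clean = [1.0, 1.0, 1.0, 1.0]
under = [0.5, 0.5, 0.5, 0.5]  # estimate below the clean speech
over = [1.5, 1.5, 1.5, 1.5]   # estimate above the clean speech

for tau in (0.2, 0.5, 0.8):
    print(tau, quantile_loss(clean, under, tau), quantile_loss(clean, over, tau))
```

Under this form, a quantile value above 0.5 penalizes underestimation more heavily, which would push a trained network toward preserving speech at the cost of more residual noise. A value below 0.5 does the opposite, and 0.5 weights both error types equally.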