Multi-Domain Processing via Hybrid Denoising (MDPhD) Networks for Speech Enhancement

Jang-Hyun Kim* (Seoul National University, work done at Clova AI Research), Jaejun Yoo* (Clova AI Research), Sanghyuk Chun (Clova AI Research), Adrian Kim (Clova AI Research), Jung-Woo Ha (Clova AI Research)

[paper (arXiv)]

Evaluation Results

Method          CSIG   CBAK   COVL   PESQ   SSNR
Wiener [1]      3.23   2.68   2.67   2.22   5.07
SEGAN [2]       3.48   2.94   2.80   2.16   7.73
Wavenet [3]     3.62   3.23   2.98   -      -
MMSE-GAN [4]    3.80   3.12   3.14   2.53   -
MDPhD (ours)    3.85   3.39   3.27   2.70   10.22
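
For readers unfamiliar with the SSNR column, the following is a minimal sketch of a segmental signal-to-noise ratio: the SNR in dB is computed per frame between the clean reference and the enhanced signal, clamped, and averaged. This page does not state how SSNR was computed for the table, so the function name `segmental_snr`, the non-overlapping 512-sample frames, and the [-10, 35] dB clamp range are all assumptions chosen for illustration, not the authors' evaluation code.

```python
import math


def segmental_snr(clean, enhanced, frame_len=512, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, min_db and max_db are illustrative assumptions, not values
    taken from this page.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-10  # avoids division by zero and log of zero
    frame_snrs = []
    # Walk over non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


# Usage: a 440 Hz tone sampled at 16 kHz, with the "enhanced" copy scaled by 0.9.
clean = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(2048)]
enhanced = [0.9 * x for x in clean]
print(round(segmental_snr(clean, enhanced), 2))
```

In this toy case the residual is 0.1 times the clean signal in every frame, so each frame has an energy ratio of 100, which corresponds to 20 dB. Higher values mean the enhanced waveform is closer to the clean reference.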

Audio Samples

[Audio players: Samples 1 to 5, each provided for Noisy input, SEGAN, Wavenet, and Ours]

Citation

@article{kim2018mdphd,
    title={Multi-Domain Processing via Hybrid Denoising Networks for Speech Enhancement},
    author={Kim, Jang-Hyun and Yoo, Jaejun and Chun, Sanghyuk and Kim, Adrian and Ha, Jung-Woo},
    journal={arXiv preprint arXiv:1812.08914},
    year={2018}
}

References

  1. Pascal Scalart and Jozue Vieira Filho. Speech enhancement based on a priori signal to noise estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1996.
  2. Santiago Pascual, Antonio Bonafonte, and Joan Serra. SEGAN: Speech enhancement generative adversarial network. In Interspeech, 2017.
  3. Dario Rethage, Jordi Pons, and Xavier Serra. A Wavenet for speech denoising. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
  4. Meet H Soni, Neil Shah, and Hemant A Patil. Time-frequency masking-based speech enhancement using generative adversarial network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.