Sound demos for "Speech Denoising in the Waveform Domain with Self-Attention"

Authors: Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

We present audio samples for the causal CleanUNet model proposed in "Speech Denoising in the Waveform Domain with Self-Attention". The model uses N=5 self-attention blocks in the bottleneck layer and is trained with L1 plus high-band STFT losses. We compare CleanUNet with other state-of-the-art models, including FAIR-denoiser and FullSubNet. The official PyTorch implementation can be found at this link.
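To make the idea of "L1 plus high-band STFT losses" concrete, the following is a minimal sketch in plain Python, not the official implementation. It combines a waveform L1 term with a spectral term computed only on frequency bins above a cutoff. Everything specific here is an assumption chosen for illustration: the function names, the 16 kHz sample rate, the 4 kHz cutoff, the single STFT resolution with a 64-sample Hann window, the use of a plain magnitude difference as the spectral distance, and the weight of 1.0. The paper defines the actual loss terms and settings.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive DFT with a periodic Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Keep only the non-negative frequency bins 0 .. frame_len // 2.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def l1_loss(denoised, clean):
    """Mean absolute difference between two equal-length waveforms."""
    return sum(abs(d - c) for d, c in zip(denoised, clean)) / len(clean)


def high_band_stft_loss(denoised, clean, sample_rate=16000, cutoff_hz=4000.0,
                        frame_len=64, hop=32):
    """Mean absolute magnitude difference over bins at or above cutoff_hz.

    Assumption: a plain magnitude difference at one resolution stands in
    for the spectral distance used in the paper.
    """
    first_bin = math.ceil(cutoff_hz * frame_len / sample_rate)
    spec_d = stft_magnitudes(denoised, frame_len, hop)
    spec_c = stft_magnitudes(clean, frame_len, hop)
    total, count = 0.0, 0
    for frame_d, frame_c in zip(spec_d, spec_c):
        for k in range(first_bin, len(frame_d)):
            total += abs(frame_d[k] - frame_c[k])
            count += 1
    return total / count


def combined_loss(denoised, clean, stft_weight=1.0):
    """Waveform L1 term plus a weighted high-band spectral term."""
    return l1_loss(denoised, clean) + stft_weight * high_band_stft_loss(denoised, clean)
```

Calling `combined_loss(denoised, clean)` on two equal-length lists of samples returns a single number. In this sketch an error confined to low frequencies contributes almost nothing to the spectral term, while an error above the cutoff is penalized by both terms.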

Speech Denoising on the DNS (2020) Dataset



Keyboard / Mechanical noise

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Dog barking

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Human talking

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Indoor noise

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Street noise

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Shrill noise

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)


Wind noise

Noisy | CleanUNet (ours) | FAIR-denoiser | FullSubNet | Clean (reference)