Reduce background noise and optimize the speech from an audio clip using ffmpeg

I extract audio clips from a video file for speech recognition. These videos come from mobile/other handmade devices and hence contain a lot of noise. I want to reduce the background noise of the audio so that the speech that I relay to my speech recognition engine is clear. I am using ffmpeg to do all of this stuff, but am stuck at the noise reduction phase.

So far I have tried the following filters:

ffmpeg-20140324-git-63dbba6-win64-static\bin>ffmpeg -i input.wav -filter_complex "highpass=f=400,lowpass=f=1800" out2.wav

ffmpeg -i input.wav -af "equalizer=f=1000:width_type=h:width=900:g=-10" output.wav

ffmpeg -i input.wav -af "bandreject=f=1200:width_type=h:width=900:g=-10" output.wav

But the results are very disappointing. My reasoning was that since speech falls in the 300-3000 Hz range, I can filter out all other frequencies to suppress any background noise. What am I missing?

Also, I read about Wiener filters that could be used for speech enhancement and found this, but am not sure how to use it.

Sudh

Posted 2014-03-24T21:43:42.603

Reputation: 443

Answers

If you are looking to isolate audible speech, try combining a low-pass filter with a high-pass filter. For usable audio I have noticed that filtering out 200 Hz and below, then filtering out 3000 Hz and above, does a pretty good job of keeping usable voice audio.

ffmpeg -i <input_file> -af "highpass=f=200, lowpass=f=3000" <output_file>

In this example, add the high-pass filter first to cut the lower frequencies, then use the low-pass filter to cut the higher frequencies. If needed, you could run your file through this more than once to further attenuate loud content that remains within the cut frequency ranges.
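As a rough sketch of the "run it more than once" idea, the same filter pair can be repeated inside a single filtergraph instead of re-encoding the file several times. The helper name voice_band_filter is made up for this illustration; it only builds the filter string and does not call ffmpeg itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): print a filtergraph that applies
# highpass at $1 Hz and lowpass at $2 Hz, repeated $3 times (default 1).
# Repeating the pair steepens the roll-off outside the kept band.
voice_band_filter() {
    low=$1
    high=$2
    passes=${3:-1}
    chain=""
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Append one highpass/lowpass pair, comma-separated from the previous one.
        chain="${chain:+$chain,}highpass=f=${low},lowpass=f=${high}"
        i=$((i + 1))
    done
    printf '%s\n' "$chain"
}
```

It could then be used as ffmpeg -i input.wav -af "$(voice_band_filter 200 3000 2)" output.wav, where input.wav and output.wav are placeholder file names.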

av8r

Posted 2014-03-24T21:43:42.603

Reputation: 489

Sorry, but this seems to do no noticeable noise reduction for me. – Angad – 2015-10-14T08:03:50.347

This works very well to reduce low level of background noise (fans, buzzing, etc) but may compromise the audio quality slightly, though that can be mitigated somewhat by applying other filters afterwards. – Iain Collins – 2016-11-16T05:01:42.223

For my case the original audio was so bad it was almost impossible to hear the voice because of some waterfall noise in the background. I used the following. It is not great quality, but 1000x better than the original. -af "highpass=f=200, lowpass=f=1000" – Eric – 2017-03-09T22:51:24.393

I get some error with the above or rather, warning from ffmpeg: [Parsed_highpass_0 @ 0x1524780] clipping 52 times. Please reduce gain. – shevy – 2018-01-26T14:10:45.497

You may preview your filter with ffplay <input file> -af lowpass=3000,highpass=200 – Björn – 2018-03-30T20:52:01.377

This increased my file size from 8 MiB to 70 MiB. The file format is flv. – erandros – 2018-12-05T18:21:36.253

FFmpeg now has 2 native filters to deal with background noise: afftdn and anlmdn.

Also, for some time now, one can use ladspa (look for noise-supressor) and/or lv2 (look for speech denoiser) filters with FFmpeg.

Paul B. Mahol

Posted 2014-03-24T21:43:42.603

Reputation: 560

Update: FFmpeg recently added afftdn which uses the noise threshold per-FFT-bin method described below, with various options for adapting / figuring out appropriate threshold values on the fly.

anlmdn (non-local means) is a technique that works well for video; I haven't tried the audio filter.

Either of these should be much better than highpass / lowpass, unless your only noise is a 60Hz hum or something. (Human speech can still sound ok in a pretty narrow bandpass, but there are much better ways to clean up a broadband noise background hiss.)
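A minimal sketch of trying both filters, assuming an FFmpeg build recent enough to include them. The helper names afftdn_filter and denoise are invented for this example; nr (noise reduction in dB) and nf (noise floor in dB) are afftdn options, and the values 12 and -50 used as fallbacks here are, as far as I know, the documented defaults. Check ffmpeg -h filter=afftdn on your build.

```shell
#!/bin/sh
# Hypothetical helper: print an afftdn filter string.
# $1 = nr, noise reduction in dB (falls back to 12)
# $2 = nf, assumed noise floor in dB (falls back to -50)
afftdn_filter() {
    printf 'afftdn=nr=%s:nf=%s\n' "${1:-12}" "${2:--50}"
}

# Hypothetical wrapper: run one audio filter pass.
# $1 = input file, $2 = output file, $3 = filtergraph
denoise() {
    ffmpeg -i "$1" -af "$3" "$2"
}
```

For example, denoise input.wav out_fft.wav "$(afftdn_filter 20 -40)" tries the FFT-based filter with a stronger setting, and denoise input.wav out_nlm.wav anlmdn tries non-local means with its default options. The file names are placeholders, and the right nr / nf values depend on the recording.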

ffmpeg doesn't have any decent audio filters for noise-reduction built in. Audacity has a fairly effective NR filter, but it's designed to be used with 2-pass operation with a sample of just the noise, and then the input.

The comments at the top of https://github.com/audacity/audacity/blob/master/src/effects/NoiseReduction.cpp explain how it works. Basically, it suppresses every FFT bin that's below the threshold, so it only lets signals through when they're louder than the noise floor in that frequency band. It can do amazing things without causing problems. It's like a band-pass filter that adapts to the signal. Since the energy of the noise is spread over the whole spectrum, only letting through a few narrow bands of it will reduce the total noise energy a LOT.

See also Audio noise reduction: how does audacity compare to other options? for more details of how it works, and that thresholding FFT bins in one way or another is the basis of typical commercial noise-reduction filters, too.

Porting that filter to ffmpeg would be a bit awkward. Maybe implementing it as a filter with 2 inputs, instead of a 2-pass filter, would work best. Since it only needs a few seconds to get a noise profile, it's not like it has to read through the whole file. And you SHOULDN'T feed it the whole audio stream as a noise sample, anyway. It needs to see a sample of JUST noise to set thresholds for each FFT bin.

So yeah, a 2nd input, rather than 2pass, would make sense. But that makes it a lot less easy to use than most ffmpeg filters. You'd need a bunch of voodoo with stream split / time-range extract. And of course you need manual intervention, unless you have a noise sample in a separate file that will be appropriate for multiple input files. (one noise sample from the same mic / setup should be fine for all clips from that setup.)
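With afftdn available, something close to the "noise sample first" workflow may be possible in a single command, assuming your build's afftdn accepts the sn (sample_noise) start/stop command through asendcmd. That is an assumption worth verifying against the filter documentation for your version. The helper name noise_profile_filter is made up here; it only builds the filter string.

```shell
#!/bin/sh
# Hypothetical helper: print a filtergraph that asks afftdn to measure the
# noise profile between $1 and $2 seconds, then denoise with nr=$3 dB.
# Assumes afftdn supports the "sn start" / "sn stop" command (verify locally).
# The chosen time range should contain JUST noise, no speech.
noise_profile_filter() {
    start=$1
    stop=$2
    nr=${3:-12}
    printf 'asendcmd=%s afftdn sn start,asendcmd=%s afftdn sn stop,afftdn=nr=%s\n' \
        "$start" "$stop" "$nr"
}
```

A possible invocation is ffmpeg -i input.wav -af "$(noise_profile_filter 0.0 1.5 20)" output.wav, where the first 1.5 seconds of the placeholder input.wav are assumed to be noise only. This still needs the manual step of finding a noise-only stretch in each recording.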

Peter Cordes

Posted 2014-03-24T21:43:42.603

Reputation: 3 141