Properly downmix 5.1 to stereo using ffmpeg

27

17

I have a 5.1 audio track from a film where front left and front right contain music, and center contains dialogue. Playing the 5.1 track in VLC blends everything together nicely.

I'm trying to convert the 5.1 track to stereo using ffmpeg -ac 2, but the resulting stereo mix is much quieter than the 5.1 track played natively.

Adding -af "pan=stereo|c0=FL|c1=FR" gives the correct volume, but then there is no dialogue because the center channel is not included.

So the solution is maybe to mix left/center/right into stereo, and throw out the back end subwoofer channels? (I'm guessing here...)

So the question is: How do I make ffmpeg downmix 5.1 to stereo the same way VLC does it, with the same strong volume in the end result?

forthrin

Posted 2014-12-14T12:23:39.217

Reputation: 1 047

Are you sure VLC is actually playing the additional channels? Downmixing can result in normalization so that the sum of each input per output channel does not result in overload so clipping is prevented. This can make it sound quieter. – llogan – 2014-12-14T18:15:17.157

The basics: My file is 5.1. My speakers are stereo. I don't know what VLC does, but it creates a great end result in my stereo speakers from the 5.1 source data (strong volume, both music and dialogue included). ffmpeg, on the other hand, creates a "low volume" result when using -ac 2. So I'm asking how to make ffmpeg generate the same good result as VLC does. – forthrin – 2014-12-16T10:20:32.587

Answers

29

I found that the answer Shane provided includes too little of the other channels and too much of the center. Movies sounded off balance on headphones, with all dialogue and not enough background music or effects.

According to ATSC standards (section 7.8, page 91), the following formula is used to downmix 5.1 to conventional stereo (as opposed to matrix):

Lo = 1.0 * L + clev * C + slev * Ls ;
Ro = 1.0 * R + clev * C + slev * Rs ;

clev and slev should be .707, according to tables 5.9 and 5.10 in the aforementioned document, assuming a center/surround mix level of 0. Other values are provided in those tables that reduce the amount of center mix, which I don't find useful.

With this in mind, the following ffmpeg option produces a good balanced sound with audible dialogue. Note that specifying the audio channels is not necessary.

-af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.707*BL|FR < 1.0*FR + 0.707*FC + 0.707*BR"

A note on the use of the less-than symbol, from the pan filter documentation:

If the ‘=’ in a channel specification is replaced by ‘<’, then the gains for that specification will be renormalized so that the total is 1, thus avoiding clipping noise.
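To make the renormalization concrete, here is a small Python sketch of the arithmetic. It assumes the filter divides each gain by the sum of the absolute gains in that channel specification, which is my reading of the documentation quoted above; the function name `renormalize` is purely illustrative and not part of FFmpeg.

```python
def renormalize(gains):
    """Scale one output channel's gains so their absolute values sum to 1."""
    total = sum(abs(g) for g in gains.values())
    return {name: g / total for name, g in gains.items()}


# Left output channel of the ATSC-style downmix used in the command above.
left = {"FL": 1.0, "FC": 0.707, "BL": 0.707}

for name, gain in renormalize(left).items():
    print(f"{name}: {gain:.3f}")
```

Under that assumption, each output channel ends up with roughly 41% of its front channel and 29% each of the center and surround channels, which would explain why a normalized downmix sounds quieter than the source.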

Gregory

Posted 2014-12-14T12:23:39.217

Reputation: 399

5

The ATSC standards you've linked here were linked to from the FFmpeg wiki on the topic, so it's unsurprising that the formula used here is the same one implemented by FFmpeg with its -ac 2 switch. In other words, the only difference between using this filter and doing -ac 2 is a lot more typing.

– Hashim – 2019-02-28T19:19:43.030

@Hashim Not only typing. An answer with a thorough explanation of the underpinnings is objectively better than "type this to get that". – Sevastyan Savanyuk – 2020-01-04T03:57:44.257

22

The answers to this question have since become a bit of a mess, with many containing redundant information and others complete inaccuracies. This answer is an attempt to streamline the information in these answers while doing away with the problems in them.

Most importantly, it's worth bearing in mind that Gregory's answer, currently the top-voted answer to this question, is no different than using the -ac 2 switch - more on this below.

Downmixing a 5.1 channel audio stream to stereo with -ac 2

FFmpeg comes with built-in capabilities for downmixing a 5.1 track to stereo, and this is also the solution that FFmpeg's own documentation recommends:

Note: ffmpeg integrates a default down-mix (and up-mix) system that should be preferred (the -ac option) over the pan filter unless you have very specific needs.

The -ac 2 switch works by mixing proportions of the first 5 channels from the source's 6-channel stream - Back Left, Back Right, Front Left, Front Right and Front Center - into the Front Left and Front Right channels of the output stereo stream.

Audio from the LFE channel (the .1 in 5.1, reserved for the subwoofer and used for deep, low-frequency effects) is discarded completely when using this option.

Unfortunately, in my tests -ac 2 produced music and dialogue levels that differed most from the source, making it the worst-sounding of all the formulae I tested. You may still find that it gives a perfectly adequate downmix for your needs, in which case any other formula would be overkill.


To downmix a DTS track with -ac 2 without transcoding it (i.e. to keep its codec and extension the same):

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 "stereotrack.dts"

As pointed out by Mephisto in his answer, if the dialogue and the music sound well balanced against each other but simply lack volume, you can downmix the stream while also increasing its volume:

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 -vol 425 "stereotrack.dts"

For the -vol switch, 100% volume in the source is equivalent to the integer value 256, and using a larger value than this will increase the overall volume of the audio stream. However, note that doing so too much may result in distortion or artifacts, especially during its louder sections.
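As a rough guide to what a given -vol value means, the following Python sketch converts it to a linear gain and to decibels, using only the "256 is 100%" rule stated above. The helper names are illustrative, not FFmpeg functions.

```python
import math


def vol_to_gain(vol):
    """Convert a -vol value (256 == 100%) to a linear gain factor."""
    return vol / 256.0


def gain_to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


for vol in (256, 425, 512):
    gain = vol_to_gain(vol)
    print(f"-vol {vol}: gain {gain:.2f}, {gain_to_db(gain):+.2f} dB")
```

By this arithmetic, -vol 425 is a gain of about 1.66 (roughly +4.4 dB). If your FFmpeg build warns that -vol is deprecated, the same factor can, as far as I know, be given to the volume audio filter instead, for example volume=1.66.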

To downmix an audio stream to stereo and transcode it to the AC3 codec, for example:

ffmpeg -i "sourcetrack.dts" -c:a ac3 -ac 2 "stereotrack.ac3"

Downmixing a 5.1 channel audio stream to stereo with a custom mix algorithm

If you want a higher-quality downmix, or you absolutely must include the LFE channel in your output, you can use FFmpeg's audio filter switch (-af) to downmix the audio using a custom mix formula.

Downmixing with the ATSC formula (Gregory's answer)

As of the time of posting this answer, the top-voted answer to this question was Gregory's, which puts the formula from the ATSC specification (see section 7.8.2, Downmixing into Two Channels) into an FFmpeg audio filter. This specification is itself directly linked to by the FFmpeg documentation on the topic, indicating it's highly likely to be the same formula that FFmpeg already implements for its -ac 2 switch. If this is true, then typing out the entire formula in Gregory's answer would be no different than using the -ac 2 switch, and therefore a waste of time.

I decided to test this for certain by re-encoding the same input audio using both -ac 2 and the -af filter from Gregory's answer (the exact commands used can be seen in the footnotes to this answer).

I then compared the sizes of the resulting output files and found that they were exactly the same size, down to the byte.

Finally, I opened both output files in Audacity and compared their waveforms to confirm they were identical.

It therefore seems pretty conclusive that the ATSC formula detailed in Gregory's answer is the same one already implemented by FFmpeg, and that using it is entirely redundant when it does nothing that -ac 2 doesn't, and is a much more cumbersome command.

Downmixing without discarding the LFE channel (Dave_750's answer)

Of the downmix formulae in the answers, this is the only one that appears to mix the LFE channel into the stereo output instead of discarding it, and as a result it is the one that loses the least sound from the source.

The overall volume level is higher and fuller than with -ac 2, but still lower than with the Nightmode Dialogue downmix described below. However, music levels are much closer to the source than with the Nightmode Dialogue downmix, and because the LFE channel is included, increasing the output volume with this formula can create a stream that sounds truer to the 5.1 source than any other formula I tested.

If you have the ability, I would highly recommend encoding your audio stream(s) using both this downmix formula and the Nightmode Dialogue downmix, and carefully comparing the waveforms of the two to determine which one is better.

To downmix a 5.1 track to stereo using this formula and increase its volume level to 425 (where 256 is 100% of the original source's volume level):

ffmpeg -i "sourcetrack.dts" -c dca -vol 425 -af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" "outputstereo.dts"

Downmixing with Robert Collier's Nightmode Dialogue (Shane Harrelson's answer)

The Nightmode Dialogue formula, created by Robert Collier on the Doom9 forum and sourced by Shane Harrelson in his answer, results in a far better downmix than the -ac 2 switch - instead of overly quiet dialogues, it brings them back to levels that are much closer to the source.

From Robert Collier's description of the mix:

After converting many DTS movie tracks from 5.1 to 2.0 using eac3to, I have found the default eac3to channel mappings to result in very quiet dialogues and overly loud music and action scenes. Although the eac3to channel downmix coefficients have a scientific basis, they often do not sound good in practice because of low dialogue volume. This preset is for those looking for clear dialogues with left and right channel music still being audible but more in the background.

As you can see - front center (dialogues) come in properly now and stay at the original level - while the music and explosions remain a background effect and don't overpower you. This preset solves the problem of you having to constantly fiddle with the volume knob when watching DTS 5.1 converted to 2.0 movies in order to hear dialogues. (Especially for watching movies in the night where you don't want to wake others but still want to be able to hear dialogues).

Unfortunately, the music of this downmix formula is much lower than in the 5.1 source (which was likely by design considering Collier's intention to create a "nightmode" mix) and due to complete loss of the LFE track, the overall output audio doesn't sound as full or close to source as Dave_750's formula with boosted volume.

However, if for some reason you want to avoid boosting the overall volume of the stream, then the Nightmode Dialogue would likely be your best option - though again, I would highly recommend encoding your audio stream to both and comparing the waveforms of the two carefully.

To downmix with the Nightmode Dialogue formula in FFmpeg:

ffmpeg -i "sourcetrack.dts" -c dca -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" "stereotrack.dts" 
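Both custom formulae in this answer use '=' rather than '<', so their gains are not renormalized. The Python sketch below estimates the worst-case output peak for each one by summing the absolute gains, on the pessimistic assumption that every input channel reaches full scale in phase at the same moment. The dictionary and function names are illustrative only.

```python
import math

# Left-channel gains copied from the two pan filters in this answer.
FORMULAE = {
    "Nightmode Dialogue": {"FC": 1.0, "FL": 0.30, "BL": 0.30},
    "Dave_750": {"FC": 0.5, "FL": 0.707, "BL": 0.707, "LFE": 0.5},
}


def worst_case_peak(gains):
    """Peak output level if every input channel hits full scale in phase."""
    return sum(abs(g) for g in gains.values())


for name, gains in FORMULAE.items():
    peak = worst_case_peak(gains)
    print(f"{name}: peak {peak:.3f} x full scale ({20 * math.log10(peak):+.2f} dB)")
```

A value above 1.0 only means that clipping is possible, not that it will happen: real soundtracks rarely drive every channel to full scale at once. It is still a reason to listen to the loudest scenes after encoding, especially when a volume boost is added on top.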

Tarc's answer

This answer simply puts the Nightmode Dialogue downmix formula from Shane Harrelson's answer into a command to convert the audio stream in an MKV container. While the command given in this answer would work fine on such an audio stream, adapting it for a standalone audio track would give the error:

Filtering and streamcopy cannot be used together

This is because the audio codec cannot be copied when downmixing - like all other changes FFmpeg makes to an output stream, a downmix requires that the track be re-encoded for the changes to be applied.
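One way to keep that rule straight is to build the command so that only the untouched streams are copied. The Python sketch below assembles such an argument list; the function name is hypothetical, the file names are borrowed from Tarc's answer, and ac3 is just one example of an audio encoder.

```python
import shlex


def build_downmix_command(src, dst, pan_filter, audio_codec):
    """Return an ffmpeg argument list: copy video, re-encode filtered audio."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "copy",        # video is not filtered, so it can be stream-copied
        "-c:a", audio_codec,   # filtered audio has to be re-encoded
        "-af", pan_filter,
        dst,
    ]


NIGHTMODE = "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR"
cmd = build_downmix_command("input_51.mkv", "output_stereo.mkv", NIGHTMODE, "ac3")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) to run it directly.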

This command also included a redundant -ac 2 switch which FFmpeg would have ignored.


Test commands

To demonstrate the reliability of the tests I conducted for this answer, below are all of the commands I used to test each downmix formula.

The test command used for the -ac 2 option:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -ac 2 "Audio 1 (-ac 2).wav"

The test command used for Gregory's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.707*BL|FR < 1.0*FR + 0.707*FC + 0.707*BR" "Audio 2 (ATSC Algorithm Downmix).wav"

The test command used for Dave_750's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -vol 425 -af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" "Audio 4 (Dave750 Downmix).wav"

The test command used for Shane Harrelson's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" "Audio 3 (Nightmode Dialogue Downmix).wav"

Hashim

Posted 2014-12-14T12:23:39.217

Reputation: 6 967

Impressive insight! Thanks for taking the time to share this. Strange then, that -ac 2 gave me an inferior result to begin with, which prompted the original posting. I will try this again and if possible, share a 5.1 excerpt which doesn't give a satisfactory result with the built-in down-mix. Also very nice to know you can down-mix without transcoding! – forthrin – 2019-03-02T09:11:38.920

@forthrin Bear in mind that encoding and transcoding are two different things. Transcoding converts from one codec/extension to another, and encoding converts to the same codec/extension. You can downmix and apply other FFmpeg effects to a stream without transcoding, but not without encoding. The -ac 2 option gave me the most inferior result of all the downmix formulae too; I think this is just a failing of the ATSC standard's formula. – Hashim – 2019-03-02T20:19:17.593

I tried this now. It seems that ffmpeg -i 5.1.mp4 -ac 2 2.mp4 works, but ffplay -i 5.1.mp4 -ac 2 doesn't. – forthrin – 2019-03-11T14:41:44.647

FYI, .wav is totally uncompressed so all these downmixes will have the exact same size down to the byte, regardless. You could have complete silence and it would still be the same size if it was the same length (and sampling rate, bit depth, etc. were also identical) – NullUser – 2020-02-11T04:00:42.750
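Following up on that point, a comparison that looks at the samples rather than the file size could be sketched in Python as below. It uses only the standard library and assumes plain integer PCM WAV files such as the pcm_s16le outputs from the test commands; the function names are illustrative.

```python
import hashlib
import wave


def pcm_digest(path):
    """Return (frame_count, SHA-256 of the raw PCM frames) for a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return wav.getnframes(), hashlib.sha256(frames).hexdigest()


def same_audio(path_a, path_b):
    """True only if both files hold identical PCM sample data."""
    return pcm_digest(path_a) == pcm_digest(path_b)
```

For example, same_audio("Audio 1 (-ac 2).wav", "Audio 2 (ATSC Algorithm Downmix).wav") would return True only if the two downmixes produced exactly the same samples, which equal file sizes alone cannot show.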

9

Try this downmix:

-ac 2 -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" 

as suggested by Robert Collier in the Doom9 forum.

Shane Harrelson

Posted 2014-12-14T12:23:39.217

Reputation: 117

What do all those options mean? If you explain them, people will be able to use your answer to solve different problems instead of just copy-pasting. – David Richerby – 2016-03-04T21:12:58.453

@DavidRicherby -ac = Audio Channels (2 for stereo), -af = Audio Filter – Cestarian – 2016-03-23T04:14:09.510

Tried this for a 5.1 movie and at least the output stereo sounded completely fine to me. Clear dialogue and nothing else seemed to be missing. Would be great if someone with VLC knowledge could share exactly what is done in the default 5.1 to 2.0 downmix there. – forthrin – 2016-07-08T10:28:46.480

@DavidRicherby: The options inside the audio filter (-af) are: FL=Front-left; BL=Back-left; FC=Front-center; FR=Front-right; BR=Back-right. The floats are linear factors to reduce (<1) or increase (>1) the volume of the multiplied channel. FL=FC+0.30*FL+0.30*BL is setting the Front-left channel to the Front-Center channel plus 30% of the Front-left and 30% of the Back-left channels. – kronenpj – 2017-01-15T22:13:52.390
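To illustrate that arithmetic, here is a minimal Python sketch that applies the same gains to a single 5.1 sample frame. The frame values are made up for the example and the function name is not part of FFmpeg.

```python
def downmix_frame(frame, left_gains, right_gains):
    """Mix one multichannel sample frame down to a (left, right) pair."""
    left = sum(gain * frame[ch] for ch, gain in left_gains.items())
    right = sum(gain * frame[ch] for ch, gain in right_gains.items())
    return left, right


# Gains from the filter above: FL=FC+0.30*FL+0.30*BL, FR=FC+0.30*FR+0.30*BR
LEFT = {"FC": 1.0, "FL": 0.30, "BL": 0.30}
RIGHT = {"FC": 1.0, "FR": 0.30, "BR": 0.30}

# Hypothetical frame: dialogue in the center, music mostly on the left.
frame = {"FL": 0.5, "FR": 0.1, "FC": 0.2, "LFE": 0.0, "BL": 0.1, "BR": 0.0}
print(downmix_frame(frame, LEFT, RIGHT))
```

With this frame the left output is about 0.38 and the right about 0.23: the center channel passes through at full level on both sides, while the front and back channels contribute 30% each and the LFE channel is ignored.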

FWIW: I find this mix makes dialogue way too loud compared to the music and ambient sounds. The technically more correct mix given in Tarc's answer is much more pleasing to me. So I guess you might have to try what works best for you, it depends on the situation. – jlh – 2018-02-07T22:12:48.290

It may have been edited, @jlh, but the filter settings are identical in both answers. There's no reason they should sound different to you. – psouza4 – 2018-05-06T18:48:26.590

4

So, by combining @Shane Harrelson's with @Jordan Harris's answer to another question - with lazy mode turned on - here's what's needed to convert input_51.mkv (5.1) into output_stereo.mkv (stereo):

ffmpeg -i input_51.mkv -c:v copy \
    -ac 2 -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" \
    output_stereo.mkv

The -c:v copy part means that the video stream is not touched (it is copied as-is rather than re-encoded). Without it, the conversion takes much longer. Just repeating from the above answer for completeness, -ac 2 means two audio channels and -af specifies an audio filter.

After looking into the command a bit, I figured out that it's setting how the two stereo channels are composed; the FL (front left channel) is taken from the original FC (front center) plus 0.30*FL (30% from the front left) plus 0.30*BL (30% from the back left) and so on.

Tarc

Posted 2014-12-14T12:23:39.217

Reputation: 161

Will this keep the center channel consistent and audible? – Freedo – 2017-08-07T10:54:32.330

2

This is an old question now, but it pointed me in the right direction and I wanted to share my result:

-af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE"

Putting half of the FC and LFE into left and right gives a total of 1 for their effective volumes from both speakers. Using .707 * Front/Back Left/Right brings those channels down to a good level so they don't overpower the center.

Dave_750

Posted 2014-12-14T12:23:39.217

Reputation: 121

1

If the -ac 2 option gives you a balanced downmix where neither the music nor the speech is too loud relative to the other components, you just need to boost the volume with

-vol 512

I used 512 in the example, which increases the sound, making it two times louder. The rule is that 256 is equivalent to 100%.

Do not go too high with the value, and be sure to check the results in those parts of the movie with explosions or loud noise. It is very easy to introduce distortion by using too high a value.

Mephisto

Posted 2014-12-14T12:23:39.217

Reputation: 243

0

After reading this whole page and some experiments I came up with this script called "down_mix":

#!/bin/bash -x

FL="0.5*FC + 0.707*FL + 0.707*BL + 0.5*LFE"
FR="0.5*FC + 0.707*FR + 0.707*BR + 0.5*LFE"
AUDIO_FMT="libopus"
CONTAINER="mkv"

ffmpeg -i "$1" -c:v copy -c:s copy \
    -c:a $AUDIO_FMT \
    -af "pan=stereo|FL=$FL|FR=$FR" \
    "${1%.*}"_dm.$CONTAINER

    # how to test a snippet of movie
    # -ss 41:07.0 -t 4 \

Tweak the variables above to your liking. I didn't have a problem with low volume so I left that out, but it is easily added.

Gringo Suave

Posted 2014-12-14T12:23:39.217

Reputation: 932

0

The ffmpeg option "-ac 2" works fine as long as your target is pcm_s16le encoded. When encoding to pcm_f32le in WAV format, the volume is increased by 9 dB or more. Hence, don't use the "-ac 2" option in such cases.

Frank-Michael Fischer

Posted 2014-12-14T12:23:39.217

Reputation: 1

Why is the volume increased? Where did you learn about this? – forthrin – 2019-04-24T16:19:20.620

No idea why. But I am a very frequent ffmpeg user (compiling it myself). Just take any 5.1(side) source and convert it to a pcm_s16le and also to a pcm_f32le wav file, using "-ac 2" both times. Compare the peak volumes of the two wav files and you will see (and hear): – Frank-Michael Fischer – 2019-04-25T10:07:07.290

this happens using e.g. ffmpeg version N-93636-g6829c3c – Frank-Michael Fischer – 2019-04-25T15:27:37.240

0

-ac 2

With a floating-point codec (pcm_f32le, aac), the volume of the channels in the downmix is unchanged.

With an integer codec (pcm_s16le, libfdk_aac), the volume of the downmix (5.1 to 2.0 without LFE) is reduced by a factor of 1/2.5 = -7.96 dB.

Movie soundtracks usually point sound in one direction at a time and do not drive all channels to maximum level at once. Reducing the downmix volume is therefore the wrong approach; a little level compression is the right one. That's what Dolby does.

user1076138

Posted 2014-12-14T12:23:39.217

Reputation: 9