FLAC to MP3 conversion offsets sound by a few ms

0

When I run this simple conversion command: ffmpeg -i fileA.flac fileB.mp3, the MP3 output is offset by about 5 ms.

This does not happen if I convert to Vorbis/Ogg instead (i.e. ffmpeg -i fileA.flac fileC.ogg).

[Image: Audacity screenshot showing offset]

Any idea why this happens and how I can fix it?

Prime_Aqasix

Posted 2019-01-13T08:25:41.567

Reputation: 1 292

Answers

3

This is because of how MP3 encoding (or actually, both encoding and decoding) works. See the technical FAQ:

Why is a decoded MP3 longer than the original .wav file?

Because LAME (and all other MDCT based encoders) add padding to the beginning and end of each song. LAME embeds the amount of padding in the ancillary data of the first frame of the MP3 file. (LAME INFO tag).

Continuing:

All decoders I have tested introduce a delay of 528 samples. That is, after decoding an mp3 file, the output will have 528 samples of 0's appended to the front. This is because the standard MDCT/filterbank routines used by the ISO have a 528 sample delay. It would be possible to write a MDCT/filterbank routine with a 0 sample delay (see description of Takehiro's MDCT/filterbank routine used in LAME encoding below) but I don't know that anyone has done this. Furthermore, because of the overlapped nature of MDCT frames, the first half of the first granule (1 granule=576 samples) doesn't have a previous frame to overlap with, resulting in attenuation of the first N samples.

It gets more technical if you read on, but this should summarize the issue.
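To put the 528-sample figure from the FAQ into time units, here is a small Python sketch. The helper name samples_to_seconds is made up for illustration, and the sketch only covers the decoder delay quoted above; any padding added by the encoder comes on top of it and is not modelled here.

```python
# Decoder delay quoted in the LAME FAQ excerpt above.
DECODER_DELAY_SAMPLES = 528


def samples_to_seconds(samples: int, sample_rate: int) -> float:
    """Convert a sample count to a duration in seconds (hypothetical helper)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return samples / sample_rate


if __name__ == "__main__":
    # Common sample rates; 44100 Hz is the one mentioned in this thread.
    for rate in (44100, 48000):
        delay_ms = samples_to_seconds(DECODER_DELAY_SAMPLES, rate) * 1000
        print(f"{rate} Hz: {delay_ms:.2f} ms")
```

At 44100 Hz this works out to roughly 12 ms for the decoder delay alone.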

slhck

Posted 2019-01-13T08:25:41.567

Reputation: 182 472

Is there a way to cut out the 528 sample delay directly through ffmpeg? – Prime_Aqasix – 2019-01-14T01:28:37.227

OK, it seems like ffmpeg -ss 0.0528 -i fileA.flac fileB.mp3 does the trick, though I don't understand why. Since the file uses a sample rate of 44100 Hz, shouldn't 528 samples take 0.0119 seconds? – Prime_Aqasix – 2019-01-14T01:42:57.000

-ss 0.0528 means there's an offset of 0.0528 seconds, not samples. But you can trim samples like so: https://stackoverflow.com/a/39809030/435093 – slhck – 2019-01-14T13:49:01.767
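As a rough illustration of trimming by sample count rather than by seconds, the sketch below builds an ffmpeg command line that uses the atrim filter's start_sample option, followed by asetpts to reset timestamps. This is not necessarily what the linked answer does. The output name fileB_trimmed.wav and the helper build_trim_command are hypothetical, and 528 is only the decoder delay from the FAQ, so the number of samples you actually need to drop may differ depending on the encoder and decoder involved.

```python
import shlex


def build_trim_command(src: str, dst: str, start_sample: int) -> list[str]:
    """Build an ffmpeg argument list that drops the first `start_sample` samples.

    Hypothetical helper: it only constructs the command, it does not run it.
    """
    if start_sample < 0:
        raise ValueError("start_sample must be non-negative")
    # atrim keeps audio from start_sample onward; asetpts resets the
    # timestamps so the trimmed audio starts at time zero.
    audio_filter = f"atrim=start_sample={start_sample},asetpts=PTS-STARTPTS"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # 528 is an assumption taken from the FAQ quote; adjust after measuring
    # the real offset (for example in Audacity).
    cmd = build_trim_command("fileB.mp3", "fileB_trimmed.wav", 528)
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run if you prefer to run it from Python.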