Ensure audio and video tracks are EXACTLY the same length

I generate 200 video files based on audio files generated using sox, combined with image files. Most clips are shorter than one second, and none is longer than 6 seconds. I then concatenate these files, and the end result has an overall audio/video delay of about 2 seconds.

I believe this might be due to audio and video tracks being concatenated independently.

I can find out the exact duration of the video and audio track (stream) using ffprobe. In one of the short files alone I can see that the durations differ:

ffprobe file001.webm
Input #0, matroska,webm, from 'file001.webm':
  Metadata:
    ENCODER         : Lavf58.20.100
  Duration: 00:00:00.92, start: 0.000000, bitrate: 211 kb/s
    Stream #0:0: Video: vp8, yuv420p, 1100x140, SAR 1:1 DAR 55:7, 25 fps, 25 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      ENCODER         : Lavc58.35.100 libvpx
      DURATION        : 00:00:00.923000000
    Stream #0:1: Audio: vorbis, 48000 Hz, stereo, fltp (default)
    Metadata:
      ENCODER         : Lavc58.35.100 libvorbis
      DURATION        : 00:00:00.908000000
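The per-file gap can be computed directly from the two DURATION tags in the output above. This is only a minimal sketch: it assumes the tags always use the HH:MM:SS.fraction form shown here, and the helper name `tag_to_seconds` is made up for illustration.

```python
from fractions import Fraction


def tag_to_seconds(tag: str) -> Fraction:
    """Convert a DURATION tag such as '00:00:00.923000000' to exact seconds.

    Assumption: the tag is always HH:MM:SS.fraction, as in the ffprobe output above.
    """
    hours, minutes, seconds = tag.strip().split(":")
    # Fraction parses the decimal string exactly, so no float rounding creeps in.
    return Fraction(int(hours) * 3600 + int(minutes) * 60) + Fraction(seconds)


# Values copied from the ffprobe output for file001.webm.
video = tag_to_seconds("00:00:00.923000000")
audio = tag_to_seconds("00:00:00.908000000")

gap_ms = float((video - audio) * 1000)
print(f"video is {gap_ms:.0f} ms longer than audio")
```

A gap of this size in each of 200 clips would add up to a drift on the order of seconds, which is at least consistent with the delay seen after concatenation.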

How can I make it so that video and audio tracks in one video file are absolutely exactly the same duration?

I'm using vpx/vorbis/webm (after not being able to understand the cause of issues with mpeg2ts), but I will use any format to get it done.

I can also add silence padding to the audio to make them match duration.

qubodup

Posted 2019-05-24T12:35:27.703

Reputation: 3 736

Answers

It’s basically not worth the effort. Audio frames have a fixed duration that depends on the codec and the sample rate. For example, an AAC frame holds 1024 samples, so at 48000 Hz it lasts 1024/48000 s ≈ 21.333 ms. If you resample your video so that its duration is an exact multiple of that, the two tracks would theoretically match exactly, assuming the container does not modify the timing at all. Otherwise, you could modify the audio encoder to control the number of priming samples it uses, which would let you get a partial first audio frame. But again, every codec is different. Alternatively, you can use VFR (variable frame rate) and manually set the duration of the final frame, if the container supports it. Finally, you can change the edit list in MP4 and use a player that is guaranteed to support it.
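The frame arithmetic can be sketched as follows. This is only an illustration under stated assumptions: it uses the AAC numbers from the example (1024 samples per frame at 48000 Hz) and the 25 fps from the ffprobe output in the question. Vorbis, which the question actually uses, has different block sizes, so the figures would not carry over. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, gcd


def common_period(sample_rate: int, audio_frame_samples: int, fps: int) -> Fraction:
    """Shortest duration (seconds) that is a whole number of audio AND video frames."""
    if sample_rate % fps:
        raise ValueError("sketch assumes a whole number of samples per video frame")
    samples_per_video_frame = sample_rate // fps
    # Least common multiple of the two frame lengths, measured in samples.
    lcm_samples = (audio_frame_samples * samples_per_video_frame
                   // gcd(audio_frame_samples, samples_per_video_frame))
    return Fraction(lcm_samples, sample_rate)


def round_up_to_period(duration: Fraction, period: Fraction) -> Fraction:
    """Smallest multiple of `period` that is not shorter than `duration`."""
    return ceil(duration / period) * period


audio_frame = Fraction(1024, 48000)            # one AAC frame at 48 kHz
period = common_period(48000, 1024, 25)        # where both frame grids line up
target = round_up_to_period(Fraction(923, 1000), period)

print(f"audio frame: {float(audio_frame) * 1000:.3f} ms")
print(f"grids align every {float(period):.2f} s")
print(f"0.923 s clip rounds up to {float(target):.2f} s")
```

Under these assumptions, a clip would have to be stretched to a multiple of the common period before both tracks could end on a frame boundary. Whether the encoder and container then preserve that timing exactly is a separate question.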

I don’t know of any tools that can do any of these things off the shelf.

szatmary

Posted 2019-05-24T12:35:27.703

Reputation: 2 181

This would be better suited as a comment, since it's not an answer. I just added some more info to clarify that I can add padding to the audio to achieve the goal (the audio duration is not something that needs to be preserved; it's just that the montage of concatenated videos needs to be in sync). – qubodup – 2019-05-24T18:32:23.370

A comment could not hold that much text. Also, audio "padding" won't work, because the encoder still needs to encode entire audio frames. The only thing possible would be to abuse audio "priming", but I don't know of any implementations that support that. – szatmary – 2019-05-24T18:34:48.037