Background: I am using FFMPEG to combine an audio and video stream into a combined MPEGTS stream over the network. Video comes encoded in h264 from the Raspberry Pi camera, audio comes ogg vorbis encoded from a sound card over a local TCP stream. A single FFMPEG process takes care of encoding the audio as AAC and combining audio/video.


raspivid --nopreview --ev 10 -ih -t 0 -rot 180 -w 720 -h 480 -fps 30 -b $BITRATE -g $KEYFRAME_PERIOD -pf baseline -o - | ffmpeg -r 30 -i - -i tcp:// -vcodec copy -acodec libfdk_aac -b:a 64k -ac 2 -filter:a "atempo=0.9945" -f mpegts tcp://

Problem: I had a problem with audio and video de-synchronizing over time. I fixed the initial time difference by tweaking the sound card driver myself, however over time the streams can still very slowly run out of sync: the audio plays slightly faster than the video. Letting it run on eventually causes playback buffering to re-sample the audio, resulting in video latency increasing.

The reason for this tempo offset is probably that the sound card runs on its own clock, slightly different from that of the Raspberry Pi. I have managed to reduce this issue using the atempo audio filter. However, considering this may run for hours to days, a fixed atempo factor is inadequate, as the crystal clock speeds will vary.
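To put numbers on why a fixed correction factor stops being enough over long runs, here is a small sketch of the arithmetic. The 50 ppm residual mismatch is purely an assumed, illustrative figure (it was not measured on this setup), and `drift_seconds` is a hypothetical helper name.

```python
def drift_seconds(ppm_error: float, runtime_seconds: float) -> float:
    """Accumulated audio/video offset for a constant clock-rate
    mismatch given in parts per million (ppm)."""
    return ppm_error * 1e-6 * runtime_seconds


# Assumed example: 50 ppm of mismatch left over after the atempo tweak.
residual_ppm = 50.0
one_hour = drift_seconds(residual_ppm, 3600)    # about 0.18 s
one_day = drift_seconds(residual_ppm, 86400)    # about 4.32 s
print(f"after 1 h: {one_hour:.2f} s, after 24 h: {one_day:.2f} s")
```

Even a mismatch this small becomes clearly audible within a day, and since the crystal frequencies wander, the residual itself is not constant.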

Question: Considering the audio comes in as an audio stream, I was wondering if there is a way to automatically adjust the tempo to keep the buffered audio sample time constant? Something that makes FFMPEG maintain a steady audio stream, keeping x seconds in the buffer at all times, slowing down playback if catching up on the buffer. The tempo is the key here: if tempo is not changed it will result in problems during playback when audio and video are combined.
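The behaviour I have in mind could be sketched as a simple proportional controller, roughly as below. This is only an illustration of the idea, not something FFMPEG provides: `tempo_correction`, `gain`, and `max_dev` are hypothetical names and values, and the sketch only computes a tempo factor; how to feed that factor into a running filter is not covered here.

```python
def tempo_correction(buffered_s: float, target_s: float,
                     gain: float = 0.01, max_dev: float = 0.02) -> float:
    """Return a tempo factor that nudges the buffer level toward target_s.

    buffered_s: seconds of audio currently waiting in the buffer
    target_s:   seconds of audio we want to keep buffered
    gain:       tempo change per second of buffer error (assumed value)
    max_dev:    largest allowed deviation from tempo 1.0 (assumed value)
    """
    error = buffered_s - target_s
    # Too much buffered -> play slightly faster; too little -> slower.
    deviation = max(-max_dev, min(max_dev, gain * error))
    return 1.0 + deviation


# Example: aim for 2 s of buffered audio.
for level in (1.0, 2.0, 3.0):
    print(level, tempo_correction(level, target_s=2.0))
```

The clamp keeps the tempo change small enough to stay inaudible; a real implementation would likely also smooth the buffer measurement so that network jitter does not show up as tempo wobble.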

I cannot use standard synchronization methods because the h264 stream has no timestamps, and the ogg stream timestamps have a tempo problem.

I'm not positive this will solve it for you, but try adding the -re flag to ffmpeg – szatmary – 2016-12-31T23:45:25.337

Although the docs do say, -re Should not be used with actual grab devices or live input streams (where it can cause packet loss) – Gyan – 2017-01-01T05:08:00.947

If the audio source clock is errant then there's no generic ffmpeg solution possible. Sometimes X incoming samples may represent Y seconds of realtime audio and sometimes Z. What you could try if the audio clock isn't the issue is -af aresample=async=1 instead of atempo. This filter will trim or pad the output audio if there are too many or too few audio packets between two timestamps. – Gyan – 2017-01-01T05:17:41.390

Thank you Mulvya, unfortunately it did not work. Judging by the documentation of the ffmpeg resampler, it probably has to do with the lack of timestamps in the h264 video stream. I used -re before when streaming video files but it doesn't work for the tcp stream. The closest I could get in theory is occasionally resetting the audio stream to "latest sample", but that too is a bit vague and I suppose ffmpeg indeed has nothing for it. – berger – 2017-01-02T18:10:37.650

