FFmpeg raw audio and H264 in RTSP


I am trying to correctly grab video and audio from a Hikvision IP camera.

Everything works like a charm when doing so for H.264 + MP2, for example.

When I try to grab raw audio as PCM s16le, the smile goes off my face.

Here is how I grab from my camera (you can try it; it is open to the world):

ffmpeg -re -acodec pcm_s16le -ac 1 -rtsp_transport tcp -i rtsp://superuser:superuser12345@91.214.203.250:10554 -vcodec copy -acodec libfdk_aac -vbr 5 test.ts

The command works and packs the RTSP stream into a TS file.

However, the durations of audio and video differ. For example, when I record for 21 seconds, I end up with 21 seconds of audio but only 15 seconds of video.

The audio is stretched and its pitch is lowered. I have spent several days reading the FFmpeg documentation and have applied various options such as -async and changing the sample rate, with no luck.

I hope Mulvya or other FFmpeg experts can advise a fix to get this done correctly.
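To double-check what the camera advertises for the audio stream, one option is to probe it directly (same URL as above; this ffprobe call is just an illustration, not part of my original command):

ffprobe -rtsp_transport tcp -select_streams a -show_streams rtsp://superuser:superuser12345@91.214.203.250:10554

Below is the full log of my capture attempt: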

C:\Users\User>d:/ffmpeg/bin/ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://superuser:superuser12345@91.214.203.250:10554 -vcodec copy -acodec aac -b:a 96k d:/ffmpeg/hik_aac.ts
ffmpeg version N-83410-gb1e2192 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 5.4.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
libavutil      55. 46.100 / 55. 46.100
libavcodec     57. 75.100 / 57. 75.100
libavformat    57. 66.101 / 57. 66.101
libavdevice    57.  2.100 / 57.  2.100
libavfilter     6. 72.100 /  6. 72.100
libswscale      4.  3.101 /  4.  3.101
libswresample   2.  4.100 /  2.  4.100
libpostproc    54.  2.100 / 54.  2.100
Guessed Channel Layout for Input Stream #0.1 : mono
Input #0, rtsp, from 'rtsp://superuser:superuser12345@91.214.203.250:10554':
Metadata:
title           : Media Presentation
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 16 fps, 25 tbr, 90k tbn, 32.01 tbc
Stream #0:1: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Output #0, mpegts, to 'd:/ffmpeg/hik_aac.ts':
Metadata:
title           : Media Presentation
encoder         : Lavf57.66.101
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, q=2-31, 16 fps, 25 tbr, 90k tbn, 90k tbc
Stream #0:1: Audio: aac (LC), 16000 Hz, mono, fltp, 96 kb/s
Metadata:
  encoder         : Lavc57.75.100 aac
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33976, current: 7200; changing to 33977. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33977, current: 14400; changing to 33978. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33978, current: 18000; changing to 33979. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33979, current: 25200; changing to 33980. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33980, current: 28800; changing to 33981. This may result in incorrect timestamps in the output file.
frame=   85 fps= 11 q=-1.0 Lsize=    1357kB time=00:00:07.42 bitrate=1497.1kbits/s speed=0.997x
video:1196kB audio:51kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 8.805858%
[aac @ 00000000030a0a00] Qavg: 63342.980
Exiting normally, received signal 2.

Max Ridman


Need to see full log. – Gyan – 2017-08-21T16:43:03.413

Thank you for your attention. You are free to try it if needed (it is open to the world): ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://superuser:superuser12345@91.214.203.250:10554 -vcodec copy -acodec aac -b:a 96k – Max Ridman – 2017-08-21T18:43:25.977

As far as I can tell the audio stream from the camera itself is already like that (if you disable video with -vn, it's too slow). Could it be that it uses the wrong indicated sample rate? Can you change the parameters of that webcam's encoding? – slhck – 2017-08-21T20:10:10.270

I tried that some days ago - same result. But when playing the stream in the web interface of the IP camera, it works OK. FFmpeg has options to stretch/squeeze the audio stream based on video timestamps, but that does not seem to work here. – Max Ridman – 2017-08-22T05:15:01.347

And the bad thing is that I can NOT specify the sample rate for the raw audio input. That would be useful, since some devices report a wrong header (i.e. the header says the sample rate is 16 kHz when it is really 22.05 kHz) and you can't do anything about that. – Max Ridman – 2017-08-22T05:17:41.973

Answers


As per the comments, the actual sampling rate appears to be 22.05 kHz (21 s of audio against 15 s of video is roughly the 22050/16000 ratio), so we can conform the audio to that rate.

Use

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL
       -vcodec copy -af asetrate=22050 -acodec aac -b:a 96k test.mp4

The asetrate filter does not resample the audio; it simply changes the declared sample rate without touching the samples.
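If you also want the output at a different standard rate (for instance 44.1 kHz), the rate change can be followed by an actual resample; the aresample=44100 part below is only an illustration, not required by this fix:

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL
       -vcodec copy -af "asetrate=22050,aresample=44100" -acodec aac -b:a 96k test.mp4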

Gyan


Thank you Mulvya for this great suggestion! It has resolved the issue, but I have a small drift between audio and video of ~300 ms; can I sync them based on the video stream? – Max Ridman – 2017-08-22T07:49:34.620

First try: add -vsync 0 to the command. If that doesn't fix it, after you save the capture, run a 2nd command: ffmpeg -i test.mp4 -itsoffset -0.300 -i test.mp4 -c copy -map 0:v -map 1:a test2.mp4. This shifts the captured audio 300 ms earlier. Adjust value as needed. – Gyan – 2017-08-22T08:13:35.470
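
Putting the pieces from this thread together, a two-step version might look like the following sketch (rtsp://URL is a placeholder as in the answer; the -0.300 offset is the value mentioned above and should be tuned per capture):

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL
       -vsync 0 -vcodec copy -af asetrate=22050 -acodec aac -b:a 96k test.mp4
ffmpeg -i test.mp4 -itsoffset -0.300 -i test.mp4 -c copy -map 0:v -map 1:a test2.mp4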