Merging 2 videos with overlay causes async problems


I'm using the following ffmpeg command to merge two MKV inputs with the overlay filter. The result should be a single output with input1 stacked on top of input2. The output should be WebM. Both inputs are the same length (within a margin of a second).

An illustration:

-----------------
|               |
|               |
|   input1.mkv  |
|               |
|---------------|
|               |
|               |
|   input2.mkv  |
|               |
----------------- 

Command and trimmed output:

ffmpeg -i input1.mkv -i input2.mkv -y -filter_complex \
[0:v]select=1, setpts=PTS-STARTPTS, scale=400:300, pad=400:600 [top]; \
[1:v]select=1, setpts=PTS-STARTPTS, scale=400:300 [bottom]; \
[top][bottom] overlay=0:300 [out]; \
[0:a:0][1:a:0] amerge=inputs=2 [a]; \
[a] asetpts=PTS-STARTPTS [a] \
-map [a] -c:v libvpx -crf 10 -b:v 360K -q:v 7 -c:a libvorbis -b:a 32k \
-map [out] output.webm
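(For reference, when run directly in a shell the filter graph needs quoting so the shell does not interpret the semicolons and brackets. An equivalent quoted form of the command above, printed via echo for inspection rather than executed:)

```shell
# Same command as above, with the filter graph passed as one quoted
# argument. Remove the leading "echo" to actually run it.
graph='[0:v]select=1,setpts=PTS-STARTPTS,scale=400:300,pad=400:600[top];[1:v]select=1,setpts=PTS-STARTPTS,scale=400:300[bottom];[top][bottom]overlay=0:300[out];[0:a:0][1:a:0]amerge=inputs=2[a];[a]asetpts=PTS-STARTPTS[a]'
echo ffmpeg -i input1.mkv -i input2.mkv -y -filter_complex "$graph" \
  -map '[a]' -c:v libvpx -crf 10 -b:v 360K -q:v 7 -c:a libvorbis -b:a 32k \
  -map '[out]' output.webm
```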

ffmpeg version 2.4.git Copyright (c) 2000-2014 the FFmpeg developers
  built on Oct 30 2014 14:00:21 with gcc 4.6 (Ubuntu/Linaro 4.6.3-1ubuntu5)
  configuration: --prefix=/home/bla/ffmpeg_build --extra-cflags=-I/home/bla/ffmpeg_build/include --extra-ldflags=-L/home/bla/ffmpeg_build/lib --bindir=/home/bla/bin --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --enable-openssl
  libavutil      54. 11.100 / 54. 11.100
  libavcodec     56. 10.100 / 56. 10.100
  libavformat    56. 11.100 / 56. 11.100
  libavdevice    56.  2.100 / 56.  2.100
  libavfilter     5.  2.100 /  5.  2.100
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Guessed Channel Layout for  Input Stream #0.1 : mono
Input #0, matroska,webm, from '/tmp/input1.mkv':
  Metadata:
    ENCODER         : Lavf54.20.4
  Duration: 00:02:50.45, start: 0.000000, bitrate: 174 kb/s
    Stream #0:0: Video: vp8, yuv420p, 640x480, SAR 1:1 DAR 4:3, 30 fps, 30 tbr, 1k tbn, 1k tbc (default)
    Stream #0:1: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s (default)
Guessed Channel Layout for  Input Stream #1.1 : mono
Input #1, matroska,webm, from '/tmp/input2.mkv':
  Metadata:
    ENCODER         : Lavf54.20.4
  Duration: 00:02:50.46, start: 0.013000, bitrate: 1901 kb/s
    Stream #1:0: Video: vp8, yuv420p, 640x480, SAR 1:1 DAR 4:3, 30 fps, 30 tbr, 1k tbn, 1k tbc (default)
    Stream #1:1: Audio: pcm_mulaw ([7][0][0][0] / 0x0007), 8000 Hz, 1 channels, s16, 64 kb/s (default)
[Parsed_amerge_8 @ 0x325ada0] No channel layout for input 1
[Parsed_amerge_8 @ 0x325ada0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[libvpx @ 0x3268aa0] v1.3.0
Output #0, webm, to '/tmp/output.webm':
  Metadata:
    encoder         : Lavf56.11.100
    Stream #0:0: Audio: vorbis (libvorbis), 8000 Hz, stereo, fltp, 32 kb/s (default)
    Metadata:
      encoder         : Lavc56.10.100 libvorbis
    Stream #0:1: Video: vp8 (libvpx), yuv420p, 400x600 [SAR 1:1 DAR 2:3], q=-1--1, 360 kb/s, 30 fps, 1k tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc56.10.100 libvpx
Stream mapping:
  Stream #0:0 (vp8) -> select
  Stream #0:1 (pcm_mulaw) -> amerge:in0
  Stream #1:0 (vp8) -> select
  Stream #1:1 (pcm_mulaw) -> amerge:in1
  asetpts -> Stream #0:0 (libvorbis)
  overlay -> Stream #0:1 (libvpx)
Press [q] to stop, [?] for help
[vp8 @ 0x322af20] Discarding interframe without a prior keyframe!
Error while decoding stream #0:0: Invalid data found when processing input
[vp8 @ 0x322af20] Discarding interframe without a prior keyframe!
Error while decoding stream #0:0: Invalid data found when processing input
frame=  316 fps= 17 q=0.0 size=     753kB time=00:00:13.53 bitrate= 456.0kbits/s dup=0 drop=146    
[vp8 @ 0x322af20] Upscaling is not implemented. Update your FFmpeg version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
[vp8 @ 0x322af20] If you want to help, upload a sample of this file to ftp://upload.ffmpeg.org/incoming/ and contact the ffmpeg-devel mailing list. (ffmpeg-devel@ffmpeg.org)
Input stream #0:0 frame changed from size:320x240 fmt:yuv420p to size:384x288 fmt:yuv420p
Input stream #0:0 frame changed from size:384x288 fmt:yuv420p to size:320x240 fmt:yuv420p
Input stream #0:0 frame changed from size:320x240 fmt:yuv420p to size:384x288 fmt:yuv420p
Input stream #0:0 frame changed from size:384x288 fmt:yuv420p to size:512x384 fmt:yuv420p
Input stream #0:0 frame changed from size:512x384 fmt:yuv420p to size:640x480 fmt:yuv420p
[Parsed_amerge_8 @ 0x33462c0] No channel layout for input 1
[Parsed_amerge_8 @ 0x33462c0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[libvorbis @ 0x3266fc0] Queue input is backward in time
[webm @ 0x3266200] Non-monotonous DTS in output stream 0:0; previous: 13880, current: 3912; changing to 13880. This may result in incorrect timestamps in the output file.
frame= 2730 fps= 21 q=0.0 size=    6030kB time=00:01:39.33 bitrate= 497.3kbits/s dup=0 drop=1036    
Error while decoding stream #0:1: Cannot allocate memory
    Last message repeated 65 times
frame= 2738 fps= 21 q=0.0 size=    6048kB time=00:01:39.66 bitrate= 497.1kbits/s dup=0 drop=1036    
Error while decoding stream #0:1: Cannot allocate memory
    Last message repeated 170 times
frame= 2784 fps= 21 q=0.0 size=    6230kB time=00:01:53.17 bitrate= 450.9kbits/s dup=0 drop=1403    
Error while decoding stream #1:1: Cannot allocate memory
    Last message repeated 133 times
[webm @ 0x3266200] Non-monotonous DTS in output stream 0:0; previous: 113164, current: 3896; changing to 113164. This may result in incorrect timestamps in the output file.
[webm @ 0x3266200] Non-monotonous DTS in output stream 0:0; previous: 113164, current: 3928; changing to 113164. This may result in incorrect timestamps in the output file.
[webm @ 0x3266200] Non-monotonous DTS in output stream 0:0; previous: 113164, current: 3960; changing to 113164. This may result in incorrect timestamps in the output file.
[webm @ 0x3266200] Non-monotonous DTS in output stream 0:0; previous: 113164, current: 3992; changing to 113164. This may result in incorrect timestamps in the output file.
frame= 2784 fps= 21 q=0.0 Lsize=    6295kB time=00:01:53.17 bitrate= 455.6kbits/s dup=0 drop=1456    
video:5595kB audio:643kB subtitle:0kB other streams:0kB global headers:3kB muxing overhead: 0.898592%

This command does what it's supposed to do.

However, the two videos are not fully in sync.

input1 on top plays fine, while input2 on the bottom shows black frames, slows down or speeds up, and drifts out of sync with its audio.

To rule out problems with an individual input, we swapped the positions of the two videos: whichever input is on top always plays fine.

How can we fix this?

--UPDATE 1-- Running FFmpeg through the Node.js fluent-ffmpeg module concealed all the warnings and errors. I ran the FFmpeg command in a console and the amount of output is enormous.

Here's an untrimmed pastebin with the log -> http://pastebin.com/bHdC2M1V

--UPDATE 2-- A possible lead: the input MKV files are recorded (sunk) WebRTC streams. Please correct me if I'm wrong: in live streams, the quality adapts to the connection. If that means frames are sent at different sizes, it would explain ffmpeg complaining about changing frame sizes. So, to rephrase the question: how can two MKV input videos (originating from WebRTC streams) be combined without dropping the frames whose size changes?

Gnagy

Posted 2014-11-04T07:59:31.110


Your input frame sizes are changing, but VP8 does not support this according to the console output. – llogan – 2014-11-04T18:51:47.753

Hi, thank you for your comment! I am having trouble understanding what you mean. Correct me if I'm wrong; does this have to do with the scaling part? And is scaling a frame in VP8 not supported? – Gnagy – 2014-11-05T09:23:34.547

Where does the input come from exactly? Changing frame sizes inside one container is not really adaptive streaming, and I've yet to see this used anywhere in practice. Seems rather unusual. Can you supply a sample video? Can you simply convert input.mkv to output.mp4 without any other options? If decoding fails there as well, then ffmpeg will not be able to decode the file and you need to open a bug report or accept that it's simply not possible (yet). – slhck – 2014-11-06T09:28:09.463

The input is a WebRTC live stream being sinked. Transcoding the mkv to webm or mp4 works. Combining them is causing errors. – Gnagy – 2014-11-06T09:45:19.947

And when you transcode the original and then combine it, does that work? Another idea might be to force decoding using libvpx, with ffmpeg -c:v libvpx -i input1.mkv -c:v libvpx -i input2.mkv. Right now your command uses the ffmpeg-internal VP8 decoder, which does not support the frame upscaling (it's specified in RFC 6386 but ffmpeg did not implement it). – slhck – 2014-11-06T09:53:01.110

I will try that and come back to you with results. thx! – Gnagy – 2014-11-06T09:55:25.067

Forcing libvpx on the input does not work. However, transcoding first and merging second, as two sequential processes, does work! Thanks! If you put your comment in an answer, I'm happy to accept it! – Gnagy – 2014-11-06T11:32:44.157

Nice. Another thing you could try when running the original command is the -avoid_negative_ts 1 option. I'm still unsure as to why the original fails and it may be worth opening a bug report for it, with some video samples. Do you think you can generate some dummy content that one could use for debugging? I could file the bug report for you. – slhck – 2014-11-06T11:58:08.703

I've tried using the -avoid_negative_ts 1 option with no effect. I have uploaded some samples from the testing team; you can find them here: http://we.tl/6FeQYpxEE4. A full log output can be found here: http://pastebin.com/bHdC2M1V. You can file this as a bug report if you want to. If you need anything else, I'll be happy to support with more details. – Gnagy – 2014-11-06T12:19:34.677

Answers


Based on some trial and error, it seems to work if you first convert the inputs to intermediate files in another codec, and then overlay those.

For example, you could use HuffYUV, high quality H.264, lossless H.264, ProRes, etc.:

ffmpeg -i input.mkv -c:v huffyuv -c:a pcm_s16le output.avi
ffmpeg -i input.mkv -c:v libx264 -crf 16 -c:a aac -strict experimental -b:a 320k output.mp4
ffmpeg -i input.mkv -c:v libx264 -crf 0 -c:a aac -strict experimental -b:a 320k output.mp4
ffmpeg -i input.mkv -c:v prores -c:a pcm_s16le output.mov

Then try merging again.
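A sketch of the full two-pass pipeline, assuming placeholder filenames and the lossless H.264 intermediate from the list above. Each function prints its ffmpeg invocation instead of executing it, so the commands can be inspected first (drop the leading "echo" to run them for real):

```shell
#!/bin/sh
# Pass 1: re-encode a recording to a lossless intermediate so the frame
# size is fixed before any filtering happens.
transcode() {
  echo ffmpeg -y -i "$1" -c:v libx264 -crf 0 -c:a pcm_s16le "$2"
}

# Pass 2: the original overlay/amerge graph, now fed the intermediates.
merge() {
  graph='[0:v]setpts=PTS-STARTPTS,scale=400:300,pad=400:600[top];[1:v]setpts=PTS-STARTPTS,scale=400:300[bottom];[top][bottom]overlay=0:300[out];[0:a:0][1:a:0]amerge=inputs=2,asetpts=PTS-STARTPTS[a]'
  echo ffmpeg -y -i "$1" -i "$2" -filter_complex "$graph" \
    -map '[out]' -map '[a]' -c:v libvpx -b:v 360K -c:a libvorbis -b:a 32k "$3"
}

transcode input1.mkv intermediate1.mkv
transcode input2.mkv intermediate2.mkv
merge intermediate1.mkv intermediate2.mkv output.webm
```

Running the two passes as separate processes is exactly what resolved the issue in the comments above: the filter graph never sees a stream whose frame size changes mid-stream.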

Note that setting -pix_fmt yuv420p or using the format=pix_fmts=yuv420p video filter might be necessary, since the intermediate formats above (HuffYUV, lossless H.264, ProRes) do not necessarily use the YUV 4:2:0 pixel format that the final WebM/VP8 encode expects.
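A minimal sketch of forcing the pixel format when producing the intermediate, with the same placeholder filenames (again printed rather than executed):

```shell
# Produce a lossless H.264 intermediate already in yuv420p, so the final
# WebM encode does not have to convert the pixel format later.
# Remove the leading "echo" to actually run the command.
to_yuv420p() {
  echo ffmpeg -y -i "$1" -c:v libx264 -crf 0 -pix_fmt yuv420p -c:a pcm_s16le "$2"
}

to_yuv420p input1.mkv intermediate1.mkv
```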

The original issue is that ffmpeg's native VP8 decoder cannot handle frame scaling (mid-stream frame size changes), which is part of VP8 as described in RFC 6386.

slhck
