Specifying parameters to create videos for ffmpeg's concat demuxer (to avoid a large re-encode)

ffmpeg can be used to concatenate files together:

If you have media files with exactly the same codec and codec parameters you can concatenate them [...]

(emphasis mine) My intention¹ is to produce media files with the same codec and parameters so that I can take advantage of concat without incurring a long re-encode.

Preamble:

I have a file that I would like to cut, keeping only the useful parts. I have written a Python script to find the nearest keyframe to each desired cut point and cut there, since ffmpeg can only split on I-frames when doing a stream copy:

Using -ss as input option together with -c:v copy might not be accurate since ffmpeg is forced to only use/split on i-frames.
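
For illustration, the keyframe-snapping step can be sketched as below. This is not my actual script, only a minimal sketch: it assumes the keyframe timestamps have already been extracted (for example with ffprobe) into a sorted list of seconds, and the function name `nearest_keyframe` and the sample times are made up for the example.

```python
from bisect import bisect_left


def nearest_keyframe(keyframes, target):
    """Return the keyframe timestamp (in seconds) closest to target.

    keyframes must be a non-empty list sorted in ascending order.
    Ties go to the earlier keyframe.
    """
    if not keyframes:
        raise ValueError("no keyframes given")
    i = bisect_left(keyframes, target)
    if i == 0:
        return keyframes[0]       # target is before the first keyframe
    if i == len(keyframes):
        return keyframes[-1]      # target is after the last keyframe
    before, after = keyframes[i - 1], keyframes[i]
    return before if target - before <= after - target else after


# Hypothetical keyframe times, in seconds
keyframes = [0.0, 2.0, 4.0, 6.0, 8.0]
print(nearest_keyframe(keyframes, 4.9))  # 4.0
print(nearest_keyframe(keyframes, 5.1))  # 6.0
```

The cut can land up to half a keyframe interval away from the requested time, which is why the splits are close but not exact.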

As it happens, the splits don't land at exactly the right moment, but they are close enough for now that I can focus on another part of the equation. If I use the concat demuxer at this point, the different parts are joined together perfectly. So far so good!

However, I would like there to be smooth transitions between these segments, so I have further split these segments so that the short ends can be used to create a crossfade transition without re-encoding the entire set of files.

A basic diagram would probably help illustrate this:

  [111AAAA111BBBBB111111CCCCCCC1111DDDDD111]   | (original file)
     [AAAA] [BBBBB]    [CCCCCCC]  [DDDDD]      | (desired clips extracted)
[AAA] [A][B] [BBB] [B][C] [CCCCC] [C][D] [DDDD]| (split ends from clips)
      [AAA][ab][BBB][bc][CCCCC][cd][DDD]       | (transitions between short ends)
            [AAAabBBBbcCCCCCcdDDD]             | (intended output)
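
To make the last row concrete: the concat demuxer reads a list file with one `file '...'` line per segment, in playback order. A minimal sketch of generating that list follows; the file names are hypothetical stand-ins for the segments in the diagram, and `concat_list` is just a helper name chosen for this example.

```python
def concat_list(paths):
    """Build the text of a list file for ffmpeg's concat demuxer."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal ' has to be written as '\''
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


# Hypothetical file names, in the order of the "intended output" row
parts = [
    "AAA.mkv", "ab-transition.mkv",
    "BBB.mkv", "bc-transition.mkv",
    "CCCCC.mkv", "cd-transition.mkv",
    "DDD.mkv",
]
print(concat_list(parts), end="")
```

With the output saved as, say, `list.txt`, the join itself would be along the lines of `ffmpeg -f concat -i list.txt -c copy output.mkv`.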

Problem:

This is where I've gotten to. When I use ffmpeg's concat demuxer to join the clips above, I get significant video and audio artifacts on playback. My guess is that there is a mismatch in codec parameters, which the quote at the top of this question names as a prerequisite. Checking the video streams with ffprobe gives:

$ ffprobe -i ab-transition.mkv 2>&1 | grep Stream.*Video ; ffprobe -i B.mkv 2>&1 | grep Stream.*Video
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720, SAR 1:1 DAR 16:9, 62.50 fps, 62.50 tbr, 1k tbn, 120 tbc (default)
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720 [SAR 1:1 DAR 16:9], 62.50 fps, 62.50 tbr, 1k tbn, 125 tbc (default)

(I have omitted audio stream output as the streams have ostensibly the same parameters, yet the audio is not joined correctly)
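
The difference is easy to miss by eye, so here is a small sketch that pulls the rate figures out of the two `Stream` lines above and reports which ones disagree. The regular expression is only an assumption based on the two lines shown here, not a general parser for ffprobe output.

```python
import re


def stream_rates(line):
    """Extract the fps, tbr, tbn and tbc figures from an ffprobe 'Stream' line."""
    rates = {}
    pattern = r"([\d.]+)(k?) (fps|tbr|tbn|tbc)\b"
    for value, suffix, name in re.findall(pattern, line):
        # ffprobe abbreviates thousands with a trailing 'k' (e.g. "1k tbn")
        rates[name] = float(value) * (1000 if suffix == "k" else 1)
    return rates


ab_line = ("Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), "
           "1280x720, SAR 1:1 DAR 16:9, 62.50 fps, 62.50 tbr, 1k tbn, 120 tbc (default)")
b_line = ("Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), "
          "1280x720 [SAR 1:1 DAR 16:9], 62.50 fps, 62.50 tbr, 1k tbn, 125 tbc (default)")

ab, b = stream_rates(ab_line), stream_rates(b_line)
mismatch = {name: (ab[name], b[name]) for name in ab if ab[name] != b.get(name)}
print(mismatch)  # {'tbc': (120.0, 125.0)}
```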

There are differences. I used -show_streams to get more detailed info; the full output is available at http://pastebin.com/4vcnDYtj (a single blank line separates the two outputs). Diffing the output gives:

7c7
< codec_time_base=1/120
---
> codec_time_base=1/125
70,71c70,71
< start_pts=12
< start_time=0.012000
---
> start_pts=11
> start_time=0.011000

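The same comparison can be scripted instead of diffed by hand. The sketch below parses the `key=value` lines that `-show_streams` prints and lists the fields that differ. It assumes each input holds a single stream (with several streams the keys would collide), and the sample text is a short excerpt containing only fields quoted in this question, not the full pastebin output.

```python
def parse_show_streams(text):
    """Parse the key=value lines of `ffprobe -show_streams` for ONE stream."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip [STREAM] markers and any other line without '='
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Excerpts only: the transition clip first, then clip B
transition = parse_show_streams(
    "[STREAM]\ncodec_name=h264\ncodec_time_base=1/120\n"
    "start_pts=12\nstart_time=0.012000\n[/STREAM]"
)
clip_b = parse_show_streams(
    "[STREAM]\ncodec_name=h264\ncodec_time_base=1/125\n"
    "start_pts=11\nstart_time=0.011000\n[/STREAM]"
)

for field, (left, right) in differing_fields(transition, clip_b).items():
    print(f"{field}: {left} != {right}")
```
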
Update:

I have found options and matched parameters for everything that I can see except the codec time base (tbc). Is there a setting which will allow me to set codec_time_base (tbc)? Setting -r has no effect.

Update 2: Fearing this question was too niche for SU, I asked the question of the ffmpeg-user mailing list. Unfortunately -time_base is not an appropriate encoder option in this case:

This is an option for FFmpeg-internal encoders that you try to use for an external encoder (x264).

And more unfortunately, when I asked about general feasibility, the reply was

I don't think this is possible.

I have asked for clarification and possibilities surrounding the original encoding software - in this case OBS - which is potentially less flexible in option specification than ffmpeg due to having to match live stream consumer (Twitch) format specifications. I've yet to receive a reply from the mailing list, but have asked in the OBS forums as well.

More crucially, will controlling for these allow me to use the concat demuxer in ffmpeg to join these together without the need for a long encode process? Many thanks in advance.

(I realise this is a wall-of-text-and-a-half, so additions, subtractions or clarification suggestions are welcome of course. I would link to more official info but being <10 rep I cannot include more than 2 links!)

1: For more context, see my related question: How to efficiently and automatedly join video clips using short transitions?

bertieb

Posted 2015-06-24T11:49:18.070

Reputation: 6 181

Not giving up on this yet. ;-) I realize this is against your intentions, but have you tried the final concat operation with encoders instead of codec copies to see if that eliminated the a/v artifacts? At the very least, could you please post the full ffmpeg commands you're using to generate your clips and transitions? – Mr. What – 2015-06-30T23:50:47.660

@Mr.What Glad to hear it- I've been asking questions far and wide (as per the update). Using codec options will result in a final transcode; but should eliminate the artifacts as you say. I'm pretty sure I tested this in the many hours I've spent! I can post commands for sure, but the transitions are being generated via melt (MLT) as per my other (linked) question. No-one I've asked seems to be able to modify timebase; even if I tackle it from the source (OBS) side, the x264 option -time_base there is ignored! – bertieb – 2015-07-01T00:46:16.957

Digging further into the documentation, there is a settb filter which might be worth trying since it would be applied outside of the x264 encoder -- ffmpeg -i input.ext -vf "settb=expr=1/125" -c:v libx264, etc. output.ext. Or has this already been mentioned/tried in your travels thus far?

– Mr. What – 2015-07-01T02:33:41.963

@Mr.What That filter looked promising, but transcoding a file with timebase 125 couldn't produce the desired timebase of 120 (or any other), even though it complained greatly about "Past duration 0.874992 (etc) too large" :-/ Good thinking though! The plot thickens further: by hand-counting, the original files seem to have 60 frames per second when played (not 62-63). Not sure if this is because the timebase is causing the player to ignore frames though. Going deep down the rabbit hole here! – bertieb – 2015-07-02T13:13:17.333

Well, the oddball FPS and timebase fall in line with codec copying to output for your non-transitional clips, so there's no real surprise there. I'm wondering if including -copyinkf (which will include leading non-keyframes in -c copy output) might aid in the final concat, and also if it would be worth forcing the encoded transitional clip through libx264 at a rational framerate. Just as an experiment, try recreating your "a" and "b" clips using -copyinkf -c:v copy -c:a copy and the transition clip using libx264 -r 60, then concatenate those three and see if there's a difference. – Mr. What – 2015-07-03T00:49:23.117

@bertieb Have you found your answer yet? Or is it impossible? I'm facing a similar problem where, due to resizing one of the clips, I end up needing to re-encode all of them just to concatenate them properly. – Rohan – 2015-08-31T22:01:31.827

@Rohan yes and no. I worked around it by changing the transitions I was creating - fade-to-black as opposed to crossfade - but this is avoiding the issue, rather than dealing with it. I found the parameter in ffmpeg that controlled the reported timebase (slightly different from -time_base answer below!) but it didn't have the right effect in any case :-/ – bertieb – 2015-08-31T22:05:44.640