Precise cutting video into lots of smaller videos using ffmpeg

I have a large video, about 45 minutes long. I want to perform a lot of cuts (~500 per 20 minutes) that need to be very accurate. With that many cuts, the frame-perfect solutions I found are impractical: they are extremely slow.

Why would I want to cut a video into that many pieces? I have a program that detects where the silence is. I want to speed up the video at different rates depending on whether there is silence or people talking. So I find out which parts of the video are silent/loud, cut it into those parts, speed them up, and concatenate them again.

Right now, after some iterations, I'm using this:

ffmpeg -i [input_video] -ss [seconds_to_start_cut] -frames:v [number_of_frames] -f [input_video_extension] [output_name]

In my program, I have silent/loud parts of video defined as starting and ending frames and I calculate seconds to start cut via FPS received from ffprobe.
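
A minimal sketch of that frame-to-seconds conversion, assuming a constant frame rate and assuming the rate is taken from ffprobe's `r_frame_rate` field, which is reported as a fraction string such as `24000/1001`. The function names are hypothetical. Keeping the rate as an exact fraction instead of a rounded value like 23.98 avoids one possible source of drift over long videos.

```python
from fractions import Fraction

# Assumed way of obtaining the rate string (not part of the original post):
#   ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate \
#           -of default=noprint_wrappers=1:nokey=1 input.mp4


def parse_frame_rate(rate: str) -> Fraction:
    """Parse a rate string such as '24000/1001' or '25' without rounding."""
    return Fraction(rate)


def frame_to_seconds(frame_index: int, fps: Fraction) -> float:
    """Return the start time of a frame, assuming a constant frame rate."""
    return float(frame_index / fps)


if __name__ == "__main__":
    fps = parse_frame_rate("24000/1001")
    print(frame_to_seconds(240, fps))
```

If the video has a variable frame rate, this calculation is only an approximation, and per-frame timestamps would be needed instead.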

Using this, the cuts are pretty precise, although there is still some work to be done (the audio repeats a little bit at cut transitions). Maybe the FPS varies slightly through the video?

But the problem is that this approach is extremely slow. As far as I understand how ffmpeg works, it starts counting seconds from the start each time I call it, repeating previously done work unnecessarily. It's really bad with a 16-second video, let alone ones longer than an hour.

Is there a reasonably fast way to precisely perform a large number of cuts? The cuts don't overlap, so I technically need to split the video into a lot of shorter videos. If this is not possible with ffmpeg, could you recommend another tool? Thanks.

Edit: Thanks to the link provided by @slhck, I used a complex filter to do it. It has the best results in terms of quality, but processing takes about double the length of the video (0.428x). For example, for segments [0-0.25, 2][0.25-0.75, 1][0.75-1, 2] ([time_from]-[time_to], [speed]), I use this filter:

[0:v]trim=0:0.25,setpts=0.5*(PTS-STARTPTS)[v1];
[0:a]atrim=0:0.25,asetpts=PTS-STARTPTS,atempo=2[a1];
[0:v]trim=0.25:0.75,setpts=1*(PTS-STARTPTS)[v2];
[0:a]atrim=0.25:0.75,asetpts=PTS-STARTPTS,atempo=1[a2];
[0:v]trim=0.75:1,setpts=0.5*(PTS-STARTPTS)[v3];
[0:a]atrim=0.75:1,asetpts=PTS-STARTPTS,atempo=2[a3];
[v1][a1][v2][a2][v3][a3]concat=n=3:v=1:a=1

It starts to look funny with hundreds and hundreds of segments, but it actually works really well!
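
With hundreds of segments, a filter graph like this is easier to generate than to write by hand. The following is a rough sketch of such a generator, not the asker's actual program: `build_filter` and the `(start, end, speed)` tuple layout are hypothetical. It writes the video timestamp scaling as a division by the speed, which is equivalent to multiplying by 0.5 for a speed of 2.

```python
def build_filter(segments):
    """Build a trim/atrim + concat filter graph.

    segments: list of (start_seconds, end_seconds, speed) tuples,
    assumed to be in order and non-overlapping.
    """
    chains = []
    labels = []
    for i, (start, end, speed) in enumerate(segments, start=1):
        # Video: cut the segment, reset timestamps, then scale them by 1/speed.
        chains.append(
            f"[0:v]trim={start}:{end},setpts=(PTS-STARTPTS)/{speed}[v{i}];"
        )
        # Audio: cut, reset timestamps, then change tempo.
        # Assumption: speed is within the range a single atempo accepts
        # (historically 0.5 to 2.0); larger factors are not handled here.
        chains.append(
            f"[0:a]atrim={start}:{end},asetpts=PTS-STARTPTS,atempo={speed}[a{i}];"
        )
        labels.append(f"[v{i}][a{i}]")
    # Join all segment pairs back together in order.
    chains.append("".join(labels) + f"concat=n={len(segments)}:v=1:a=1")
    return "\n".join(chains)


if __name__ == "__main__":
    print(build_filter([(0, 0.25, 2), (0.25, 0.75, 1), (0.75, 1, 2)]))
```

The returned string can be passed as the argument to `-filter_complex`.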

I currently run ffmpeg like this:

ffmpeg -i madoka.mp4 -filter_complex "[filter]" -f mp4 -movflags frag_keyframe+empty_moov output.mp4

Log:

ffmpeg version n4.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.1.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-nvdec --enable-nvenc --enable-omx --enable-shared --enable-version3
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'madoka.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2016-12-17T08:09:58.000000Z
  Duration: 00:24:09.99, start: 0.000000, bitrate: 1676 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 1482 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      creation_time   : 2016-12-17T08:10:39.000000Z
      handler_name    : IsoMedia File Produced by Google, 5-11-2011
Stream mapping:
  Stream #0:0 (h264) -> trim
###### (523 more lines like this) #####
  Stream #0:1 (aac) -> atrim
###### (523 more lines like this) #####
  concat:out:v0 -> Stream #0:0 (libx264)
  concat:out:a0 -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[libx264 @ 0x5604a9dc2600] using SAR=1/1
[libx264 @ 0x5604a9dc2600] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x5604a9dc2600] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x5604a9dc2600] 264 - core 157 r2945 72db437 - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=23 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'madoka.new.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf58.29.100
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], q=-1--1, 23.98 fps, 24k tbn, 23.98 tbc (default)
    Metadata:
      encoder         : Lavc58.54.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      encoder         : Lavc58.54.100 aac
frame=    3 fps=0.0 q=0.0 size=       0kB time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    5 fps=4.9 q=0.0 size=       0kB time=00:00:00.02 bitrate=  16.5kbits/s dup=0 drop=9 speed=0.0226x      
(...)

SoptikHa

Posted 2019-08-18T15:27:41.490

Reputation: 123

Cheap version: put -ss before -i and the seeking will be frame-accurate and (almost) immediate. There are other solutions though where you could put everything into one (rather complex) filter, using ffmpeg's silencedetect filter and doing the speedup there, without cutting. – slhck – 2019-08-18T15:32:12.343

I had this before and it was fast, but the transitions between cuts weren't clean: the audio (I didn't notice it in the video, but it's possible) repeated a little bit from the end of the previous cut. Might it be because of my use of -frames:v [frames]? I need to look into silencedetect, that might solve this whole problem. – SoptikHa – 2019-08-18T15:34:41.123

Actually, AFAICT, there is no programmatic way to use the output of silencedetect directly in a filter chain, so you have to work with split points. But you could, in principle, build a more complex filter that takes the original video and does the speeding up based on a list of timestamps, using the setpts filter, see e.g. https://video.stackexchange.com/a/21804/525

– slhck – 2019-08-18T15:39:58.607

Thanks for this idea! I just built the complex filter, and it works great: both the audio and video quality are better than with anything else I've tried. The inconvenience is that it takes about twice the video length (0.428x) to process it, but I guess I can't have both speed and good results. I also found a very fast version which converts everything to .mpeg before processing and does some magic with it (hundreds of temp files), but the resolution and audio quality degrade significantly in that case. – SoptikHa – 2019-08-18T17:47:59.763

@slhck do you have an idea if it's possible to speed it up a bit? If I could get it to work about 2x faster, I should be able to pipe it into a video player immediately without having to process the video before playing. It would be great if you could recommend some flags or options to look up. – SoptikHa – 2019-08-19T07:59:14.700

Please show the actual command options you are using, and the full log of ffmpeg when running that command. That way, we can see what input format and encoding options you are using. You may not be able to achieve realtime speed; it's a tradeoff between quality and encoding efficiency. – slhck – 2019-08-19T08:02:37.140

I've edited the post and added the log. I've deleted duplicate lines from the log, such as the stream mapping lines: there were more than 500 of them, all exactly the same. I also didn't include the progress (all the frame = x fps = y size = z ... lines) and the very end. Please let me know if I left out anything important, and thank you for your help. – SoptikHa – 2019-08-19T16:44:03.423

Your video is 720p and you're encoding with libx264 to H.264. Check here for some tips on how to speed up encoding: https://trac.ffmpeg.org/wiki/Encode/H.264 — particularly the FAQ. You can use -preset faster to speed up encoding, but the output will be larger. If however the filter chain is already consuming too much CPU, you won't notice the speedup. Let me know if that helps; I don't think there's much else you can do. I can then summarize in a real answer below (comments aren't so useful for that kind of stuff).

– slhck – 2019-08-19T17:09:10.070

So in the end I ended up with ffmpeg -i madoka.mp4 -preset ultrafast -filter_complex "$complex" -threads 8 -f mp4 -crf 51 -movflags frag_keyframe+empty_moov -. I gradually added various parameters, but nothing changed the processing FPS much; I'm still lucky to get 10 FPS. However, I noticed that 8 processes (I have 8 cores) are spawned: one has 100% CPU usage, the others around 5%. Is there a way to utilize multiple cores? Maybe a different video format (but the transcoding would likely introduce a big delay, right)? – SoptikHa – 2019-08-19T18:43:43.540

CRF 51 will give you really bad video quality. Have you watched the output? Usually you want something between 23–28. 10 FPS isn't too bad. Mind you that you are already transcoding. By default ffmpeg will use all CPUs/threads available. Some filters cannot be multithreaded. – slhck – 2019-08-21T12:16:59.157

I started with CRF 28 and slowly went up, waiting for something to change (it didn't). Thank you for your help and your time. I'm developing an application that takes a video (a lecture), detects silent parts where nothing happens and speeds them up (this resulted in up to 60% time saved), so you really helped me a lot. If you want to, summarize what happened in the comments, and I'll of course accept your answer. – SoptikHa – 2019-08-21T12:42:41.743

I am surprised by CRF 51 not yielding bad video quality, since usually that'd be the worst you can get. Also, choosing a lower CRF will not make the encoding so much faster. But I've provided an answer that sums up the main points. If you have any questions feel free to contact me. – slhck – 2019-08-21T12:57:29.907

I actually didn't check the video quality, it might be worse; I just checked the speed and stopped the process afterwards. I'd upvote your answer and helpful comments, but I can't due to low reputation. Anyway, I think I now know everything relevant that could help my program, so thank you, especially for your time helping beginners. – SoptikHa – 2019-08-21T20:37:48.067

Answers

First of all, cutting a video with ffmpeg -i [input_video] -ss [seconds_to_start_cut] is quite slow. Instead, you could put the -ss option before -i, which means that ffmpeg will first seek to the cut point, and only then start encoding. This will still be accurate.
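
As a rough illustration of that option order, here is a small sketch that assembles a per-clip command with `-ss` placed before `-i`. The helper name `cut_command` and the file names are hypothetical; the flags are the ones from the question, minus the explicit `-f`, on the assumption that the output extension selects the container.

```python
def cut_command(src, start_seconds, n_frames, dst):
    """Return an ffmpeg argument list that seeks before opening the input."""
    return [
        "ffmpeg",
        "-ss", f"{start_seconds:.6f}",  # input seeking: placed before -i
        "-i", src,
        "-frames:v", str(n_frames),     # stop after this many video frames
        dst,
    ]


if __name__ == "__main__":
    # To actually run it, pass the list to subprocess.run(cmd, check=True).
    print(" ".join(cut_command("input.mp4", 12.5, 48, "clip_0001.mp4")))
```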

That said, a better solution that does not involve generating individual clips and concatenating them would be to use complex filter graphs. An example of this can be seen here. The filters allow you to trim the video and audio into segments, and apply a speedup/slowdown filter on these segments.

As you've shown, one instantiation of such a complex filter chain would be:

[0:v]trim=0:0.25,setpts=0.5*(PTS-STARTPTS)[v1];
[0:a]atrim=0:0.25,asetpts=PTS-STARTPTS,atempo=2[a1];
[0:v]trim=0.25:0.75,setpts=1*(PTS-STARTPTS)[v2];
[0:a]atrim=0.25:0.75,asetpts=PTS-STARTPTS,atempo=1[a2];
[0:v]trim=0.75:1,setpts=0.5*(PTS-STARTPTS)[v3];
[0:a]atrim=0.75:1,asetpts=PTS-STARTPTS,atempo=2[a3];
[v1][a1][v2][a2][v3][a3]concat=n=3:v=1:a=1

This speeds up the first and third segment by a factor of 2, and concatenates everything.
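
To see what this does to the running time: each segment contributes its original length divided by its speed. A tiny sketch of that arithmetic, with a hypothetical helper name and the same three segments as above:

```python
def output_duration(segments):
    """Sum of (end - start) / speed over (start, end, speed) tuples."""
    return sum((end - start) / speed for start, end, speed in segments)


if __name__ == "__main__":
    segments = [(0, 0.25, 2), (0.25, 0.75, 1), (0.75, 1, 2)]
    # 0.25/2 + 0.5/1 + 0.25/2 seconds, down from 1 second of input
    print(output_duration(segments))
```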

To make the encoding as fast as possible, you can use -c:v libx264 -preset faster (or even ultrafast instead of faster), see the H.264 encoding guide. The quality (and therefore the resulting file size) is controlled by the CRF parameter.
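
Putting the pieces together, a sketch of how the full command might be assembled. The helper `encode_command`, its defaults, and the file names are assumptions for illustration; the default CRF of 23 follows the 23–28 range mentioned in the comments.

```python
# libx264 presets, fastest first.
X264_PRESETS = (
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow",
)


def encode_command(src, filter_graph, dst, preset="faster", crf=23):
    """Return an ffmpeg argument list for a filter graph encoded with libx264."""
    if preset not in X264_PRESETS:
        raise ValueError(f"unknown x264 preset: {preset}")
    return [
        "ffmpeg",
        "-i", src,
        "-filter_complex", filter_graph,
        "-c:v", "libx264",
        "-preset", preset,  # faster presets trade file size for speed
        "-crf", str(crf),   # lower CRF means higher quality, larger files
        dst,
    ]


if __name__ == "__main__":
    graph = "[0:v]trim=0:1,setpts=PTS-STARTPTS[v];[0:a]atrim=0:1,asetpts=PTS-STARTPTS[a];[v][a]concat=n=1:v=1:a=1"
    print(encode_command("input.mp4", graph, "output.mp4"))
```

Whether a faster preset helps depends on where the time goes: if the filter chain is the bottleneck, as the comments suggest it may be, the encoder settings will make little difference.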

slhck

Posted 2019-08-18T15:27:41.490

Reputation: 182 472