I am working on creating multiple encoded streams from a single input file (.mp4). The input stream has no audio. Each encoded stream is created by cropping a different part of the input and then encoding it at the same bitrate, on a 32-core system.
Here are the scenarios I am trying, as explained in the ffmpeg wiki on creating multiple outputs: https://trac.ffmpeg.org/wiki/Creating%20multiple%20outputs
Scenario 1 (using a single ffmpeg instance)
ffmpeg -i input.mp4 \
-filter:v crop=iw/2:ih/2:0:0 -c:v libx264 -b:v 5M out_1.mp4 \
-filter:v crop=iw/2:ih/2:iw/2:0 -c:v libx264 -b:v 5M out_2.mp4 \
-filter:v crop=iw/2:ih/2:0:ih/2 -c:v libx264 -b:v 5M out_3.mp4
In this case, I am assuming that ffmpeg decodes the input only once and supplies the decoded frames to all the crop filters. Please correct me if that is not right.
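As an aside, the same single-decode pipeline can also be written explicitly with `-filter_complex` and the `split` filter, which makes the one-decoder/many-encoders topology visible in the command itself (just a sketch equivalent to the command above, with the same output names):

```shell
# One decode, explicitly split into three branches; each branch is
# cropped and encoded independently.
ffmpeg -i input.mp4 -filter_complex \
  "[0:v]split=3[a][b][c]; \
   [a]crop=iw/2:ih/2:0:0[v1]; \
   [b]crop=iw/2:ih/2:iw/2:0[v2]; \
   [c]crop=iw/2:ih/2:0:ih/2[v3]" \
  -map "[v1]" -c:v libx264 -b:v 5M out_1.mp4 \
  -map "[v2]" -c:v libx264 -b:v 5M out_2.mp4 \
  -map "[v3]" -c:v libx264 -b:v 5M out_3.mp4
```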
Scenario 2 (using multiple ffmpeg instances, i.e. three separate processes)
ffmpeg -i input.mp4 -filter:v crop=iw/2:ih/2:0:0 -c:v libx264 -b:v 5M out_1.mp4
ffmpeg -i input.mp4 -filter:v crop=iw/2:ih/2:iw/2:0 -c:v libx264 -b:v 5M out_2.mp4
ffmpeg -i input.mp4 -filter:v crop=iw/2:ih/2:0:ih/2 -c:v libx264 -b:v 5M out_3.mp4
In my case, I actually need to encode even more streams by cropping different sections of the input video; I am showing three here just to keep the example simple.
Now, in terms of fps, I see that scenario 2 performs better. It also pushes the CPU close to its maximum (above 95% utilization). Scenario 1 yields lower fps, and its CPU utilization is far lower (around 65%). Moreover, in scenario 1, CPU utilization does not increase linearly with the number of streams being encoded: it grows by roughly 1.5x going from one stream to two, but after that the increments are small (about 10%, and even less as more streams are added).
So my questions are these. I want to use a single ffmpeg instance, because it avoids decoding the input multiple times and because the input could be as big as 4K or even larger. What should I do to get better CPU utilization (> 90%) and, hopefully, better fps? Also, why does CPU utilization not increase linearly with the number of streams being encoded? Why doesn't a single ffmpeg instance perform as well as multiple instances? It seems to me that with a single ffmpeg instance, the encodes are not truly running in parallel.
Edit: Here is the simplest way I can reproduce and explain the issue, in case things are not clear. Keep in mind that this is just an experiment to understand the problem.
Single instance:

ffmpeg -y -i input.mp4 \
  -c:v libx264 -x264opts threads=1 -b:v 1M -f null - \
  -c:v libx264 -x264opts threads=1 -b:v 1M -f null - \
  -c:v libx264 -x264opts threads=1 -b:v 1M -f null -

Multiple instances (run concurrently as a shell pipeline; the null outputs make the pipes harmless):

ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null - | \
ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null - | \
ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null -
Note that I am limiting x264 to a single thread. In the single-instance case, I would expect ffmpeg to create one encoding thread per x264 encode and run them in parallel, but I see only one CPU core fully utilized, which makes me believe only one encode session runs at a time. In the multiple-instance case, on the other hand, I see three CPU cores fully utilized, which I take to mean all three encodes are running in parallel.
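For what it's worth, the multiple-instance experiment above can also be expressed without the pipeline trick, by backgrounding each process and waiting for all of them (just a sketch of the same experiment):

```shell
# Launch three independent ffmpeg processes in parallel, then wait
# for all of them to finish before returning.
ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null - &
ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null - &
ffmpeg -y -i input.mp4 -c:v libx264 -x264opts threads=1 -b:v 1M -f null - &
wait
```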
I really hope some experts can jump in and help with this.
BTW, I have searched extensively on this topic, and none of the posts really explain why the single instance does not perform as well. The closest post I could find is this one (https://stackoverflow.com/questions/12465914/how-to-optimize-ffmpeg-w-x264-for-multiple-bitrate-output-files), but it lacks the kind of detail I am looking for.
– shalin – 2017-06-13T19:39:43.213