Command-line tool to merge a sequence of overlapping binary file chunks on Windows or Linux

I have a set of binary file chunks from a video file. They are partly overlapping.

For example, say the video file's binary data can be represented like this:

---ABCDEFGHIJKLMNOPQRSTUVXYZ 

where --- is a header.

The chunks can be represented like this (simplified, because the header part differs somewhat from chunk to chunk):

chunk 1: "---ABCD"
chunk 2: "---DEFG"
chunk 3: "---GHIJ"
chunk 4: "---JKLM"
...

I need a command-line tool that merges these files. It should take the end of chunk 1, search chunk 2 for that pattern, and append to chunk 1 the part of chunk 2 that follows the pattern, ignoring all data in chunk 2 before the pattern starts, so that the overlapping data appears only once.

Then repeat the operation for all remaining chunks until we have the complete video file.
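The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not an existing tool: the names `merge_pair`, `merge_chunks`, `merge_files` and the `probe_len` parameter (how many trailing bytes to use as the search pattern, 16 by default) are hypothetical. It assumes every chunk fits in memory and that the tail of the data merged so far really does occur in the next chunk.

```python
import sys


def merge_pair(merged: bytes, chunk: bytes, probe_len: int = 16) -> bytes:
    """Append `chunk` to `merged`, keeping the overlapping bytes only once."""
    # The last `probe_len` bytes merged so far serve as the search pattern.
    probe = merged[-probe_len:]
    pos = chunk.find(probe)  # first occurrence, or -1
    if pos < 0:
        raise ValueError("overlap pattern not found in next chunk")
    # Everything in `chunk` before the match (e.g. its header) is ignored,
    # and the match itself is already at the end of `merged`.
    return merged + chunk[pos + len(probe):]


def merge_chunks(chunks: list[bytes], probe_len: int = 16) -> bytes:
    """Merge chunks in order; the first chunk is kept whole, header included."""
    merged = chunks[0]
    for chunk in chunks[1:]:
        merged = merge_pair(merged, chunk, probe_len)
    return merged


def merge_files(out_path: str, in_paths: list[str], probe_len: int = 16) -> None:
    """Read the chunk files in the given order and write the merged result."""
    chunks = []
    for path in in_paths:
        with open(path, "rb") as f:
            chunks.append(f.read())
    with open(out_path, "wb") as f:
        f.write(merge_chunks(chunks, probe_len))


if __name__ == "__main__":
    # Usage: python merge_chunks.py OUTPUT CHUNK1 CHUNK2 ...
    if len(sys.argv) < 3:
        print("usage: merge_chunks.py OUTPUT CHUNK1 [CHUNK2 ...]", file=sys.stderr)
    else:
        merge_files(sys.argv[1], sys.argv[2:])
```

With the toy chunks from the question and a one-byte probe, `merge_chunks([b"---ABCD", b"---DEFG", b"---GHIJ", b"---JKLM"], probe_len=1)` returns `b"---ABCDEFGHIJKLM"`. On real video data a very short probe can match by coincidence (inside a header, say), because `bytes.find` returns the first occurrence, so a longer probe is the safer choice.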

adainr

Posted 2011-10-24T15:38:52.110

Reputation: 31

Answers

I found myself trying to achieve exactly the same goal several times when dealing with MPEG transport streams that were split into multiple pieces by the recording device.

The main problem is that, given two consecutive files, the overlapping area is never exactly identical in both of them, since some kind of header is always prepended to each file. So basically none of the existing merge tools worked for me.

At first, I used a simple hex editor as @TrojanName suggested, but I soon found this manual process far too time-consuming and error-prone. Therefore, I wrote a small tool called binmerge that does it automatically.

ph4nt0m

Reputation: 111

If your tool does the job, this is the best answer so far. I cannot audit your code, and I'm not going to compile and test the tool in the near future just to check whether it works. I don't expect anybody to. Nevertheless, the whole thing seems legit and I'd like to reward you for your attitude. Some of my (I think) correct answers to niche questions are neither upvoted nor downvoted, maybe because nobody bothered to test them. I upvote this answer because you share your work. Other users, be aware that my single upvote doesn't mean the tool has been tested. – Kamil Maciorowski – 2017-07-06T19:57:31.277

I would just use a good binary editor and do it by hand.

TrojanName

Reputation: 188

If you know the length of the header (---) and the length of each segment (A, B, C, etc.), you could use the head and tail commands. If these lengths vary from file to file, then you are looking at a substring search problem (find the longest substring of chunk 2 that also appears in chunk 1). You might be able to automate it with awk or Python.
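Both cases from this answer can be sketched in Python, assuming each chunk fits in memory. `merge_fixed` covers known, constant header and overlap lengths (the head/tail case). `merge_longest_overlap` handles varying lengths by brute force: it tries the longest suffix of the merged data first and shrinks it one byte at a time until that suffix is found in the next chunk. All function names and the `header_len`, `overlap_len`, `min_overlap` and `max_overlap` parameters are hypothetical. The brute-force search gets slow on large files, so for real video chunks `max_overlap` should be set to a plausible upper bound.

```python
from typing import Optional


def merge_fixed(chunks: list[bytes], header_len: int, overlap_len: int) -> bytes:
    """Known, constant lengths: drop each later chunk's header and overlap."""
    merged = chunks[0]
    for chunk in chunks[1:]:
        # Same effect as `tail -c +K` with K = header_len + overlap_len + 1.
        merged += chunk[header_len + overlap_len:]
    return merged


def merge_pair_longest(merged: bytes, chunk: bytes, min_overlap: int = 1,
                       max_overlap: Optional[int] = None) -> bytes:
    """Varying lengths: find the longest suffix of `merged` inside `chunk`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    limit = min(len(merged), len(chunk))
    if max_overlap is not None:
        limit = min(limit, max_overlap)
    # Try the longest candidate first and shrink it one byte at a time.
    for k in range(limit, min_overlap - 1, -1):
        pos = chunk.find(merged[-k:])
        if pos >= 0:
            # Skip the chunk's header and the overlap; keep only new data.
            return merged + chunk[pos + k:]
    raise ValueError("no overlap of at least min_overlap bytes found")


def merge_longest_overlap(chunks: list[bytes], min_overlap: int = 1,
                          max_overlap: Optional[int] = None) -> bytes:
    """Merge chunks in order using the longest-overlap search for each pair."""
    merged = chunks[0]
    for chunk in chunks[1:]:
        merged = merge_pair_longest(merged, chunk, min_overlap, max_overlap)
    return merged
```

Raising `min_overlap` is a cheap safeguard: a coincidental short match (for example, a single byte that also occurs in the next chunk's header) then raises an error instead of silently producing a corrupt file.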

To get an answer to that, you might want to ask on Stack Overflow. Nevertheless, if you only have one video stream you want to join, I agree with Brian Fenton.

obaqueiro

Reputation: 421