How to split a tar file into smaller parts at file boundaries?

8

2

I have a tar file that I want to split into multiple smaller tar files. This would be easy with split, but I want the resulting files to be fully usable tar archives themselves, which split can't do, as it cuts at arbitrary byte offsets rather than at file boundaries.

So how can I split a tar file into smaller parts at file boundaries, so that no file ends up half in one tar and half in another?

Solutions that don't use tar and accomplish the task by other means would be welcome as well.

PS: Yes, there will be cases where this isn't possible (a tar containing files larger than the split size).

Grumbel

Posted 2010-09-17T09:40:01.633

Reputation: 3 100

star has a promising option in tsize=, but I did not see anything like bsdtar's @archive that might complete the task. – Chris Johnsen – 2010-09-17T11:58:00.893

Answers

1

There is a tool, tarsplitter, which safely splits tar archives. You specify the number of parts to split the archive into, and it figures out where the file boundaries are.

https://github.com/AQUAOSOTech/tarsplitter

The resulting smaller archives won't be exactly the same size, but they will be close, assuming the files in the original archive don't vary wildly in size.

Example - split the archive "files.tar" into 4 smaller archives:

tarsplitter -p 4 -i files.tar -o /tmp/parts

Creating:

/tmp/parts0.tar
/tmp/parts1.tar
/tmp/parts2.tar
/tmp/parts3.tar

ruffrey

Posted 2010-09-17T09:40:01.633

Reputation: 126

3

If recreating the archive is an option this Bash script should do the trick (it's just a possible manner):

#!/bin/bash

if [ $# != 3 ] ; then
    echo -e "$0 in out max\n"
    echo -e "\tin:  input directory"
    echo -e "\tout: output directory"
    echo -e "\tmax: split size threshold in bytes"
    exit
fi

IN=$1 OUT=$2 MAX=$3 SEQ=0 TOT=0
find "$IN" -type f |
while IFS= read -r i ; do du -bs "$i" ; done |
sort -n |
while read -r SIZE NAME ; do
    # start a new archive when adding this file would exceed the limit
    if [ $TOT != 0 ] && [ $((TOT+SIZE)) -gt $MAX ] ; then
        SEQ=$((SEQ+1)) TOT=0
    fi
    TOT=$((TOT+SIZE))
    TAR=$OUT/$(printf '%08d' $SEQ).tar
    tar rf "$TAR" "$NAME"
done

It sorts all the files by size (ascending) and appends them to an archive one by one, starting a new archive whenever adding the next file would push the current one past the threshold.

NOTE: Make sure that the output directory is empty.

USE AT YOUR OWN RISK

cYrus

Posted 2010-09-17T09:40:01.633

Reputation: 18 102

1

The tarsplitter command offered by @ruffrey looks like an awesome option.
I downloaded it, then did:

brew install golang

to be able to compile it. (Hmm... is it already in Homebrew? Nope.) It compiled successfully on my Mac running macOS 10.14. I'm currently making a copy of my gigantic archive to run tarsplitter against it. Two thumbs up for the recommendation.

I'm a relative noob when it comes to compiling other people's code, so it would have been helpful if the author had made it clear that it is written in Go rather than C/C++ and needs the Go toolchain installed. Also, make install doesn't work, as there is no install target in the Makefile, so I just did:

cp build/tarsplitter_mac /usr/local/bin/tarsplitter

Neat that the Go compiler builds for Mac, Linux, and Windows.

ECJB

Posted 2010-09-17T09:40:01.633

Reputation: 21

1

I don't believe there are any existing tools to do this, but it would be reasonably easy to implement yourself. The tar format is quite simple, so you would just need a split that takes it into account. The basic idea: read a header, look at the stated length of the entry that follows, and decide whether it still fits in the current output file or a new one should be started. Then skip to the next header and repeat.
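That header-walking idea can be tried directly in shell. The sketch below is illustrative only: the function name split_tar, the part-file prefix, and the byte limit are all made up for this example, and it assumes a plain ustar archive, since pax or GNU long-name extension entries would have to be kept grouped with the entry they describe.

```shell
# Sketch: read each 512-byte tar header, take the entry size from the
# octal field at byte offset 124, copy header + data blocks into the
# current part, and start a new part when the next entry would not fit.
split_tar() {
    IN=$1 MAX=$2 PREFIX=$3                 # archive, byte limit, output prefix
    SEQ=0 OFF=0 CUR=0                      # offsets and sizes in 512-byte blocks
    MAXBLK=$(( (MAX + 511) / 512 ))
    TOTBLK=$(( $(wc -c < "$IN") / 512 ))
    while [ "$OFF" -lt "$TOTBLK" ]; do
        # 12-byte octal size field at offset 124 of the header block
        SIZE=$(dd if="$IN" bs=1 skip=$(( OFF * 512 + 124 )) count=12 2>/dev/null | tr -d ' \000')
        [ -z "$SIZE" ] && break            # all-zero block: end of archive
        ENTRY=$(( 1 + (0$SIZE + 511) / 512 ))   # header + data, in blocks
        if [ "$CUR" -ne 0 ] && [ $(( CUR + ENTRY )) -gt "$MAXBLK" ]; then
            SEQ=$(( SEQ + 1 )) CUR=0       # start the next part
        fi
        dd if="$IN" bs=512 skip="$OFF" count="$ENTRY" 2>/dev/null >> "$PREFIX$SEQ.tar"
        OFF=$(( OFF + ENTRY )) CUR=$(( CUR + ENTRY ))
    done
    # terminate each part with the two zero blocks tar expects at EOF
    for f in "$PREFIX"*.tar; do
        [ -f "$f" ] || continue
        dd if=/dev/zero bs=512 count=2 2>/dev/null >> "$f"
    done
}
```

A run might look like `split_tar big.tar 400000 part`, producing part0.tar, part1.tar, and so on; each part should then be readable as an archive on its own.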

tylerl

Posted 2010-09-17T09:40:01.633

Reputation: 2 064

libarchive might be handy for something like this. – Chris Johnsen – 2010-09-17T11:36:54.250