How do I create a tar file in alphabetical order?

22

4

I want to create a tar file where all of the directories and files are processed in alphabetical order. This is for the entire directory hierarchy that's being tarred up, so it would start by processing the first directory alphabetically, and then sub-directories in there alphabetically, etc. I looked through the man page and can't find a switch for this.

I will admit, this is half novelty, half slight optimization. I just can't believe that there isn't an easy way to do this. I must be missing something.

Erick Robertson

Posted 2010-08-05T13:54:04.743

Reputation: 658

2

@matthiaskrull I have an unrelated reason for this: I am creating an OVA file (which is a tar file) for deploying VMs on VMware ESX Server. The OVA needs its files in a specific order (the first file should be an OVF, and so on).

– xask – 2014-09-16T11:29:39.780

@xask That is indeed a valid reason. And there is always the --append option for a specifically required order that cannot easily be achieved by sorting. – matthias krull – 2014-09-29T13:24:54.773

1 There is also a very good reason for this: performance on a very large archive when you want to extract only a portion of it. By default the order is effectively random. If the archive is ordered, extracting a single file or directory is faster; if it is not, tar has to scan the whole archive before it knows it has finished. – StormByte – 2015-03-13T20:33:33.993

2 Why do you want to do this? – matthias krull – 2010-08-05T15:02:54.697

Mostly, it's because I want to know how close the tar operation is to being completed. When the files are being loaded in random order, there's no way to tell with the -v flag. – Erick Robertson – 2010-08-05T16:01:05.080

2 That's not entirely true. If you redirect the -v output to a file and know the number of files (say, from a quick find command), you can compare the line count of that output (wc -l) with the number of files from find to get a sense of progress... – Slartibartfast – 2010-08-06T01:45:23.280
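A minimal sketch of the progress check described in the comment above, assuming GNU tar and find. The sample tree and the names archive.tar and progress.log are made up for illustration; in real use the count would be taken from a second terminal while tar is still running.

```shell
#!/bin/sh
# Illustration only: the sample tree, archive name, and log name are hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/data/sub"
touch "$work/data/one" "$work/data/two" "$work/data/sub/three"
cd "$work"

# Total number of entries tar will archive (files and directories).
total=$(find data | wc -l)

# Send the -v listing to a log file instead of the terminal.
tar cvf archive.tar data > progress.log

# Number of entries written so far; on a long run, repeat this from
# another terminal to watch the count approach $total.
done_count=$(wc -l < progress.log)
echo "$done_count of $total entries archived"
```

This only gives a rough sense of progress, since it counts entries and not bytes.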

Answers

12

Slartibartfast is on the right track, but tar's default behaviour is to descend into directories, so you may get more than one copy of the same file in the generated tar file. You can check by running tar tf file.tar | sort. The workaround is to pass the --no-recursion option to tar. You should also be able to handle strange filenames by using the -print0 option to find together with the --null option to tar. The end result looks like this:

find paths -print0 | sort -z | tar cf tarfile.tar --no-recursion --null -T -

You can check the order in the tar file by using tar tsf tarfile.tar. You will probably never need the -print0, -z, and --null options unless you expect to encounter a filename with a newline embedded in it; I have never tried that case myself.
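To make the duplicate-entry problem concrete, here is a small self-contained sketch, assuming GNU tar, find, and sort. The directory layout is made up, and LC_ALL=C is my addition so that the sort order does not depend on the locale.

```shell
#!/bin/sh
# Illustration only: the directory layout and file names are hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src/b" "$work/src/a"
touch "$work/src/b/2.txt" "$work/src/a/1.txt" "$work/src/z.txt"
cd "$work"

# Without --no-recursion, tar descends into every directory named in the
# list, so entries below src are archived more than once.
find src -print0 | LC_ALL=C sort -z | tar cf dup.tar --null -T -
dup_count=$(tar tf dup.tar | wc -l)

# With --no-recursion, each path in the list is archived exactly once,
# in the order the list gives.
find src -print0 | LC_ALL=C sort -z | tar cf sorted.tar --no-recursion --null -T -
sorted_listing=$(tar tf sorted.tar)

echo "entries without --no-recursion: $dup_count"
printf '%s\n' "$sorted_listing"
```

The first archive should list more entries than find produced, while the second lists each path once in sorted order.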

Charlie Herron

Posted 2010-08-05T13:54:04.743

Reputation: 121

Excellent suggestion for using the --no-recursion option, thanks. – Erik – 2012-10-05T07:20:29.370

This is the solution that worked for me. I have a different use case than Erick and Google brought me here. I am collecting snapshots over time of the complete state of a remote system. The data is highly redundant. Sorting the tar input by time (filenames have a timestamp) improves the compressor's performance. A quick test shows an improvement by factor 2 (lzma2). Also, I do not unpack the archive into a filesystem, but do a stream processing over tar entries. A sorted stream makes a lot nicer debug output and has other benefits in the process chain. +1 – Johannes – 2014-02-08T13:27:10.120

5

The order of the files within the tar file does not really matter, since when the files are extracted, the filesystem will not preserve the order anyway.

There is no switch for this, but if you really wanted it, you could provide tar with a list of filenames in sorted order, and it would create the tar file with the order you give it.

% tar cf tarfile tmp/diff.txt src/hellow.c junkimage.IMG barry/thegroup
% tar tf tarfile
tmp/diff.txt
src/hellow.c
junkimage.IMG
barry/thegroup

Kevin Panko

Posted 2010-08-05T13:54:04.743

Reputation: 6 339

2 Or just sort the output: tar tf tarfile | sort – Doug Harris – 2010-08-05T14:53:20.643

I have way too many files (20,000+) to specify them all on the command line. – Erick Robertson – 2010-08-05T16:00:18.683

Depends on the file system. – Thorbjørn Ravn Andersen – 2017-04-19T11:44:28.257

4 The order of the files within the tar file does matter if you need to decompress and display while downloading. – Erik – 2012-10-05T07:12:10.580

4

Assuming you don't have any files with newlines in the names:

find /source_directory -print | sort | tar -czf target.tgz -T -

If that doesn't work (never tried it, so I don't know if - means stdin for the -T argument):

find /source_directory -print | sort > /tmp/temporary_file_list
tar -czf target.tgz -T /tmp/temporary_file_list

Then there is the question of why. But sometimes it is easier not to ask.
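With GNU tar, -T - does read the file list from standard input, so both variants above should behave the same. The following sketch compares them on a made-up tree. Note two additions that are not in the commands above: --no-recursion (borrowed from Charlie Herron's answer, to avoid duplicate entries) and LC_ALL=C (so the sort order does not depend on the locale). The archive names are placeholders.

```shell
#!/bin/sh
# Illustration only: the tree and the archive names are hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/source_directory/docs"
touch "$work/source_directory/b.txt" "$work/source_directory/a.txt" \
      "$work/source_directory/docs/c.txt"
cd "$work"

# Variant 1: sorted file list on standard input ("-T -").
find source_directory -print | LC_ALL=C sort |
    tar -czf stdin.tgz --no-recursion -T -

# Variant 2: sorted file list in a temporary file.
list=$(mktemp)
find source_directory -print | LC_ALL=C sort > "$list"
tar -czf listfile.tgz --no-recursion -T "$list"
rm -f "$list"

# Compare what the two archives contain, in order.
stdin_listing=$(tar -tzf stdin.tgz)
file_listing=$(tar -tzf listfile.tgz)
if [ "$stdin_listing" = "$file_listing" ]; then
    echo "both variants list the same entries in the same order"
fi
```

Using mktemp for the list avoids clobbering a fixed path such as /tmp/temporary_file_list if two runs overlap.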

Slartibartfast

Posted 2010-08-05T13:54:04.743

Reputation: 6 899

2

find . -depth -print0 | sort -z | pax -wvd0 > file.tar

Pax is sort of the POSIX successor to cpio and tar and kind of fuses the best aspects of both. It writes tar archives (ustar) by default. It also does automatic spanning and prompting for media and prints a summary when it's done.

Thomas Crescenzi

Posted 2010-08-05T13:54:04.743

Reputation: 21

0

As an alternative to @CharlieHerron's answer, if you are only interested in preserving content (files, symlinks) and not folder metadata (e.g., folder permissions, mtime, etc.), you may want to filter folders out of find's output.

find paths -not -type d -print0 | sort -z | tar cf tarfile.tar --null -T -
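A quick way to see what this leaves out, as a sketch assuming GNU tar, find, and sort. The sample tree is made up, and LC_ALL=C is added only to make the sort order independent of the locale.

```shell
#!/bin/sh
# Illustration only: the sample tree is hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/paths/dir"
touch "$work/paths/dir/file.txt" "$work/paths/top.txt"
ln -s top.txt "$work/paths/link"
cd "$work"

# Directories are filtered out of the list, so tar has nothing to recurse
# into and --no-recursion is not needed.
find paths -not -type d -print0 | LC_ALL=C sort -z |
    tar cf tarfile.tar --null -T -

listing=$(tar tf tarfile.tar)
printf '%s\n' "$listing"

# tar lists directory entries with a trailing slash; count them.
dir_entries=$(printf '%s\n' "$listing" | grep -c '/$' || true)
echo "directory entries in archive: $dir_entries"
```

Because the archive carries no directory entries, directories are recreated on extraction with default permissions and timestamps, and empty directories are not stored at all.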

user1202136

Posted 2010-08-05T13:54:04.743

Reputation: 249