3

Given a base directory (like /home/user), is there a command that could be run to create an archive backing up all text files (i.e. files less than 100kb) in that directory and its children? I know you can tar/gz a directory - but what about excluding files as well?

The idea is that most photos, videos, and other large files would be ignored while all important hand-typed documents could easily be backed up quickly when moving around projects and servers.

UPDATE

Using skinp's awesome code I was able to back up a small number of the files. However, as DerfK pointed out, there is a limit to the size of the arguments you can pass to commands. With that in mind I wrote the output of find to a file, hoping that I could use something to read its contents into tar and bypass this limit. The other option seems to be an sh script that adds each file to the archive one at a time.

find /home/username -type f -size -100k > list.of.small.files.txt

Which rendered out a 6MB file.
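
For reference, one way to feed that list back into tar might be GNU tar's --files-from option (a sketch; it assumes GNU tar and that no filenames contain embedded newlines):

tar -czf backup.tgz --files-from=list.of.small.files.txt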

Xeoncross

3 Answers

5

Looking at the other answers posted here so far, I'm concerned that the uses I see of xargs and find -exec {} are erroneous. If and when the file list grows long enough that tar -c is executed by xargs more than once, the tar file populated to that point will be overwritten. Thus, only the files from the last invocation of tar will end up in the tarfile.

Here's a one-liner that should always work, regardless of the total number of files, and regardless of whether filenames contain embedded newlines:

find /home/user -type f -size -100k -print0 | tar -c -z --null --files-from=- -f backup.tgz

The find command generates a list of null-terminated file names to backup, and the tar command reads that list from the pipe and creates the tarfile backup.tgz.
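
As a rough sanity check, one could compare the number of members in the archive against what find reports (counting with wc -l assumes no filenames contain embedded newlines):

tar -tzf backup.tgz | wc -l
find /home/user -type f -size -100k | wc -l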

Steven Monday
3

I would use the power of find:

find /home/user -type f -size -100k -exec tar cvzf backup.tgz {} \+

type: specifies you want a file, not a directory
size: with the number preceded by -, means we want files smaller than 100k
exec: executes tar with {} being the files found; \+ marks the end of the -exec arguments

You could also use xargs:

find /home/user -type f -size -100k | xargs tar cvzf backup.tgz

Update:

tar has a command to append a file to an existing tar archive (the archive may even not exist yet; it works for me).

Here's a simple example script doing this:

# Write the list of small files, then append them to the archive one at a time
find . -type f -size -100k -print > filelist
while IFS= read -r i    # read line by line so filenames with spaces survive
do
    tar --append --file=backup.tar "$i"
done < filelist
gzip backup.tar

Obviously, this script is highly inefficient... It only appends one file at a time, launching the tar command as many times as there are files. It would be better to script it in a way that it appends, say, 1000 files on each pass, as sketched below.
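
A sketch of that batching idea using xargs -n (assuming GNU tar; as noted above, --append appears to work even when the archive doesn't exist yet, and this plain pipe is only safe if no filenames contain spaces or newlines):

find . -type f -size -100k | xargs -n 1000 tar --append --file=backup.tar
gzip backup.tar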

skinp
  • You will have a lot of `tar`s :-) – Déjà vu Oct 19 '10 at 18:25
  • 1
    From what I remember, \+ means execute the command once for all that was found, \; means execute the command each time something is found. I tried it and it worked... could be system dependent I guess though. – skinp Oct 19 '10 at 18:31
  • Both work, (using lain's modifications) so which would be better/faster on large directories? – Xeoncross Oct 19 '10 at 18:42
  • You are right actually. Didn't notice the `+` – Déjà vu Oct 19 '10 at 18:43
  • 1
    @Xeoncross both options are essentially equivalent, but the `xargs` option requires starting a third program (xargs) so could be a few ms slower. Also, by "large" do you mean *really large*? Be careful, if you have more than 131k or so files you'll hit the ARG_MAX limit on commandline arguments with either the xargs or the find command and you'll need a different option for getting them all in the tarball. See http://www.in-ulm.de/~mascheck/various/argmax/#results – DerfK Oct 19 '10 at 19:01
  • @DerfK Apparently I am hitting the limit because the tgz file is stopping at about 9MB of files (about 3% of the sub 100kb files I was backing up). Perhaps what I need is a sh script that can add files one at a time to avoid the arg limit. Or I could save the output of find to a text file and read that to tar. – Xeoncross Oct 19 '10 at 19:12
  • 1
    @Xeoncross have you used `tar tzf backup.tgz` (possibly add `| wc` to get a line count rather than looking file by file) to confirm that you didn't get all the files? Text tends to compress really well. You could also run `find /home/user -type f -size -100k | less` and look through it to see if you're finding the files you thought you'd get (or `wc` instead of `less` to compare the line count to the tar command above) – DerfK Oct 19 '10 at 19:58
  • @skinp: `xargs` can bunch arguments and call the command (`tar --append` in this case) every `-n` files. – Javier Oct 19 '10 at 20:33
  • Warning! If you want to use `xargs` or `find -exec {}` with `tar`, you really should be using `tar -A`, and NOT `tar -c`. See my answer for an explanation why. – Steven Monday Oct 20 '10 at 01:10
3

Try

find /home/user -type f -size -100k -print0 | xargs -0 tar cvzf tarfile.tar.gz

which will be safe for files with spaces in the name too.

user9517
  • Danger! This will fail if/when the list of files grows longer than the maximum number of command-line arguments. `xargs` executes the given command more than once if it can't fit all the arguments into one command invocation. In this case, each time `xargs` runs another copy of `tar -c`, the tarfile will be overwritten, and the final result will contain only the last batch of filenames that `xargs` received. You really need to do a `tar -A` instead. – Steven Monday Oct 20 '10 at 01:04