Generate a file list and split it into lists of X MB of files each

0

I need to generate a list of all files in a certain directory and split it into X lists, each covering a fixed amount of file data.

E.g. I have 95 GB of data.

  • generate the file list (total.txt)
  • split total.txt into 3 lists:
    1. slice1.txt containing the list of the first 35 GB of files
    2. slice2.txt containing the list of the following 35 GB of files
    3. slice3.txt containing the list of the remaining files

Any hints? I've googled and played around with find, awk, and grep, but this task seems well beyond my skills.

fradeve

Posted 2012-04-16T17:32:29.033

Reputation: 393

Answers

0

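You can try using the split command to break up the list:

split -b 4444160 total.txt

The above would split total.txt into pieces of 4444160 bytes (about 4.2 MiB) each. Note that -b counts the bytes of total.txt itself, not the sizes of the files it lists.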

nuclearpenguin

Posted 2012-04-16T17:32:29.033

Reputation: 161

total.txt is just a list of all files, like ls. Definitely wants split though. – Rob – 2012-04-16T19:09:16.917

0

while IFS= read -r filename; do cat -- "$filename"; done < total.txt | split -b 35G - slice

This will create "sliceaa", "sliceab", "sliceac" which you can rename.

With other implementations of split, you may have to write -b 35000m

If you have bash, you can write

cat $(< total.txt) | split -b 35G - slice

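This assumes there are not hundreds or thousands of filenames, and that none of them contains whitespace or glob characters, since the unquoted expansion is word-split.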

To create total.txt:

files=(*)
printf "%s\n" "${files[@]}" > total.txt
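
The glob above covers only the entries of the current directory, subdirectories included. If the list should instead hold every regular file in a directory tree, find is one option. The following is a minimal sketch, not part of the original answer; make_list is a hypothetical helper name, and excluding the output file by name is an assumption made so the list does not contain itself.

```shell
#!/usr/bin/env bash
# make_list is a hypothetical helper, not something from this thread.
# Usage: make_list [DIR] [OUTPUT_LIST]
make_list() {
    local dir=${1:-.} out=${2:-total.txt}
    # -type f keeps regular files only (no directories).
    # The output file is excluded by name so it does not list itself.
    # sort makes the order stable between runs.
    find "$dir" -type f ! -name "$(basename "$out")" | sort > "$out"
}
```

Called as make_list . total.txt, this writes one path per line. Like the glob version, it assumes no filename contains a newline.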

glenn jackman

Posted 2012-04-16T17:32:29.033

Reputation: 18 546

fradeve@edgar:~$ while read filename; do cat $filename; done < total.txt | split -b 35G - slice
bash: total.txt: No such file or directory
– fradeve – 2012-04-18T07:43:32.883

@fradeve, updated – glenn jackman – 2012-04-18T11:22:17.360

It works, but the file sliceaa contains the files themselves, not a list of them :) It's a byte-for-byte concatenation of all the files in the directory into one single file, which is not so useful if you want to generate a list of files :) – fradeve – 2012-04-20T13:15:30.487

1

I guess your question wasn't clear enough. Anyway, loop over the files in total.txt (while read filename; do ... done < total.txt). Use stat to get each file's size, and keep a running total size (bash manual) and a list of filenames. When the running size exceeds your limit, print the list of files to the current slice file, then start a new slice and reset the total. – glenn jackman – 2012-04-20T13:36:22.783
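
The loop described in the comment above could be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's own code: slice_by_size is a hypothetical helper name, sizes are read with wc -c instead of stat because stat's options differ between systems (GNU uses stat -c %s, BSD uses stat -f %z), and a new slice is started before a file that would push the running total over the limit, so each slice stays at or under the limit unless a single file is larger than it.

```shell
#!/usr/bin/env bash
# slice_by_size is a hypothetical helper, not something from this thread.
# Usage: slice_by_size LIST LIMIT_BYTES [PREFIX]
# Writes PREFIX1.txt, PREFIX2.txt, ... each listing files whose sizes
# add up to at most LIMIT_BYTES.
slice_by_size() {
    local list=$1 limit=$2 prefix=${3:-slice}
    local n=1 total=0 size filename
    : > "${prefix}${n}.txt"                 # create/empty the first slice
    while IFS= read -r filename; do
        [ -f "$filename" ] || continue      # skip directories and missing entries
        size=$(wc -c < "$filename")         # size in bytes (assumption: used instead of stat)
        # Start a new slice if this file would push the running total over the limit.
        if [ "$total" -gt 0 ] && [ $((total + size)) -gt "$limit" ]; then
            n=$((n + 1))
            total=0
            : > "${prefix}${n}.txt"
        fi
        printf '%s\n' "$filename" >> "${prefix}${n}.txt"
        total=$((total + size))
    done < "$list"
}
```

For the 35 GB slices in the question, this might be called as slice_by_size total.txt $((35 * 1024 * 1024 * 1024)). With 95 GB of data that would normally give slice1.txt, slice2.txt and slice3.txt, though the exact count depends on how the individual file sizes fall against the limit.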