Zip many files into several archives

13

2

Running Linux. I have a directory of around 150 large CSV files; simply doing a zip -9 on them results in a monolithic file that is still too large. I would like it to simply zip them in maybe four or five zip files of 30-40 CSVs each; this way sequencing or spanned zip order won't be a problem, as each zip is independent. There must be a simple way to do this. Any suggestions?

(and yes, zip is the preferred format, if possible)

WorldsEndless

Posted 2013-06-01T20:27:53.010

Reputation: 241

Answers

23

Isn't the -s switch enough? You can use zip -s to split the archive into pieces of a given maximum size, e.g.:

"zip -s 300m file.zip <2 GB file>" produces:

file.zip (300 mb, master file)
file.001.zip (300 mb)
file.002.zip (300 mb)
file.003.zip (300 mb)
file.004.zip (300 mb)
file.005.zip (300 mb)
file.006.zip (200 mb)

Then "unzip file.zip" will unzip everything together.
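If your zip names the pieces file.z01, file.z02, ... and unzip will not read them directly, the Info-ZIP zip 3.0 manual describes rejoining them into a single archive with -s 0 and --out. A minimal sketch, assuming zip 3.0 or later; the helper name join_split_archive and the file names are only illustrative:

```shell
#!/bin/sh
# Sketch, assuming Info-ZIP zip 3.0+ (older versions lack -s and --out).
# join_split_archive is a hypothetical helper name, not part of zip.
join_split_archive() {
    # $1: the final piece of the split archive (the .zip file)
    # $2: name for the rejoined single-file archive
    zip -q -s 0 "$1" --out "$2"
}

# Example: join_split_archive file.zip joined.zip && unzip joined.zip
```

All pieces need to sit in the same directory as the .zip file when this runs, so the result is still one dependent set rather than independent archives.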

ranisalt

Posted 2013-06-01T20:27:53.010

Reputation: 565

What version of zip is this?? I get file.z01 file.z02 ... file.zip and unzip file.zip does not work directly (I would use zip -F to recombine them first). Note these are not "independent" as requested. – sourcejedi – 2013-06-02T10:22:49.577

1

@sourcejedi: In this answer (http://superuser.com/a/602736/195224) are some more detailed explanations.

– mpy – 2013-06-02T10:43:02.723

@mpy I know, I've just written that answer :). – sourcejedi – 2013-06-02T11:44:42.443

@sourcejedi: Oh yes, now you say it... ;) – mpy – 2013-06-02T11:52:46.777

2

Use split on the list of input files :-).

(Not tested. I've included rm commands for cleanup, so take care.)

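# Write one CSV file name per line (printf avoids parsing ls output)
printf '%s\n' *.csv > csvfiles
# Split the list into chunks of 30 names: csvfiles00, csvfiles01, ...
split -d -l30 - csvfiles < csvfiles
# Build one independent archive per chunk; -@ reads file names from stdin
for i in csvfiles[0-9][0-9]; do
  zip "$i.zip" -@ < "$i"
done

# Remove the temporary name lists (the .zip files do not match these patterns)
rm csvfiles
rm csvfiles[0-9][0-9]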
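
The same batching can be done without temporary list files by slicing a bash array. This is only a sketch, assuming bash rather than plain sh; the function name zip_in_batches, the prefix_NN.zip naming, and the ZIP_CMD override variable are all made up for illustration and are not part of zip:

```shell
#!/usr/bin/env bash
# Sketch: pack files into independent archives of at most N files each.
# zip_in_batches and ZIP_CMD are hypothetical names; ZIP_CMD defaults to zip.
zip_in_batches() {
    local batch_size=$1 prefix=$2
    shift 2
    local files=("$@")
    local i n=0 archive
    for ((i = 0; i < ${#files[@]}; i += batch_size)); do
        # Archive names: <prefix>_00.zip, <prefix>_01.zip, ...
        printf -v archive '%s_%02d.zip' "$prefix" "$n"
        # -9 requests maximum compression, as in the question
        "${ZIP_CMD:-zip}" -9 "$archive" "${files[@]:i:batch_size}" || return 1
        n=$((n + 1))
    done
}

# Example: zip_in_batches 30 csv_batch *.csv
```

With about 150 CSVs and a batch size of 30, this would give five archives, each of which unzips on its own. It groups by file count, not by size, so the archives can still differ a lot in size.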

sourcejedi

Posted 2013-06-01T20:27:53.010

Reputation: 2 292

Why do you use split -C (--line-bytes) and not split -l (--lines)? That would be more predictable, with regard to how many CSV files are in one archive. – mpy – 2013-06-02T12:00:39.887

I skimmed the manpage too quickly. Thanks, I'll fix it! – sourcejedi – 2013-06-02T12:58:37.000