When I'm compressing GBs of data with 10k+ files and directories, how do I know there's no corruption?

1

To narrow the question, assume Linux as the OS and tar as the compression format. Run time clearly matters, but knowing the archive is valid matters more. If you provide a command line, please also link to its documentation and explain each segment of the command in your answer.

blunders

Posted 2011-03-10T19:45:51.060

Reputation: 759

3 Just a small comment: I'm pretty sure tar does not compress files by default; it just bundles them together. I know that some versions have built-in gzip compression, but by default they usually just build an archive file. – Marc Reside – 2011-03-10T20:03:25.463

+1 @Marc Reside: That's in fact a great point. I'd noticed "tar.gzip" before, but never knew tar just packs directories and files into one file. – blunders – 2011-03-10T20:09:52.427

I've recently been playing with the TAR and GZIP file formats in order to understand them better. I'll try to give you an answer to your question in a bit ... I'm making sure I have my facts straight. :) EDIT: Never mind, it looks like M'vy gave a good answer. – Marc Reside – 2011-03-10T20:14:23.583

Answers

1

Aha! Found what we're looking for (Google saves the day)!

Check this link out: http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/

Like you, the author is looking for a good way to verify a TAR archive. He proposes a method that does what you're asking for, and points to a script that does the checking for you.

It's not perfect, but it's better than nothing.

EDIT: It seems VeriTAR even supports compressed TAR archives.
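The general idea of verifying checksums of files within a tar archive can be sketched roughly as follows. This is not VeriTAR's code, only a minimal illustration using Python's standard `tarfile` and `hashlib` modules; the function names and the choice of SHA-256 are my own assumptions. It records a checksum per source file while building the archive, then re-reads the archive and compares.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of_stream(stream, chunk_size=1 << 20) -> str:
    """Hash a binary file-like object in chunks so large files never sit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def create_archive_with_manifest(source_dir: str, archive_path: str) -> dict:
    """Write an uncompressed tar of source_dir; return {member name: sha256 of source file}."""
    source = Path(source_dir)
    manifest = {}
    with tarfile.open(archive_path, "w") as archive:
        for path in sorted(source.rglob("*")):
            name = path.relative_to(source).as_posix()
            # recursive=False: rglob already visits every entry once.
            archive.add(path, arcname=name, recursive=False)
            if path.is_file() and not path.is_symlink():
                with path.open("rb") as handle:
                    manifest[name] = sha256_of_stream(handle)
    return manifest


def verify_archive(archive_path: str, manifest: dict) -> list:
    """Return the names of members that are missing or whose content hash differs."""
    seen = {}
    with tarfile.open(archive_path, "r") as archive:
        for member in archive:
            # Regular files and hard links carry (or point to) file data.
            if member.isfile() or member.islnk():
                seen[member.name] = sha256_of_stream(archive.extractfile(member))
    return sorted(name for name, digest in manifest.items() if seen.get(name) != digest)
```

An empty list from `verify_archive` means every file in the manifest was found in the archive with matching content. Note that each source file is read twice (once by `tarfile`, once for hashing), so a file that changes during the run will show up as a mismatch, and the extra read is part of the overhead of this kind of check.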

Marc Reside

Posted 2011-03-10T19:45:51.060

Reputation: 1 484

1 +1 @Marc Reside: Yes, I had been using a checksum method, but it had a lot of overhead. I'll take a look at VeriTAR in the next few days and report back on the results. Thank you! – blunders – 2011-03-10T20:55:14.590

2

Would the --verify flag be useful to you? (GNU tar)
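Since the question asks for each segment of the command to be explained, here is a rough sketch of one way `--verify` could be invoked, wrapped in Python so every argument can carry a comment. It assumes GNU tar is on the PATH; the function names are hypothetical. As far as I know, `--verify` (`-W`) makes tar re-read the archive after writing it and compare it with the files on disk, and it is refused for compressed archives and for archives written to standard output.

```python
import subprocess
from pathlib import Path


def build_verify_command(source_dir: str, archive_path: str) -> list:
    """Return the GNU tar argument list that writes an archive and then verifies it."""
    source = Path(source_dir).resolve()
    return [
        "tar",                                  # assumes GNU tar is on PATH
        "--create",                             # -c: write a new archive
        "--verify",                             # -W: after writing, re-read and compare with disk
        "--file",                               # -f: the next argument is the archive file
        str(Path(archive_path).resolve()),      # uncompressed .tar on a regular file
        "--directory",                          # -C: change to the next argument's directory first
        str(source.parent),
        source.name,                            # directory to archive, stored under its relative name
    ]


def create_and_verify(source_dir: str, archive_path: str) -> bool:
    """Run the command; a non-zero exit status means tar reported a problem."""
    result = subprocess.run(build_verify_command(source_dir, archive_path))
    return result.returncode == 0
```

If compression is needed, one option is to create and verify the plain `.tar` first and compress it in a separate step afterwards.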

M'vy

Posted 2011-03-10T19:45:51.060

Reputation: 3 540

It's better than nothing, but it still seems flaky. That said, if no better answer is provided I'll accept it, since I was not aware of the --verify flag and it's useful. Thanks! – blunders – 2011-03-10T20:07:55.377

Yeah, no problem. I know this answer is not ideal, but that's all I've got ^^ – M'vy – 2011-03-10T20:14:41.380

I think any better answer would require a bit of scripting to add a file or data block checksum to the process somehow. It doesn't look like the TAR archive program has any checksum verification for the data blocks (just the header blocks). Still looking, though. – Marc Reside – 2011-03-10T20:28:20.060