Here's my problem: I need to archive a lot of big files (usually 30 to 40 GB each, up to 60 TB in total) to tar files. I would like to make checksums (md5, sha1, whatever) of these files before archiving; however, not reading every file twice (once for checksumming, once for tar'ing) is more or less a necessity to achieve very high archiving performance (LTO-4 wants 120 MB/s sustained, and the backup window is limited).
So I'd need some way to read each file once, feeding a checksumming tool on one side and building a tar stream to tape on the other, something along the lines of:
tar cf - files | tee tarfile.tar | md5sum -
Except that I don't want the checksum of the whole archive (which is all this sample shell code gives me), but a checksum for each individual file in the archive.
I've studied the options of GNU tar, pax, and star, and I've looked at the source of Archive::Tar. I see no obvious way to achieve this. It looks like I'll have to hand-build something in C or similar: Perl/Python/etc. simply won't cut it performance-wise, and the various tar programs lack the necessary "plugin architecture". Does anyone know of any existing solution to this before I start code-churning?
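For what it's worth, the closest workaround I've come up with myself (a sketch only, not benchmarked at tape speed) is to branch the tar stream rather than the files: the tape gets the stream untouched, while a side process re-parses it and checksums each member on the fly. GNU tar's --to-command option pipes each member's contents to a command during extraction, with the member name available in $TAR_FILENAME, so every file is still read from disk only once. Assuming bash (for the process substitution), /dev/st0 as the tape device, and checksums.md5 as the manifest file:

# one disk read: tee duplicates the tar stream to the tape and to a second
# tar, which re-parses it and appends "hash  filename" lines to the manifest
tar cf - files \
  | tee >(tar xf - --to-command='printf "%s  %s\n" "$(md5sum | cut -d" " -f1)" "$TAR_FILENAME" >> checksums.md5') \
  > /dev/st0

The side branch re-reads the stream from the pipe, not the files from disk, but the extra parse and hashing cost CPU, so I have no idea whether this can sustain 120 MB/s; a buffer such as mbuffer between tee and the tape drive would probably also be needed to keep the drive streaming. A nice side effect is that the manifest comes out in the format md5sum -c expects, so verification later is trivial.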