I constantly transfer disk images and virtual machine images (usually 800 GB to nearly 1 TB per file) to a cloud server with rclone over SSH, and I wonder how reliable sha1sum and md5sum are for verifying the integrity of very large files.
I found this: How can I verify that a 1TB file transferred correctly?
However, that question is about performance rather than the reliability of the generated hashes.
Is it possible that another file shares the same hash, considering how many distinct files are out there?
So how reliable are MD5 and SHA-1 sums for very large files? Thanks.
I also found these regarding collisions: https://stackoverflow.com/questions/4032209/is-md5-still-good-enough-to-uniquely-identify-files
https://www.theregister.co.uk/2017/02/23/google_first_sha1_collision/
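For context, this is roughly how I do the verification (a sketch only; `disk.img` and `remote:backups/` are placeholder names, and hashing the remote side via rclone assumes the backend can compute hashes, as SFTP can when the server has `sha1sum` installed):

```shell
# Record the checksum of the image before uploading (GNU coreutils).
sha1sum disk.img > disk.img.sha1

# Ask rclone to hash the remote copy without downloading it:
rclone hashsum SHA-1 remote:backups/disk.img

# Or, after re-downloading a copy, verify it against the recorded checksum:
sha1sum -c disk.img.sha1
```

If the two hex digests match, the transfer is considered intact; my question is how much that match actually proves.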
They are reliable unless you are very unlucky or put a lot of effort into it (for SHA-1). With MD5 the effort is significantly lower. If you are worried, go for SHA-2 or SHA-3 variants.
– Jakuje – 2017-03-05T15:58:04.2771
See also the pigeonhole principle and the birthday problem. For transfer-verification purposes, either algorithm will work as a first step: the pigeonhole principle tells us a non-matching sum definitely means the files differ, but a matching sum does not prove the files are identical.
– quixotic – 2017-03-05T16:06:55.500
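To put rough numbers on the birthday-problem point: the chance of at least one *accidental* collision among n random 160-bit (SHA-1 sized) digests is approximately n(n-1)/2 divided by 2^160, i.e. about n^2 / 2^161. Even for a billion files that is vanishingly small (a quick sketch; the n = 1e9 figure is just an illustrative assumption):

```shell
# Birthday bound: P(any accidental collision among n SHA-1 hashes) ~ n^2 / 2^161.
awk 'BEGIN { n = 1e9; printf "%.3g\n", n*n / 2^161 }'   # about 3.4e-31
```

So for corruption detection the math is overwhelmingly in your favor; the known attacks on MD5 and SHA-1 are about *deliberately crafted* collisions, not random transfer errors.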