For the SHA-1 hash collision part of your question, this has been addressed by a few of the answers.
However, a big portion of this hinges on the type of file we're working with:
"Maintains the file's overall content and operation (but of course now includes malicious content that was not originally there)"
What this means depends greatly on what is detecting the alterations:
- If it's a signed executable, there is no (reasonable) chance: you'd have to get two hash collisions somehow, one for the SHA-1 of the file and one for the internal .exe signature.
- If it's an unsigned executable, .com, unsigned .dll, or similar, their resource forks can be added to in ways that will not change their operation, and thus you could (eventually) get a hash collision that is not detectable in 'normal' operation.
- If it's a source code file or similar structure (.cs, .c, .h, .cpp, .rb, .yml, .config, .xml, .pl, .bat, .ini), the additions, modifications, or removals can be constrained to valid comment syntax, so that the change would not be discernible in most uses (compiling or running it, as opposed to opening it in a text editor).
- If it's an .iso, .zip, or other container format, it is even less likely, since most random changes will corrupt the container. It is possible: add a bogus file entry or alter some content within the container and recheck it. But you're adding a layer of complexity and extra time to check each result, and you have limited degrees of freedom in how and what contents may be changed.
- If it's a text or text-like format, it can be changed almost any way you like while still being a 'valid' file, though the changed content will probably be noticeable.
- Many formats such as .rtf, .doc, .html, .xlsx, and other markup-esque formats can be added to or modified in ways that parsers will not detect. So apart from the length (or, if the length is constrained, with less freedom), the files can be altered to (eventually) get a hash collision while remaining not only valid, but not noticeably changed in any way visible to the typical applications they would be used with.
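As a small illustration of the source-code case in the list above, here is a sketch in Python (the two source strings are made up for the example). It shows that appending a comment changes the SHA-1 of the file while the parsed program stays the same, which is the kind of "free" padding an attacker would search over.

```python
import ast
import hashlib

# Hypothetical source file contents, invented for this illustration.
original = "def greet(name):\n    return 'hello ' + name\n"
# Same program with a trailing comment used as filler.
padded = original + "# arbitrary filler 0123456789\n"

# Comments never reach the syntax tree, so both files parse to the same program.
same_program = ast.dump(ast.parse(original)) == ast.dump(ast.parse(padded))

# The raw bytes differ, so the SHA-1 digests differ.
sha1_original = hashlib.sha1(original.encode("utf-8")).hexdigest()
sha1_padded = hashlib.sha1(padded.encode("utf-8")).hexdigest()

print("same program:", same_program)
print("same SHA-1:  ", sha1_original == sha1_padded)
```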
So what you're left with is how to get a collision within whatever structure the file has, without corrupting it and with some degree of undetectability:
1. Make any functional changes you desire (perhaps insertion of malicious content) and make any additional changes needed to retain file-format-specific validity.
2. Add a section that will be non-functional (between comment blocks, at the very end of a text file with 3k carriage returns above it, or by isolating an existing comment block).
3. Add or select a character/code point/byte for modification and try every possible valid combination (not every byte combination is valid in every encoding, for example).
4. Recompute the hash and see whether it matches the original.
5. If it does not, go to step 3.
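The loop above can be sketched in Python. This is only an illustration under loud assumptions: it matches just the top 16 bits of SHA-1 so that it finishes quickly (a full 160-bit match is the infeasible case discussed in this answer), the padding is a Python-style comment, and the names `truncated_sha1` and `find_padded_match` are invented for the example.

```python
import hashlib

def truncated_sha1(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-1(data) as an integer."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)

def find_padded_match(original: bytes, tampered: bytes,
                      bits: int = 16, max_tries: int = 2_000_000):
    """Append comment padding to `tampered` until its truncated hash
    equals that of `original`. Returns (candidate, tries) or (None, max_tries)."""
    target = truncated_sha1(original, bits)
    for n in range(max_tries):
        # Steps 2 and 3: a non-functional section whose content we vary.
        candidate = tampered + b"# pad " + str(n).encode("ascii") + b"\n"
        # Steps 4 and 5: recompute, compare, and loop if there is no match.
        if truncated_sha1(candidate, bits) == target:
            return candidate, n + 1
    return None, max_tries

# Toy inputs, invented for the illustration.
original = b"print('hello')\n"
tampered = b"print('hello')\nprint('malicious')\n"  # step 1: the functional change

candidate, tries = find_padded_match(original, tampered)
print("matched top 16 bits after", tries, "tries")
```

Each extra bit to match doubles the expected number of tries, which is why the toy version is quick and the real 160-bit version is not.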
Let's say you have a super fast computer and a smallish file, such that modifying it with a valid byte sequence and recomputing the hash takes 1 millisecond (probably requiring some dedicated hardware). If the hash values are uniformly distributed across the output range, you will match a given SHA-1 value on average once every 2^160 attempts (brute forcing it).
2^160/1000/60/60/24/365.24
= 4.63x10^37 years
= 46,300,000,000,000,000,000,000,000,000,000,000,000 years
= 46 undecillion years.
But hey, let's try the 2^60 and 2^52 versions, and pretend that they allow us to modify the file any way we like (they don't) and that they, too, can be done in 1ms per try:
2^52 yields 142,714 years
/*humans might still be around to care, but not about these antiquated formats*/
2^60 yields 3.65x10^7 years = 36,500,000 years
/*machines will probably have taken over anyway*/
But hey, you might get lucky. Really, really, more-of-a-miracle-than-anything-people-call-miracles lucky.
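The year figures above can be reproduced with a few lines of Python, using the same assumptions as the text: 1 ms per attempt and a 365.24-day year. The helper name `years_at_one_ms_per_try` is made up for this sketch.

```python
# Milliseconds in one year, using the 365.24-day year from the calculation above.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.24

def years_at_one_ms_per_try(exponent: int) -> float:
    """Years needed for 2**exponent attempts at 1 ms per attempt."""
    return 2 ** exponent / MS_PER_YEAR

for exponent in (160, 60, 52):
    years = years_at_one_ms_per_try(exponent)
    print(f"2^{exponent} attempts: {years:.3g} years")
```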
The answer is "it depends". If the ISO happened to contain lots of jpegs or movie files - along with your target executable, then it is possible. You can modify jpeg files quite dramatically without altering their size or visual appearance. Ultimately, the larger the file, the more you have to play with, and the better the chance of a non destructive collision. – Paul – 2015-03-15T22:44:07.523
How are you getting the hash value? From the download page, or some other way? – cpast – 2015-03-16T03:27:24.937
@cpast exactly, many websites list SHA-1 hashes to allow you to verify your download. Thinking about it, it seems far more likely that a hacker would compromise a website by altering the content and the published hash. Then you're really screwed. – misha256 – 2015-03-16T04:05:54.937
Just fyi, my question asks about SHA-1 specifically because it's quite common, especially with downloads from Microsoft/MSDN. Of course some websites publish MD5 hashes, others SHA256, etc. – misha256 – 2015-03-16T04:13:02.243
The question is, why would you want to use a hash that has any known vulnerabilities, when there are alternatives that are just as fast, easy to use, and widely available that don't (eg. SHA-256)? Additionally, there's a reason cryptographers declare a hash insecure after only one vulnerability is found: history has shown that when one is found, others quickly follow. The famous Bruce Schneier quote is "Attacks always get better, they never get worse" – BlueRaja - Danny Pflughoeft – 2015-03-16T04:31:52.473
It's important to remember that if an attacker has access to the server, he can replace both the file and the checksum. – gronostaj – 2015-03-16T15:59:52.833
@misha256 Those sha1 hashes are for you to check for download corruption, not for security. If you want security then use gpg signed files – Daenyth – 2015-03-17T15:13:43.747