It has been known since 2004 that the MD5 hash is vulnerable to collision attacks (update: not "preimage" attacks, my mistake). Yet people still seem to be using it to identify malware. E.g. reports about the new Flame malware describe people going back several years and finding the same MD5 signatures in archived MD5 data.
How old is Flame? - Alienvault Labs
An attacker could presumably craft each of their malicious files together with an innocuous-looking file that has the same MD5 hash, and make the innocuous one public, so relying on MD5 alone seems dangerous.
I don't see references to SHA-256 or even SHA-1, neither of which had seen a (public) collision attack at the time of writing. What is the status of moving to better hashes for virus databases?
Update: my concern is this. Suppose the virus db doesn't retain full copies of all the files in question (e.g. because some are really big), and/or people searching the db don't compare the full contents of the new files they're looking up against the archived files. Then a new malicious file that matches an old "innocuous" file might be mistakenly dismissed as not dangerous, based on nothing more than an MD5 match. Hopefully anti-virus researchers do retain and check the full files; otherwise they would be vulnerable to this attack.
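To make the concern concrete, here is a minimal sketch of the kind of check I have in mind, assuming the archive can't keep full files but can keep a second, stronger digest next to each MD5. The names `HashArchive`, `add`, `lookup` and the three result strings are made up for illustration; I am not claiming any real hash database or AV product works this way.

```python
import hashlib


def digests(data):
    """Return (md5_hex, sha256_hex) for a file's contents."""
    return hashlib.md5(data).hexdigest(), hashlib.sha256(data).hexdigest()


class HashArchive:
    """Hypothetical archive indexed by MD5 that also stores SHA-256."""

    def __init__(self):
        # md5 hex digest -> set of sha256 hex digests seen with that md5
        self._by_md5 = {}

    def add(self, data):
        md5_hex, sha256_hex = digests(data)
        self._by_md5.setdefault(md5_hex, set()).add(sha256_hex)

    def lookup(self, data):
        md5_hex, sha256_hex = digests(data)
        known = self._by_md5.get(md5_hex)
        if known is None:
            return "unknown"        # MD5 never seen before
        if sha256_hex in known:
            return "match"          # both digests agree with an archived file
        return "md5-collision"      # same MD5, different contents: suspicious


archive = HashArchive()
archive.add(b"old innocuous file contents")
print(archive.lookup(b"old innocuous file contents"))
print(archive.lookup(b"some file the archive has never seen"))
```

With a check like this, a file that shares only the MD5 of an archived "innocuous" file would be flagged rather than waved through, though it only helps if the second digest was recorded when the original file was first seen.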
So what sorts of attacks against malware IDs might exploit the ease of producing MD5 collisions, and what steps do specific hash databases and AV software actually take to thwart them?