2

I'm trying to understand more about the MD5 and SHA-1 hash algorithms and how they behave in serious security software (I found this, but it did not help much).

Why does ClamAV use additional scanning techniques (an MD5 of a specific section in a PE file, wildcards, icon signatures for PE files, ...) rather than just comparing against the MD5 of known malware?

Why doesn't it just calculate the MD5 hash of the entire file? (As far as I know, the MD5 and SHA-1 hash functions are extremely accurate.)

I'm not talking about heuristic detection, as this can be done by hooking into the API calls of the executable.

Marwen Trabelsi

4 Answers

4

Because malware can change parts of itself with random data and therefore change the file hash of itself or the files it has infected. Additionally, most advanced malware these days is packed with what is called a "FUD crypter", making the job of AV vendors much more difficult.

Read more at http://www.hackpconline.com/2010/04/faq-what-is-fud-crypter.html

If you want better security than AV provides, adopt a whitelisting approach to executables, so that only the .exe files you specifically allow can run. One such example is Windows AppLocker.

Matrix
  • "Because malware can change parts of itself with random data and therefore change the file hash of itself or the files it has infected" I assume that you are talking about polymorphic malware. As I have already mentioned, that type of malware can be detected by heuristic rules, but my question is about static code. In other words: can an MD5 hash of the entire file be a good solution? – Marwen Trabelsi Dec 06 '12 at 00:07
  • 1
    Hash of what file? Malware? It's not a perfect solution, but when used with other detection techniques, it does complement them well. By itself, it will detect some, mostly simpler, malware. This is how the first AV worked. – Matrix Dec 06 '12 at 08:46
  • 1
    If it is a static file, then why MD5 the entire file as opposed to a minimal unique length of the file? There is processing cost associated with hashing and the goal is typically to minimize resource cost while maximizing detection. I think speed is the primary reason to avoid doing an MD5 of the full file if part will do. – AJ Henderson Dec 06 '12 at 14:18
  • @SmartyTwiti - We are not talking about polymorphic malware, but most malware is generated and updated. When you are dealing with user files, you had better be darn sure that what you say is malicious actually is malicious; otherwise you will only be left with angry users. – Ramhound Dec 07 '12 at 12:39
  • So guys, if I understand correctly, I must read the binary data of the executable and then calculate a signature of the unchanged part per malware family, rather than hashing the entire executable? Would that be safer and faster? Thanks in advance :) – Marwen Trabelsi Dec 07 '12 at 16:07
  • The irony in talking about security is that the link is now hijacked. Here's an archived copy: https://web.archive.org/web/20111031063532/http://www.hackpconline.com/2010/04/faq-what-is-fud-crypter.html – vr_driver Nov 17 '21 at 00:26
4

Viruses graft copies of themselves onto existing executable files. Thus there is no single archetypal file for a virus, with all copies bit-for-bit identical to that file. Moreover, even when a piece of malware is a stand-alone file, malware authors take care to automatically morph their code into zillions of variants (changing bits which are in the file but don't impact its functionality), precisely to make the detection task harder for anti-malware software. Old anti-malware worked just as you suggest (hash the file, look it up in a database of known "bad hashes"), but malware authors adapted long ago, forcing anti-malware to be smarter in its detection.
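
As a rough illustration of why the old "hash of the file, lookup in database" approach breaks down, here is a minimal Python sketch. The sample bytes, the `known_bad_hashes` set, and the helper names are all hypothetical stand-ins, not how any real product stores its signatures.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for a malware file: arbitrary bytes, not a real binary.
original_sample = b"MZ\x90\x00" + b"payload-that-does-bad-things" + b"\x00" * 16

# Hypothetical database of known "bad hashes", as old anti-malware might keep.
known_bad_hashes = {md5_hex(original_sample)}


def is_known_bad(data: bytes) -> bool:
    """Whole-file detection: flag the data only if its exact hash is listed."""
    return md5_hex(data) in known_bad_hashes


# A morphed variant: only the last padding byte differs, so the
# (imaginary) functionality is untouched.
variant = original_sample[:-1] + b"\x01"

print(is_known_bad(original_sample))  # True: exact copy of the listed file
print(is_known_bad(variant))          # False: one changed byte evades the lookup
```

Every variant needs its own database entry under this scheme, which is why a lookup of whole-file hashes alone cannot keep up with automatically morphed code.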

Thomas Pornin
  • Can you provide some links for further reading as to how antiviruses and malware detectors can detect viruses with altered hashes and the like? Thanks. – KeyC0de Dec 06 '17 at 20:54
2

The other answers are good, but another reason is that with hashes (not including piecewise hash systems like ssdeep), if the file changes even in the slightest, the hash will be drastically different. So the malware author can simply type:

echo 1 >> malware.exe

And the file will have a different hash.
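
The same effect can be reproduced without touching a real executable. The following Python sketch is only an illustration: the byte string stands in for the contents of malware.exe, and the appended `b"1\n"` assumes a Unix-style `echo` (the exact bytes appended on Windows differ).

```python
import hashlib


def digests(data: bytes) -> tuple[str, str]:
    """Return the (MD5, SHA-1) hex digests of the given bytes."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Stand-in for the file contents; any bytes behave the same way.
original = b"stand-in for the bytes of malware.exe"
# Roughly what `echo 1 >> malware.exe` appends on a Unix-like shell.
modified = original + b"1\n"

md5_a, sha1_a = digests(original)
md5_b, sha1_b = digests(modified)

print("MD5 changed:  ", md5_a != md5_b)
print("SHA-1 changed:", sha1_a != sha1_b)
print(differing_bits(md5_a, md5_b), "of 128 MD5 bits differ")
print(differing_bits(sha1_a, sha1_b), "of 160 SHA-1 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip, so the two digests share no useful resemblance.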

Anorov
  • Won't that specific example make the EXE not run? For demonstration purposes it would be nice to demonstrate a modification that still permitted execution of the EXE. Let me know if you come up with a way. – makerofthings7 Feb 24 '13 at 23:43
  • @makerofthings7 Since it only adds those bytes to the very end of the file, it should not in theory modify the execution of most executables. I tested it on 3 different Windows PE binaries, and all appeared to function the same as usual after adding arbitrary data to the end with `echo` and `>>`. – Anorov Feb 25 '13 at 01:52
  • Good enough for me +1 – makerofthings7 Feb 25 '13 at 02:53
1

As has been mentioned, hashing the entire file is ineffective for a few reasons.

  1. It's (slightly) more computationally intensive, so why do it if you can avoid it?

  2. Some viruses are polymorphic, hence any slight change can make them undetectable via this "entire file" method.

  3. It's actually relatively rare to get a malware executable on its own. The vast majority of the time it will be bound to another executable, meaning we again go from one specific virus to thousands of variants depending on what file it's bound to, which again makes whole-file hashing rather pointless.

  4. A final point I'll outline is that the real aim is to make things (at least a little more) difficult for malware authors. If I had a file with a heap of junk at the beginning, a heap of junk at the end, but one very specific malicious part in the centre which does the bad stuff and is hard to rewrite in a different way, I want to identify that part in every file. Then no matter what junk they put before or after it, they still have to find a new way to rewrite this key section, which may be a lot harder than morphing some junk.

Think of it this way: if I wanted to find a key phrase such as "Security StackExchange" on a webpage, should I search for that phrase specifically? Or find a page that contains it and just search for copies of that page? That's essentially the difference between searching for a specific virus signature (the phrase) and hashing the program (the whole page).
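
The difference between the two approaches can be sketched in a few lines of Python. This is only a toy model: `SIGNATURE`, `make_variant`, and the junk sizes are made up for illustration, and real scanners use far richer signature formats than a plain byte-string search.

```python
import hashlib
import os

# Hypothetical byte pattern standing in for the hard-to-rewrite malicious core.
SIGNATURE = b"very-specific-malicious-routine"


def contains_signature(data: bytes) -> bool:
    """Signature detection: look for the key byte sequence anywhere in the data."""
    return SIGNATURE in data


def make_variant() -> bytes:
    """Build a toy 'infected file': random junk, the malicious core, more junk."""
    return os.urandom(64) + SIGNATURE + os.urandom(64)


variant_a = make_variant()
variant_b = make_variant()

# Whole-file hashes differ because the surrounding junk differs.
same_hash = hashlib.md5(variant_a).hexdigest() == hashlib.md5(variant_b).hexdigest()
print("same whole-file MD5:", same_hash)  # False

# The signature search finds the core in both variants regardless of the junk.
print(contains_signature(variant_a), contains_signature(variant_b))  # True True
print(contains_signature(b"harmless file contents"))                 # False
```

The junk costs the malware author nothing to regenerate, whereas evading the signature search means rewriting the core itself.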

It's worth noting that a lot of these things have been grossly simplified for ease of explanation.

Peleus