
When a new piece of malware appears, people can try to determine where it comes from, and who its authors could be.

How do security experts attempt to identify the authors of a new publicly disclosed piece of malware? What techniques (e.g. reverse engineering) are available?

Steve Dodier-Lazaro
user3404735
  • Decompile the malware and look for a personal e-mail address, of course. Note that this will only work on 14-year-olds trying to steal Minecraft passwords. – Nathan Goings Jan 14 '16 at 17:21
  • We assume the malware has no "about..." menu where you can submit bug reports or load the latest version along with legal contact data? – Zaibis Jan 15 '16 at 08:40
  • While the authors likely want to remain undetected, they are often narcissistic enough to at least leave a pseudonym in the open for bragging purposes. – Hagen von Eitzen Jan 15 '16 at 15:44
  • If `...timestamps in the malware seem to indicate that the programmers worked overwhelmingly Monday-Friday in what would correspond to a 08:00-17:00 workday in an Eastern United States timezone`, that is a good clue too. ([Kaspersky](https://securelist.com/blog/research/69203/inside-the-equationdrug-espionage-platform/)) – Eugene Ryabtsev Jan 15 '16 at 16:49
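
The working-hours clue from the last comment can be checked roughly as follows. This is only a sketch: it assumes you have already collected a list of Unix timestamps from the samples (how they are extracted is not covered here), and the function names `working_hours_share` and `best_offset` are made up for illustration. Timestamps can be forged, so treat the result as a hint at most.

```python
from datetime import datetime, timedelta, timezone

def working_hours_share(timestamps, utc_offset_hours):
    """Fraction of timestamps that fall Mon-Fri, 08:00-17:00 at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    hits = 0
    for ts in timestamps:
        local = datetime.fromtimestamp(ts, tz)
        # weekday() is 0 for Monday, so values below 5 are working days
        if local.weekday() < 5 and 8 <= local.hour < 17:
            hits += 1
    return hits / len(timestamps) if timestamps else 0.0

def best_offset(timestamps):
    """Try every whole-hour UTC offset and return the one with the highest share."""
    return max(range(-12, 15), key=lambda off: working_hours_share(timestamps, off))
```

A high share at one offset and a low share at most others would be consistent with an office-hours team in that timezone, but it says nothing about who they are.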

2 Answers


There are a number of different techniques, depending on the skill level of the malware author:

  • Embedded metadata - compiled programs can contain details about their authors. This is most commonly seen in legitimate programs, and shows in the Details tab if you look at the file's properties in Windows. Attackers who are out for fame might well put identifying details in these fields.
  • Accidental embedding - compilers will often include details of the compiler flags used, which may well include paths to source files. If the source file was in `/users/evilbob/malware`, you can make a pretty good guess that Evil Bob wrote it. There are ways to turn off these inclusions, but everyone makes mistakes sometimes.
  • Common code - malware authors are like any other programmer, and will reuse useful bits of code from previous work. It is sometimes possible to spot that a section of compiled code matches a previously discovered section of code so closely that it seems probable that the same source code was used for each. If that is the case, you can deduce that the second author had access to the first author's code, or may be the same person.
  • Common toolchain - if a developer tends to use Visual Studio, it would be unexpected to see their code turning up compiled with GCC. If they use a specific packer, it would be strange to see them using a different packer. It's not conclusive, but a mismatch could suggest a different author.
  • Common techniques - similar to the above, coders often have specific coding patterns. People are unlikely to switch patterns, so if some compiled code couldn't have been generated from a particular coding style, you can make a reasonable guess that it wasn't written by someone known to use that style. This is much easier with interpreted languages, where the source is typically available: seeing consistent use of, say, for loops rather than while loops is easier than spotting the differences between the compiled output of each (modern compilers may well reduce them to exactly the same set of instructions).
  • Malware origin - where did it come from? Does it contain text in specific languages, or spellings and typos which suggest a particular background? (e.g. "colour" would suggest that the author wasn't American; "generale" might suggest someone used to writing in a Romance language such as French or Italian).
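
As a rough illustration of the "accidental embedding" point, the sketch below scans raw bytes for home-directory style paths and pulls out the directory name that follows. The regular expression and the name `candidate_usernames` are assumptions made for this example, not part of any standard tool, and the scan only sees single-byte ASCII text (paths stored as wide strings would be missed).

```python
import re

# Unix-style (/home/<name>/, /users/<name>/) and Windows-style
# (C:\Users\<name>\) profile paths. Illustrative pattern only.
USER_PATH = re.compile(
    rb"(?:/(?:home|users)/|[a-z]:\\(?:users|documents and settings)\\)"
    rb"([a-z0-9._ -]{1,32})[/\\]",
    re.IGNORECASE,
)

def candidate_usernames(data: bytes) -> set:
    """Return the set of directory names found right after a profile path prefix."""
    return {m.group(1).decode("ascii") for m in USER_PATH.finditer(data)}
```

A hit is only a lead: the name may be a build-machine account, a pseudonym, or deliberately planted.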

None of these is enough on its own to identify an author, but combined they might suggest a common author with previous malware, or even with other known code (e.g. from OS projects).
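
To give a feel for the "common code" comparison, here is a deliberately simple stand-in: Jaccard similarity over byte n-grams of two code sections. Real analysis usually works on disassembled functions with more robust fuzzy matching, so take this as a toy under that assumption; `ngrams` and `jaccard_similarity` are names invented for the example.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """All distinct n-byte windows in data (empty set if data is shorter than n)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Shared n-grams divided by total distinct n-grams, between 0.0 and 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

A score near 1.0 between sections of two samples suggests shared code, which in turn only suggests shared access to the source, not necessarily the same person.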

Matthew
  • [When coding style survives compilation: De-anonymizing programmers from executable binaries](https://freedom-to-tinker.com/blog/aylin/when-coding-style-survives-compilation-de-anonymizing-programmers-from-executable-binaries/) – lolesque Jan 14 '16 at 16:51
  • There was also that case about a Type I UUID used within the malware. Type I UUIDs have the MAC address of the computer and the time of generation... – Medinoc Jan 14 '16 at 19:30
  • Don't forget about when it phones home. The IP address of the command and control server can give clues too, and can possibly be traced. – Chloe Jan 14 '16 at 19:59
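
The Type I (version 1) UUID point can be demonstrated with Python's standard `uuid` module. This is a minimal sketch; `decode_uuid1` is a hypothetical helper name. Note that RFC 4122 allows a random node value when no MAC address is available, so the "MAC" is not always a real hardware address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100-nanosecond intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def decode_uuid1(text: str):
    """Return (node formatted as a MAC address, creation time in UTC) for a v1 UUID."""
    u = uuid.UUID(text)
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    # u.node is the 48-bit node field; print it one byte at a time, high byte first
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return mac, created
```

For example, `decode_uuid1(str(uuid.uuid1()))` returns the node value and generation time of a freshly created UUID on your own machine.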

Matthew's answer was excellent. There are a few other ways as well.

  • Not a whole lot of malware authors are all that bright. For example, you can open a lot of executables in Notepad and look for string data. I've seen countless authors who simply put their email address/server name, username, and passwords inside the programs in a string, and it literally shows up in Notepad.
  • Reverse-engineering the malware, for cases where the author obfuscated the strings that the above step relies on.
  • Finding the address which the malware connects to, and investigating anyone behind it. If it's a specific type of malware that infects a lot of machines, the developer is probably already known to begin with. If not, track it to the source. There are data trails everywhere.
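
The "open it in Notepad" step can be automated along these lines. This is a sketch only: the minimum run length of 6 and the e-mail pattern are arbitrary choices for the example, and `extract_strings` and `find_emails` are illustrative names. It will find nothing if the strings are encrypted or packed, which is where the reverse-engineering step comes in.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")  # runs of 6 or more printable ASCII bytes
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_strings(data: bytes) -> list:
    """Return printable ASCII runs found in a binary blob, in order of appearance."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]

def find_emails(data: bytes) -> list:
    """Return anything that looks like an e-mail address inside those runs."""
    found = []
    for s in extract_strings(data):
        found.extend(EMAIL.findall(s))
    return found
```

To try it on a file, read it in binary mode first, e.g. `find_emails(open(path, "rb").read())`, where `path` is whatever sample you are examining (ideally inside an isolated analysis environment).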
Mark Buffalo
  • An example of two of your points: one bit of malware I had the (dis)pleasure of dealing with had the server WHOIS point to the author's house... – admalledd Jan 14 '16 at 19:55
  • @admalledd Now that's funny. I saw one point to the author's email address. I logged into his email and changed the password and contact information, and sent him an email saying he's being "watched." Never heard from him again. :( – Mark Buffalo Jan 14 '16 at 19:56