SPAM Email Analysis

Question

What steps would you follow to identify whether an email is a SPAM/SCAM/Phishing attempt? I am asking because very well-crafted junk emails sometimes manage to bypass automated antispam tools, so further investigation is required.

Usually I review the sender IP reputation using the MXToolbox Blacklist Check. I also use VirusTotal to scan attachments / URLs for malware. I was wondering if anyone else has any other steps or online tools to help in this space.
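For what it's worth, blacklist checkers of this kind are generally front ends for DNSBL lookups, which can also be done by hand. Below is a minimal sketch, assuming an IPv4 sender address; the function names are made up for illustration and zen.spamhaus.org is used only as an example zone. The octets are reversed, the zone is appended, and the resulting name is resolved.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_answer(ip: str, zone: str = "zen.spamhaus.org") -> Optional[str]:
    """Return the A record the list answers with, or None if nothing resolves.

    How to interpret the returned 127.x.y.z value depends on the list;
    check the operator's documentation before acting on it.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or a resolver failure): nothing to report.
        return None
```

Some list operators refuse or rate-limit queries that arrive through large public resolvers, so treat the result as a hint rather than a verdict.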

score 4 · Answer 1 · answered Dec 24 '13 at 15:41

As a rule of thumb, if it looks like spam then it is spam. Automated tools can be defeated, but a human brain (mine, in this case) is harder. Antispam software is useful to weed out most spams automatically, reducing the problem to a size where human filtering is tolerable. It is in fact relatively easy to identify dangerous spam because in order to be dangerous, the spam must have some hook: a clickable link to a funky-looking domain, an attached executable file (or Zip archive containing an executable)... The ambiguous spams are thus the spams (or non-spams) which are mostly harmless.

For instance I once received a spam whose complete body was pure text (no HTML version, no attached file, just ASCII text) and consisted of a single word: "Theravada". This is the name of a branch of Buddhism so the only potential effect of that spam might have been to help me reach enlightenment, which I don't categorize as dangerous.

The problem here is that human filtering of emails requires a brain that is well aware of how, technically, a given email can bring harm to a machine, so this works only for me or any other InfoSec specialist, not for generic users. (Also, don't read your emails while drunk.) This also emphasizes the point that antispam filters are never absolute, especially since they rely on heuristics to which spammers continuously adapt; at best, good rules will help reduce the amount of spam to human-manageable totals.

In a generic site, a workable compromise may be the following:

Use "normal" tools like SpamAssassin to block the 90% of "obvious spams".
Use whitelists to automatically deliver emails coming from known good sources. This raises the question of whether a "known good source" could be impersonated; DKIM can help (i.e. if an email is guaranteed, through DKIM, to come from a given server and that server has been whitelisted, then let the email go through).
Let pass emails which are "obviously" harmless. This is relative to how well the users are trained; a pure-ASCII email can still ask for the user to send his password to sysadmin@evilhackerz.com, and if your users will fall for it then about no email can be deemed "obviously harmless".
All the remaining emails can be accumulated in a quarantine zone, to be inspected by knowledgeable humans.
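The four steps above could be sketched roughly as follows. This is only an illustration of the ordering: the Message fields, the Verdict names, and the 5.0 threshold are assumptions made for the example, and a real deployment would obtain the score and the DKIM result from its own filter and verifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Verdict(Enum):
    REJECT = "reject"
    DELIVER = "deliver"
    QUARANTINE = "quarantine"


@dataclass
class Message:
    spam_score: float            # score from a filter such as SpamAssassin
    dkim_domain: Optional[str]   # domain whose DKIM signature verified, if any
    is_plain_text: bool
    has_links: bool
    has_attachments: bool


def triage(msg: Message, whitelist: Set[str],
           spam_threshold: float = 5.0,
           trust_plain_text: bool = False) -> Verdict:
    # Step 1: drop the obvious spam caught by the automated filter.
    if msg.spam_score >= spam_threshold:
        return Verdict.REJECT
    # Step 2: deliver mail whose provenance is proven AND whitelisted.
    if msg.dkim_domain is not None and msg.dkim_domain in whitelist:
        return Verdict.DELIVER
    # Step 3: optionally let "obviously harmless" mail through.
    # Leave this off unless users are trained not to answer
    # plain-text requests for their password.
    if (trust_plain_text and msg.is_plain_text
            and not msg.has_links and not msg.has_attachments):
        return Verdict.DELIVER
    # Step 4: everything else waits for a knowledgeable human.
    return Verdict.QUARANTINE
```

Note that the whitelist is matched against the DKIM-verified domain rather than the From header, since the latter is trivially forged.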

Human inspection of emails can have legal ramifications with regard to the expectation of privacy of communications; even in a business context you still have to make sure that all relevant policies and contractual clauses are "legally clean".

There are many companies out there who make a pretty penny selling and/or operating antispam systems, so we might infer that utterly defeating spam is probably not an easy thing to do.

I wouldn't promote the use of DKIM since I received spam with DKIM headers. The sending users were phishing victims. And they sent a lot. DKIM doesn't seem to have a real impact on users' intelligence within the domain. — dan, Dec 24 '13 at 17:05
DKIM does in no way guarantee the _quality_ of the email, only its _provenance_. You still have to make a decision about what source site is "inherently clean". Big sites with many users cannot be deemed inherently clean, of course. — Tom Leek, Dec 24 '13 at 22:00

score 3 · Answer 2 · answered Dec 24 '13 at 15:50

MXToolbox and VirusTotal are very good tools, in my opinion. Bear in mind that a compromised email account sending at a "low and slow" rate may avoid blacklisting for some time. VirusTotal can still miss zero-day malware, but I agree that there is probably nothing better.

However, human detection of unwanted or malicious emails is easy (assuming you have a clever human!). Remember that the use of tools is to aid human assessment of emails - if a human thinks an email is spam, but an app thinks it's legitimate, the human assessment should prevail.

Spam is usually immediately obvious because it's selling something, and is the catch-all category for unwanted email that has no attachments or hyperlinks. Furthermore, spam is very much in the eye of the beholder - many users will opt-in to a newsletter from a legitimate online store, yet regard the mailouts as "spam".
Phishing is easy to define: the sender is pretending to be someone else and providing a "login here to do something" link in the email. If unsure, contact the (purported) sender by an alternate method and check! You can get false positives if the sender is using a marketing service, although most legitimate providers will have informative headers + opt out in each email.
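One way to put the phishing definition to work is to list the link hosts in an HTML body that do not belong to the claimed sender's domain. The sketch below is a rough heuristic, not a detector: foreign_link_hosts is a made-up helper name, the From header itself can be forged, and legitimate marketing services will show up as mismatches (the false positives mentioned above).

```python
from email.utils import parseaddr
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def foreign_link_hosts(from_header: str, html_body: str) -> set:
    """Return link hosts that are neither the sender's domain nor a subdomain."""
    _, address = parseaddr(from_header)
    sender_domain = address.rpartition("@")[2].lower()
    collector = _LinkCollector()
    collector.feed(html_body)
    hosts = set()
    for href in collector.hrefs:
        host = (urlsplit(href).hostname or "").lower()
        if not host:
            continue  # mailto:, relative links, etc.
        if host == sender_domain or host.endswith("." + sender_domain):
            continue
        hosts.add(host)
    return hosts
```

Matching on the end of the host name matters: a host like bank.example.evil.test starts with the bank's name but belongs to evil.test, so it is reported.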

Scams tend to follow established patterns, and there are actually only a few markers to remember:

You're asked to send money (Lottery scam / I've been robbed overseas / Help me with this transfer and get a fee / etc.). This is by far the most common type of scam.
You're asked to reveal personal information like names, addresses, usernames, and passwords (Facilitates spear phishing or identity theft)

score 2 · Answer 3 · answered Dec 24 '13 at 21:52

The best available information is mostly outside the email itself:

The email is sent from a server which is not a legitimate server for the sender of the email, or it is sent from a machine which appears to be on a dynamic network like a wifi hotspot or dialup.
The email is sent from a server which has been the source of spam in the past.
The email is sent by a server that isn't particularly compliant with the relevant SMTP RFCs. Graylisting exploits this by returning temporary failures and expecting real mail servers to queue and redeliver. Another technique is to insert a delay in the initial handshake to detect sources that send a script of SMTP commands without actually interacting with the server.
The email is sent to the wrong server for the recipient. Secondary MX records often point to machines that simply queue email and lack the information necessary to classify spam or bad recipients. Spam senders take advantage of this by avoiding the (well configured) primary MX record, even when it is available.
The email is sent to a lot of recipients. Big email services like Google have an advantage in spam detection because they can see when a message is sent to many people who do not otherwise tend to receive the same message.
The email has URLs which reference sites that host malware or phishing scams.
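To make the graylisting item above concrete, here is a minimal in-memory sketch. It assumes the common approach of keying on the (client IP, envelope sender, envelope recipient) triplet; the class name, the 300-second delay, and the reply strings are illustrative choices, not taken from any particular mail server.

```python
import time


class Graylist:
    """Temporarily refuse unknown triplets; accept them once they retry later."""

    def __init__(self, min_delay: float = 300.0, clock=time.monotonic):
        self.min_delay = min_delay
        self._clock = clock          # injectable so the logic can be tested
        self._first_seen = {}

    def check(self, client_ip: str, mail_from: str, rcpt_to: str) -> str:
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self._clock()
        # Remember when this triplet first appeared.
        first = self._first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        # A real mail server queues and retries after a 4xx reply;
        # a fire-and-forget spam script usually does not.
        return "451 4.7.1 Graylisted, please try again later"
```

A production version would also expire old entries and persist the table, which this sketch leaves out.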
