
When entering your email address publicly, a common practice is to replace . with the text dot and @ with the text at. I assume the reasoning is that this way, automatic email-collecting robots won't match your address as easily. I still see recently updated websites using this.

However, this practice is not very hard to work around with a program, and it has been around for more than a decade (as of 2013). Anyone in the business of collecting emails has had more than enough time to update their robots to handle it. Are there still robots that don't handle this? Why?

Are there any reasons remaining today to use this kind of mangling?

Adi
n611x007
    Munging is necessary to write URLs in [YouTube comments](http://www.youtube.com/comment?lc=g7H4uECGKkeol204PcmLtpr6zC9zZK8F8nyiy75BuKY). "www.ninelivesrec.dev" becomes "vvv ninelivesrecs dev". – Iain Samuel McLean Elder Nov 06 '13 at 23:07
    If your primary language is not English but, for example, Latvian, then `dot` in Latvian is `punkts`. I don't think crawlers understand Latvian :) But this works only for small languages. Also, if the email address contains numbers, you can write it as: email_two_ at gmail `punkts` com (remove the underscores and read two as 2) – Guntis Nov 12 '13 at 07:15
  • @Guntis Crawlers are sophisticated robots! Any basic translator knows this! So using *`REGEX=(dot|punkt|point|pnt|точка|)`* is easy, for example. Robots could even adapt this by running language recognition against the thread... – F. Hauri - Give Up GitHub Jul 23 '19 at 08:18
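F. Hauri's point can be sketched in a few lines of Python: a harvester that accepts `at` plus any of several words for `dot`. This is a minimal, hypothetical sketch, not any real crawler's code. The alternation reuses the words from the comment above (its empty alternative dropped) and adds `punkts` so that Guntis's Latvian example matches as a whole word.

```python
import re

# Words for "dot" from the comment above (empty alternative dropped);
# "punkts" is added so the Latvian example matches as a whole word.
DOT_WORDS = "dot|punkts|punkt|point|pnt|точка"

# Hypothetical harvester pattern: "user at domain <dot-word> tld".
MUNGED = re.compile(
    rf"\b(\w+)\s+at\s+(\w+)\s+(?:{DOT_WORDS})\s+(\w+)\b",
    re.IGNORECASE,
)

def demunge(text: str) -> list[str]:
    """Return every 'user at domain <dot-word> tld' phrase as user@domain.tld."""
    return [f"{user}@{domain}.{tld}" for user, domain, tld in MUNGED.findall(text)]

print(demunge("email_two_ at gmail punkts com"))
print(demunge("adnan at gmail точка com"))
```

Requiring whitespace around `at` keeps words like "attic" from matching. The sketch only handles a single dot in the domain, and it recovers `email_two_` literally; the extra "two is 2" convention would still need a human, or one more rule.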

4 Answers


To understand this, we must understand how crawlers find email addresses. Steering away from the technical details, the basic idea is this (today's algorithms are, of course, smarter than that):

  1. Find @ in the page.
  2. Is there a dot within 255 characters after the @?
  3. Grab what's before the @ until you reach a space or the beginning of the line.
  4. Grab the . and everything before it, back to the @.
  5. Grab what's after the . until you reach the end of the line or a space.
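The five steps translate almost literally into code. A minimal Python sketch, assuming plain text input; the function name `naive_harvest` is hypothetical, and, as the answer says, real harvesters are smarter than this.

```python
def naive_harvest(page: str) -> list[str]:
    """Literal transcription of the five steps: find '@', check for a nearby
    dot, then expand left and right to the surrounding spaces."""
    found = []
    for line in page.splitlines():
        start = 0
        # Step 1: find the next @ in the line.
        while (at := line.find("@", start)) != -1:
            # Step 2: is there a dot within 255 characters after the @?
            dot = line.find(".", at + 1, at + 256)
            if dot == -1:
                start = at + 1
                continue
            # Step 3: walk back from the @ to a space or the start of the line.
            left = line.rfind(" ", 0, at) + 1
            # Step 4 is implicit: line[at:dot] is everything between @ and the dot.
            # Step 5: walk forward from the dot to a space or the end of the line.
            right = line.find(" ", dot)
            if right == -1:
                right = len(line)
            found.append(line[left:right])
            start = right
    return found

print(naive_harvest("Contact adnan@gmail.com today"))
print(naive_harvest("Contact adnan at gmail dot com today"))
```

Because it keys on the literal `@`, an address written as `adnan at gmail dot com` slips straight past it, which is why the at/dot munging works against this class of crawler.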

Now, an easy countermeasure would be to replace the @ with at and the . with dot. The most intuitive counter-countermeasure would be to teach the crawler that at is actually @. Well, it's not that simple. Take the following text:

We climbed into the attic and found a dotted piece of wood. Please email us: adnan at gmail dot com.

Now let's run our new crawler on it. First it will find the at in attic, then the dot in dotted. The resulting email would be the@ticandfounda.ted; then it will find the second email, adnan@gmail.com. So spammers started teaching crawlers about certain domain names, ignoring spaces, taking spaces into account, etc.

When we started using images, spammers used OCR. We started using JavaScript tricks, inserting comments, URL-encoding, etc., and the spammers always found a way around them. It's an arms race.
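The HTML-entity and URL-encoding tricks are especially weak, as the comments below point out: each is undone by a single Python standard-library call. A minimal sketch; `encode_entities` and `url_encode_at` are hypothetical helpers that imitate the two munging styles.

```python
import html
from urllib.parse import unquote

def encode_entities(addr: str) -> str:
    # Replace every character with a decimal HTML entity, e.g. "a" -> "&#97;".
    return "".join(f"&#{ord(c)};" for c in addr)

def url_encode_at(addr: str) -> str:
    # The common "URL-encode" trick only escapes the @ as %40.
    return addr.replace("@", "%40")

addr = "adnan@gmail.com"
print(encode_entities(addr), "->", html.unescape(encode_entities(addr)))
print(url_encode_at(addr), "->", unquote(url_encode_at(addr)))
```

A harvester that runs every page through `html.unescape` and `unquote` before scanning sees both addresses in plain form, with no guesswork about ordinary English words.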

That said, even the most basic techniques usually give good enough results (apparently, in some places in the world, that link is NSFW; personally, I disagree), and the more you obfuscate, the better the results you get.

Graph: MB of spam received, by obfuscation method
  • CSS code direction: 0
  • CSS display:none: 0
  • ROT13 encryption (by Christopher Burgdorfer): 0
  • using ATs and DOTs: 0.084
  • replacing @ and . with entities: 1.6
  • splitting e-mail with comments: 7.1
  • URL-encode: 7.9
  • plain text: 21
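To put the graph's numbers in proportion, here is a short sketch that expresses each method's spam volume as a percentage of the plain-text baseline. The values are the ones read off the graph; the function name is hypothetical.

```python
# MB of spam per obfuscation method, as read off the graph above.
SPAM_MB = {
    "CSS code direction": 0.0,
    "CSS display:none": 0.0,
    "ROT13 (Christopher Burgdorfer)": 0.0,
    "ATs and DOTs": 0.084,
    "HTML entities": 1.6,
    "split with comments": 7.1,
    "URL-encode": 7.9,
    "plain text": 21.0,
}

def share_of_plain_text(spam_mb: dict[str, float]) -> dict[str, float]:
    """Each method's spam volume as a percentage of the plain-text baseline."""
    baseline = spam_mb["plain text"]
    return {method: 100 * mb / baseline for method, mb in spam_mb.items()}

for method, pct in share_of_plain_text(SPAM_MB).items():
    print(f"{method:32} {pct:5.1f}%")
```

In this dataset, at/dot munging cut spam to about 0.4% of the plain-text level (0.084 / 21), while entities and URL-encoding kept roughly 8% and 38%. That is the bike-lock point made in the comments: you only need to be harder to harvest than most other addresses.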

So, to directly answer your question: Is using 'dot' and 'at' in email addresses in public text still useful? Yes, I think so, at least to some degree. But this solution has been around long enough for us to assume that some crawlers have already found a way around it.

My advice? Either use some fancy advanced munger, or simply use images.

Adi
    I can't imagine people who collect email addresses are unaware that addresses that are obfuscated are more likely to belong to people who are less likely to answer spam. – Random832 Nov 06 '13 at 18:32
    People who harvest email addresses are selling them in bulk to spammers. Quantity means everything to profits. Quality is determined by a Boolean test: did an SMTP server not reject email to this address? There is no concern for "is this person likely to respond?" – John Deters Nov 06 '13 at 18:55
    To protect your bike from thieves, you don’t need an unbreakable bike lock; you just need it to be stronger than most of the other cyclists’ locks. The graph you give shows that the same principle holds here. – PLL Nov 06 '13 at 21:03
  • @PLL Precisely! – Adi Nov 06 '13 at 21:05
  • That's a useful study. Someone should do a similar study, adding in images. – LarsH Nov 06 '13 at 22:39
  • This is a really excellent answer as always Adnan. I try not to make these comments, but this one deserved it. – David Houde Nov 07 '13 at 05:43
    This answer is a real gem; however, the chart presented really hurts me. I can't believe that using `at` instead of `@` and `dot` instead of `.` gives better results (less spam received) than using JavaScript, HTML entities and URL-encoding. There has to be some mistake here, or I'm missing something / getting things wrong. For as long as I've been reading about spam (the past ten years), it was **always** said that the methods given in the question were the first ones adopted by spammers. How, then, can they be better than more complex ones like JavaScript? – trejder Nov 07 '13 at 13:12
    @trejder I completely understand your concerns. You see, `dot` and `at` become very problematic for the crawler precisely because they can be confused with real words or parts of real words. On the other hand, HTML entities turn the email into something like `adnan&#64;gmail&#46;com`. All it takes to crawl that one is to read `&#64;` as `@` and `&#46;` as `.`, very straightforward. URLencode isn't any better: `xyz%40example.com`. Again, simple `%40` and `.` matching. The only bit I don't understand is about constructing with JavaScript. – Adi Nov 07 '13 at 13:22
    Ah, I see your point. OK, I agree with you on URLencode and entities. But even your example shows that the first e-mail, from the "pure" text, will be incorrect, while the second one, using only `dot` and `at` munging, comes out fine. But then again... How can this be better (a little bit, but still) than using JavaScript, which (when well written) can make a real mess for a crawler while being nearly invisible to a normal user? – trejder Nov 07 '13 at 14:28
    @Adnan: Instead of searching for `@`, can't they just search for `gmail` or other common domains? That's going to find anything even remotely like `firstname [dot] lastname [i think i'm so clever] gmail [ooh yeah] com`... – user541686 Aug 16 '14 at 10:41
  • @Therkel: at the time my comment was made, the link was different from what it is today (it has since been moved to a web archive). It's difficult to speculate on the state, 8 years ago, of a link that is no longer used in the current answer (nor even accessible without the web archive), but it's possible it included NSFW ad content back then. All the same, I've deleted my comment, since the link clearly doesn't include such content anymore (if it ever did). – Andrew Coonce Apr 27 '21 at 20:52

In my humble opinion, email obfuscation (of any sort) is one of the worst ideas ever invented.

The foremost concern for any user interface, web-based or otherwise, is the convenience and safety of its users. Spam bots are not users, so they are not worth any consideration or effort.

The logic goes as follows:

  1. E-mail obfuscation is a nuisance for legitimate users. Rather than simply clicking the mailto link, the user is forced to type the e-mail address manually into their mail client's address field.

    1.a. This by itself may deter the user from contacting the intended address; they may go elsewhere simply to avoid the tedious interaction.

    1.b. The chance of entering a wrong but similar address in the process, and thus sending a possibly important mail to some typo-squatting mailbox, is very high.

  2. Most legitimate e-mail addresses in existence are already known to spammers. Every mailbox I've encountered to date (and that is a rather large number of mailboxes) was receiving some volume of spam on a regular basis. This is why all contemporary mail servers and clients come with spam filter integration, which, in most cases, is very efficient.

In short, just use plain and normal "mailto:" links and don't annoy your users unnecessarily.

oakad
    Oh, you can still put a normal mailto link behind the obfuscated fooATexampleDOTcom. Kind of defeats the purpose, but I have seen this so many times that I must conclude a lot of the people using ATs and DOTs don't even know why (or have no understanding of crawlers at all). – linac Nov 07 '13 at 09:14
    I didn't get any true spam at my personal addresses until just a few months ago. Probably some person or company leaked my emails then. It's certainly not true that every legitimate address gets spam. – user541686 Aug 16 '14 at 10:42

I have never understood the paradigm since its conception. We are simply depriving spam-fighting software of the data it needs. As mentioned before, adding "at" and "dot" to the parser is trivial too.

I would actually urge otherwise. Let all hell break loose. Use your email, and any email for that matter. I even wrote a bot 10 years ago or so that produced an infinite number of random emails, page by page. If a crawler hit it, it would crawl non-existent emails forever.
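A minimal, hypothetical sketch of such a page generator in Python (not the author's actual bot): each page lists random addresses and links to the next page, so a link-following crawler never runs out. The `/trap/` path and the `example.invalid` default domain are assumptions. `.invalid` is reserved by RFC 2606, so this sketch can never emit a real address, though a harvester could also filter it out trivially. Choosing plausible-looking domains instead raises the collision concern Scott voices in the comments.

```python
import random
import string

def trap_page(page_no: int, per_page: int = 20, domain: str = "example.invalid") -> str:
    """Render one page of a hypothetical spider trap: random addresses plus a
    link to the next page, so a crawler that follows links never runs out."""
    rng = random.Random(page_no)  # seeded: the same page number gives the same page
    lines = []
    for _ in range(per_page):
        local = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 10)))
        addr = f"{local}@{domain}"
        lines.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    # Every page links to the next one, so the chain is effectively endless.
    lines.append(f'<a href="/trap/{page_no + 1}">next</a>')
    return "\n".join(lines)

print(trap_page(1, per_page=3))
```

Seeding the generator with the page number keeps each page stable across visits, so the trap looks like static content rather than output that changes on every request.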

We should not reduce the number of emails spam bots have to process; we should increase it, so that resource requirements, and hence the cost of running a spam operation, go up and spam becomes less economically feasible.

We should take the quality of spam filters into account when choosing a mail service, so that providers with good filtering get an economic benefit while spam keeps hurting.

We have many instruments in place today that did not exist a decade ago: DKIM, SPF, reverse-PTR checks, blacklists and whatnot. Spam is getting less and less attractive. We should push it further. Let the spammers carry the load, not us.
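To make one of those instruments concrete: SPF lets a domain publish, in a DNS TXT record, which IP addresses may send its mail, and a receiving server compares the connecting IP against that record. The sketch below is a deliberately simplified, hypothetical evaluator. It skips the DNS lookup, handles only the `ip4` and `all` mechanisms with their `+ - ~ ?` qualifiers, and is nowhere near a full RFC 7208 implementation.

```python
import ipaddress

# SPF qualifier prefixes and the result each one produces on a match.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def spf_check(record: str, sender_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record (simplified sketch)."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIERS[qualifier]
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        # Other mechanisms (a, mx, include, ...) are ignored in this sketch.
    return "neutral"  # no mechanism matched

record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_check(record, "192.0.2.10"), spf_check(record, "198.51.100.7"))
```

A spammer sending from an arbitrary botnet machine while forging this domain falls through to `-all` and gets `fail`, which the receiving filter can then act on.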

Sedat Kapanoglu
    Interesting idea. And it doesn't have to be either/or; it could be both/and: obfuscate your real email address, and dump chaff on the spam bots. – LarsH Nov 06 '13 at 22:42
  • ....so you're the reason I get spam with multiple "To" addresses, Izkata / Izkaya / Izkaa / Izkaat....? – Izkata Nov 07 '13 at 03:57
    Flawless logic! Let's all unlock our bikes together; this way it'll be such a burden on bike thieves to steal all those bikes that they'll be forced to stop stealing bikes. Now if we apply it to Internet spam, I'm sure it'll work. I mean, let's look at other advertising methods: the bigger the audience, the more they want to stop advertising. Just look at Google Ads, radio ads, TV commercials, physical spam mail, etc. – Adi Nov 07 '13 at 08:39
    @Adnan if only one bike among a million is sold (hence the clicks on ads), it's something for thieves to consider, yes, because they would be employing a million people just to sell one bike. – Sedat Kapanoglu Nov 07 '13 at 09:09
    @adnan No, you're surrounding your bike with a bunch of two-wheeled papier-mâché bikes and making your real bike look like a heap of spare parts. The bike thieves come and grab all the papier-mâché and leave yours alone! I think I like this idea. – TecBrat Nov 07 '13 at 14:07
  • @TecBrat The only part of the answer that makes your analogy valid is the one about ssg's 10-year-old bot that creates lots of emails, which is probably hosted on his site and that's it. I don't think this can be expanded or applied to the real world. – Adi Nov 07 '13 at 14:14
  • How do you guarantee that your “random” email addresses are actually non-existent (unused)? – Scott - Слава Україні Dec 14 '13 at 01:18
    @Scott you don't know that, but neither do mail crawlers. The whole goal is to increase noise to make spamming more expensive. – Sedat Kapanoglu Dec 28 '13 at 16:01
  • @ssg You’re missing my point. If you randomly generate, say, naxa’s _real_ address, and publish it, then you will be responsible for naxa getting spam. – Scott - Слава Україні Dec 28 '13 at 21:07
  • @Scott that's only possible if spam can scale up easily enough to reach all the random addresses AND naxa's address. My argument is that it can't; it's neither probable nor feasible. – Sedat Kapanoglu Dec 29 '13 at 17:20

I rather doubt that it's ever been useful, and would expect email harvesters to have been scanning for that obfuscation even before people were naively using it; I certainly would have if I were in that game.

Our own tests have also shown that it doesn't take long before spam arrives at an email address that hasn't been disclosed on the web at all, likely harvested from recipients' address books and mail folders on a compromised machine; obfuscating an email address is, in general, at best only going to delay the inevitable, not prevent it.

Nick