3

I own a URL-shortening service and want to deliver only legitimate statistics to my clients. A possible scenario is that a particular user writes a script to automatically open a shortened URL, skewing the statistics. What approaches can I follow to detect whether a click is legitimate or not? The most basic approach I can think of is to monitor the user's IP address and block it if the number of requests exceeds a threshold.

Rory Alsop
  • 61,367
  • 12
  • 115
  • 320
Ishan Jain
  • 31
  • 1
  • 2
  • 1
    Are you trying to detect an abnormal number of clicks on a single link or an abnormal number of clicks from the same user over many links? They're two very different propositions, the first being a lot more difficult to detect accurately if the "attacker" is sophisticated. – Ben Feb 07 '14 at 13:15
  • I want to detect abnormal number of clicks on a single link. – Ishan Jain Feb 07 '14 at 14:06

4 Answers

5

There are a number of potential methods you can use to differentiate bots from humans, but none of them is likely to be 100% effective.

Obviously, as you say, rate limiting catches the really unsophisticated bots that don't know to click at human speed. You could allow only one click per IP, but that will artificially deflate your stats for humans behind a shared proxy (which is becoming more common as IPv4 addresses run out).
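
A minimal sketch of that kind of per-IP rate limiting, assuming an Express-style redirect handler; the 60-second window, the 10-click cap, the in-memory map and the recordClick/resolveSlug helpers are all illustrative, not anything your service necessarily uses:

```typescript
// Hypothetical per-IP rate limiter for a shortener's redirect endpoint.
import express from "express";

const app = express();
const WINDOW_MS = 60_000;          // look at the last 60 seconds
const MAX_CLICKS_PER_WINDOW = 10;  // beyond this, don't count the click in the stats

// ip -> timestamps of recent clicks (in-memory; a real service would use Redis or similar)
const recentClicks = new Map<string, number[]>();

app.get("/:slug", (req, res) => {
  const ip = req.ip ?? "unknown";
  const now = Date.now();
  const history = (recentClicks.get(ip) ?? []).filter(t => now - t < WINDOW_MS);
  history.push(now);
  recentClicks.set(ip, history);

  // Always redirect; only the statistics are affected, the user still reaches the target.
  if (history.length <= MAX_CLICKS_PER_WINDOW) {
    recordClick(req.params.slug, ip);           // hypothetical stats-recording function
  }
  res.redirect(resolveSlug(req.params.slug));   // hypothetical slug lookup
});

// Stubs so the sketch is self-contained.
function recordClick(slug: string, ip: string): void { /* persist to stats store */ }
function resolveSlug(slug: string): string { return "https://example.com"; }

app.listen(3000);
```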

IP blocking isn't too useful in the days of cloud computing; it's pretty easy for an attacker to get a different IP or range of IPs to work from if they're dedicated enough.

As @fas says, you can check the user agent, but again that will only catch bots that don't know how to set a user-agent header, which isn't really difficult to do.

You could introduce some "computer hard" task into the clicking process (e.g. a CAPTCHA), but that would make your site pretty user-unfriendly. Again, it's not 100%, but it's harder to overcome trivially.

Ultimately I'd suggest it depends on how motivated and well-funded your attackers are. If they are reasonably motivated and have some cash, they can just hire real people to click the link (e.g. via Amazon Mechanical Turk), at which point you're likely going to find it tricky to differentiate legitimate traffic from non-legitimate traffic.

Assuming your attackers are more casual about it, I'd say combine the user agent and source IP address. User agents can actually be relatively identifying (more info on the Panopticlick site). So if you limit each user agent to one click per source IP address, you may get a decent approximation against a relatively unsophisticated attacker.
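
For an unsophisticated attacker, that approximation can be as simple as counting at most one click per (source IP, user agent) pair for each link. The ClickEvent shape and the dedupClicks name below are made up for the example:

```typescript
// Count at most one click per (source IP, user agent) pair for a given link.
interface ClickEvent {
  linkId: string;
  ip: string;
  userAgent: string;
}

function dedupClicks(clicks: ClickEvent[]): Map<string, number> {
  const seen = new Set<string>();           // "linkId|ip|userAgent" keys already counted
  const totals = new Map<string, number>(); // linkId -> deduplicated click count

  for (const c of clicks) {
    const key = `${c.linkId}|${c.ip}|${c.userAgent}`;
    if (seen.has(key)) continue;            // same IP and UA already clicked this link
    seen.add(key);
    totals.set(c.linkId, (totals.get(c.linkId) ?? 0) + 1);
  }
  return totals;
}

// Example: two clicks from the same IP/UA on "abc" collapse into one.
const stats = dedupClicks([
  { linkId: "abc", ip: "203.0.113.5", userAgent: "Mozilla/5.0 ..." },
  { linkId: "abc", ip: "203.0.113.5", userAgent: "Mozilla/5.0 ..." },
  { linkId: "abc", ip: "198.51.100.7", userAgent: "Mozilla/5.0 ..." },
]);
console.log(stats.get("abc")); // 2
```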

Rory McCune
  • 60,923
  • 14
  • 136
  • 217
  • I was going to answer something similar. The only way is detecting patterns in the bots and deleting them from the statistics. +1 here. – kiBytes Feb 07 '14 at 09:02
  • Can I apply any machine learning algorithm in my case? I searched on the web but the content is specifically for detecting crawlers. What can be the features that I can use if I want to apply a ML algorithm? – Ishan Jain Feb 07 '14 at 13:20
1

Owen's answer above reminded me of something very low-tech but simple and effective that I've tried. To get an idea of how badly bots are skewing my click stats, I place a single-pixel, transparent GIF wrapped in an href tag right next to my affiliate banner graphic. I think it's safe to assume that only bots would click that invisible link, so I just compare the number of clicks on the affiliate banner with the number on the transparent, one-pixel GIF link.
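
A tiny sketch of how that comparison might look; the counter names and the estimatedBotShare helper are invented for the example, and the markup in the comment is just one way to render the invisible link:

```typescript
// The honeypot is just an invisible link rendered next to the real banner, e.g.
//   <a href="/click/honeypot"><img src="pixel.gif" width="1" height="1" alt=""></a>
// Humans shouldn't see or click it, so its clicks approximate indiscriminate bot traffic.

let bannerClicks = 0;   // clicks recorded on the visible affiliate banner link
let honeypotClicks = 0; // clicks recorded on the invisible 1x1 GIF link

function onBannerClick(): void { bannerClicks++; }
function onHoneypotClick(): void { honeypotClicks++; }

// Rough estimate of how badly bots are skewing the banner stats.
function estimatedBotShare(): number {
  return bannerClicks === 0 ? 0 : honeypotClicks / bannerClicks;
}

onBannerClick(); onBannerClick(); onHoneypotClick();
console.log(estimatedBotShare()); // 0.5
```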

Joeinfo
  • 11
  • 1
0

Try checking for valid user agents and referers. User agents can always be spoofed, but that's your best bet. Even if a bot is clicking through, though, I'd still consider that traffic.
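
A rough sketch of that kind of header check, assuming you can see each request's User-Agent and Referer; the bot patterns listed are examples only, not an exhaustive list:

```typescript
// Very coarse heuristic: flag requests with a missing or obviously scripted user agent.
const BOT_UA_PATTERNS = [/curl/i, /wget/i, /python-requests/i, /bot|spider|crawler/i];

function looksLikeBot(userAgent: string | undefined, referer: string | undefined): boolean {
  if (!userAgent) return true;                                   // scripts often send no UA at all
  if (BOT_UA_PATTERNS.some(p => p.test(userAgent))) return true; // known automation strings
  // A missing referer alone is only a weak signal (apps, privacy settings and
  // HTTPS-to-HTTP transitions all strip it), so log it rather than blocking on it.
  if (!referer) console.log("click with no referer, UA:", userAgent);
  return false;
}

console.log(looksLikeBot("python-requests/2.31", undefined));                        // true
console.log(looksLikeBot("Mozilla/5.0 (Windows NT 10.0)", "https://twitter.com/"));  // false
```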

d1str0
  • 2,348
  • 14
  • 24
0

A little method I've just come up with while thinking about this (it's probably been done before):

When a link is clicked with a mouse, it generates an event, and that event has clientX and clientY properties which return the position of the pointing device when the link was clicked.

If a link is clicked using an ECMAScript-triggered click event, the clientX and clientY properties are not set (they default to 0).

If you intercept a click event and check whether clientX or clientY are set, you can start to detect remotely fired events. Of course, this has the downside that you effectively disable any triggered click events your application may already use legitimately; to resolve that you may need some way of distinguishing between 'valid' and 'invalid' clicks - a kind of click-authentication engine.
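
This is not the CodePen code linked below, just a minimal TypeScript illustration of the check, with a placeholder a.shortened selector standing in for the real link:

```typescript
const link = document.querySelector<HTMLAnchorElement>("a.shortened"); // placeholder selector

link?.addEventListener("click", (e: MouseEvent) => {
  // element.click() or dispatchEvent(new MouseEvent("click")) leave clientX and
  // clientY at their default of 0; a real pointer click reports the cursor position.
  if (e.clientX === 0 && e.clientY === 0) {
    // Keyboard activation (Enter on a focused link) also reports (0, 0), so treat
    // this as a signal to log rather than a reason to cancel the redirect.
    console.log("possible scripted click");
  }
});
```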

This is a very basic idea, and I haven't honed it down or tested it well, but maybe it will be of use to you. I've created a rough CodePen demo highlighting the functionality, with a simple version of the code that it might run.

I haven't checked it on a touch device, and you may well find that touch events do not return these properties either, meaning you may need to find an alternative property to check for on touch devices if this is to work effectively.

In terms of blue sky thinking, it's not too bad though, and it may help you form a more cogent approach to avoiding clickjacking.

http://codepen.io/seajones/pen/vminB

Owen
  • 1,066
  • 5
  • 9
  • 1
    The problem is that my service is only for url shortening. What happens after the link is opened or before the link is opened is not under my control. – Ishan Jain Feb 07 '14 at 13:22
  • Ah well, in that case it's more one for the server admin bods. Sorry I can't help further – Owen Feb 07 '14 at 13:55