
Recently I put some hidden links on a website in order to trap web crawlers. (I used the CSS `visibility: hidden` style so that human users cannot see or access them.)
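For reference, a honeypot link of the kind described might look like this (the path and link text are just illustrative):

```html
<!-- Honeypot link: invisible to human visitors, but still present in the
     markup for crawlers that parse the raw HTML. Path is an example. -->
<a href="/trap/hidden-link" style="visibility: hidden">do not follow</a>
```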

Anyway, I found that there were plenty of HTTP requests to the hidden links whose User-Agent strings identify them as ordinary browsers.

E.g.: "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"

So now my problems are:

(1) Are these web crawlers? If not, what else could they be?

(2) Are they malicious?

(3) Is there a way to profile their behaviour?

I searched the web but couldn't find any valuable information. Can you please point me to some resources? Any help would be appreciated.

TestRunner
  • Direct those links to a CAPTCHA challenge. That way you can easily tell whether they are web crawlers or bots. – Maximin Dec 30 '13 at 10:54
  • It's trivial to copy a legitimate User-Agent string and program a bot to use it. If these links really were hidden, you can be reasonably sure that they aren't human visitors. – Michael Hampton Dec 31 '13 at 00:31

2 Answers


All websites receive regular attacks; if you are not being attacked, please check your internet connection.

A link is in no way, shape, or form "hidden" if a robot can find it. A malicious bot will read your robots.txt file to enumerate resources on your system, ignoring its Disallow directives. Using a CAPTCHA can block automated attacks. Also consider password-protecting "hidden" links.
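As an illustration, a robots.txt honeypot exploits exactly that behaviour: list a path that nothing legitimate links to, and any client that requests it has almost certainly harvested it from robots.txt (the path below is made up):

```
# Well-behaved crawlers will avoid this path; a request for it
# suggests a bot that read robots.txt and ignored the directive.
User-agent: *
Disallow: /secret-admin/
```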

rook
  • Thanks for your comment; I just wanted to find out whether these requests are malicious or not. – TestRunner Dec 31 '13 at 04:06
  • @TestRunner I don't think you have provided enough information to assess that. But you are probably being attacked right now, like everyone else. – rook Dec 31 '13 at 16:38
  1. They may well be crawling your website. They are almost certainly making automated requests and not rendering the content of your pages the way usual web browsers would. A web client can report whatever User-Agent it likes, and may be trying to hide the fact that it is automated by reporting a User-Agent commonly associated with web browsers (such as the one you posted, which looks like Chrome 26 running on Windows 7).
  2. Probably not. Making requests of a web server is not malicious activity unless those requests contain some kind of exploit. Making requests to gather information about the website is not malicious even if the intent is to use the information for a malicious purpose.
  3. By grouping entries in your web logs by IP address and User-Agent, you may learn:

    • whether a client is crawling your site
    • the pattern of crawl (breadth-first, depth-first or random)
    • whether a client is attempting to fingerprint your website (for example, making lots of requests for resources which don't exist, or requesting resources known to exist in vulnerable web applications)

    Do a search for "HoneySpam 2.0" for some more information on learning about the behaviour of clients.
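    A minimal sketch of the grouping idea might look like the following. The log format is the common Apache/Nginx "combined" format, and the sample lines, paths, and IP addresses are all made up; real logs will vary.

    ```python
    import re
    from collections import defaultdict

    # Hypothetical log lines in "combined" log format.
    LOG_LINES = [
        '10.0.0.5 - - [30/Dec/2013:10:54:00 +0000] "GET /hidden-link HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"',
        '10.0.0.5 - - [30/Dec/2013:10:54:01 +0000] "GET /wp-admin/ HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"',
        '192.168.1.9 - - [30/Dec/2013:11:02:17 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/25.0"',
    ]

    # Rough parser for combined-log-format lines.
    LOG_RE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
        r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
    )

    def profile_clients(lines):
        """Group requests by (IP, User-Agent); count requests, 404s, and paths."""
        clients = defaultdict(lambda: {"requests": 0, "not_found": 0, "paths": []})
        for line in lines:
            m = LOG_RE.match(line)
            if not m:
                continue  # skip lines in other formats
            entry = clients[(m.group("ip"), m.group("ua"))]
            entry["requests"] += 1
            entry["paths"].append(m.group("path"))
            if m.group("status") == "404":
                entry["not_found"] += 1  # many 404s can indicate fingerprinting
        return dict(clients)

    if __name__ == "__main__":
        for (ip, ua), stats in profile_clients(LOG_LINES).items():
            print(ip, stats["requests"], stats["not_found"], stats["paths"])
    ```

    The request order in `paths` also hints at the crawl pattern (breadth-first, depth-first, or random), and a high ratio of 404s to successful requests is one sign of fingerprinting.
    
    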

jah