
We are running a search engine and have been seeing a huge number of fake search queries coming in over the last few days from thousands of IP addresses. There is no real pattern in the query text or the IP ranges. It seems a botnet is trying to bring our site down. Currently we're seeing about 30 fake queries per second, and it's increasing.

We tried setting up Cloudflare, but it didn't really help. We could try to block the bad traffic with CAPTCHAs, but that could hurt usability for our real users.

Does anybody have an idea how we could handle this? We're running on AWS with Route53.

3 Answers


This is a tough one, since they are essentially using a legitimate feature of your site.

You have a few basic options:

  • Do more work to identify the attack traffic and block it. One of the first things I've had to do in cases like this is build some tools so I could see what was happening and look for patterns. I was able to do this fairly easily with a few awk scripts, doing counts on the various fields to look for things the requests have in common (see the sketch after this list). Do they share a user agent? A referrer? The same search-string length? Are all the IPs coming from one country? Perhaps they use the URL in some slightly odd way, like appending a "?". Anything you can latch onto that identifies the traffic helps. This part is usually a cat-and-mouse game, and it comes down to how much digging you can or are willing to do on your end versus how diligent the attackers are.

  • You could also disable that feature of your site and keep the rest up. It might be most practical to simply replace the search page with a static "temporarily unavailable, we're making some changes" message until the attack blows over.

  • You can also optimize the search engine so it holds up better under load. Some search engines are rather inefficient (Drupal's comes to mind); properly optimized, you might be able to handle the traffic. Until you've run the numbers, don't underestimate how much CPU, memory, etc. is lost to inefficiencies in the code.
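To make the first bullet concrete, here is a minimal sketch of the same kind of field counting, written in Python rather than awk. It assumes a combined-format access log; the access.log filename and the /search path prefix are placeholders for whatever your setup actually uses.

```python
# Count how often each user agent, referrer, and request length shows up
# among search requests, to spot anything the bot traffic has in common.
import re
from collections import Counter

# Combined log format: IP ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

agents, referrers, lengths = Counter(), Counter(), Counter()

with open("access.log") as log:                 # placeholder path
    for line in log:
        m = LOG_LINE.match(line)
        if not m or "/search" not in m.group("request"):   # placeholder prefix
            continue
        agents[m.group("agent")] += 1
        referrers[m.group("referrer")] += 1
        lengths[len(m.group("request"))] += 1   # rough proxy for query length

for title, counter in (("User agents", agents),
                       ("Referrers", referrers),
                       ("Request lengths", lengths)):
    print(title)
    for value, count in counter.most_common(5):
        print(f"  {count:8d}  {value}")
```

If one user agent or one request length accounts for most of the fake traffic, that gives you something concrete to block on at the load balancer or WAF.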

If that search engine is core to your business and someone smart is really going after it, then you're pretty much down to optimizing it as much as possible and growing your infrastructure to handle the load.

bgp

You need to change your functionality in a way that lets you filter out the bad traffic. This will turn into an "arms race", but you are always in the position where you are changing and they are responding, and if you always prepare the next step in advance you can effectively neutralize their new version as soon as it appears.

If you keep this up they will probably either give up or change attack vectors soon.

For example:

  • Add a hidden field to the search form (just a constant value) and reject requests where that field is missing.

  • When they update their bot to include that field, change its value to something IP-specific (the IP address itself is fine).

  • When they update again, change the value to a hash of the IP address plus a secret key (see the sketch after this list).

  • For the next update, add something that requires some trivial JavaScript (give the client two numbers and require their sum, for example).

  • After that, have the JavaScript pull its parameters from a cookie.

  • etc, etc, etc.
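As a concrete illustration of the "hash of the IP address + a secret key" step, here is a minimal Python sketch. It assumes you render the token into the hidden field server-side and check it when the search request comes back; SECRET_KEY, make_token, and is_valid are illustrative names, not part of any particular framework.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-periodically"   # assumption: kept server-side only

def make_token(client_ip: str) -> str:
    """Value to render into the hidden search-form field for this client."""
    return hmac.new(SECRET_KEY, client_ip.encode(), hashlib.sha256).hexdigest()

def is_valid(client_ip: str, submitted_token: str) -> bool:
    """Reject the search request if the hidden field is missing or wrong."""
    expected = make_token(client_ip)
    return hmac.compare_digest(expected, submitted_token or "")
```

Using an HMAC rather than a plain hash means the bot can't precompute valid tokens without knowing the secret, and rotating the key invalidates anything they have already harvested.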

The point is that every time they update their attack, you already have a solution ready and force them to update again. At some point they will, hopefully, just give up and look for an easier target.

Nir
  • Thanks for your reply. The problem is that a lot of our users use their browser's search box (via OpenSearch) rather than the search form on our website, so we can't really add extra fields or parameters. – Erdinger2 Aug 29 '13 at 14:03

You could introduce CAPTCHAs only after a particular IP has made more than N requests within a given timeframe (a rough sketch of that check follows).
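A rough sketch of that per-IP threshold, assuming a single-process in-memory counter; in practice you would probably back this with something shared such as Redis or ElastiCache, and the window and threshold values here are made up.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60     # assumption: 1-minute window
MAX_REQUESTS = 30       # assumption: free searches per IP per window

_recent = defaultdict(deque)   # ip -> timestamps of recent search requests

def needs_captcha(client_ip: str) -> bool:
    """Return True once this IP exceeds the allowed rate, so the caller
    can serve a CAPTCHA instead of running the search."""
    now = time.monotonic()
    timestamps = _recent[client_ip]
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()               # drop entries outside the window
    timestamps.append(now)
    return len(timestamps) > MAX_REQUESTS
```

Legitimate users rarely hit such a threshold, so most of them never see a CAPTCHA, while the bot traffic gets challenged quickly.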

Drew Khoury