I'm in a difficult situation: the Baidu spider is hitting my site and consuming about 3 GB of bandwidth a day. At the same time, I do business in China, so I don't want to just block it.

Has anyone else been in a similar situation (any spider)? Did you come across a magical solution? Or did you just accept it and either block or not block the bot?

d.lanza38
  • If you use Nginx you could probably throttle requests by user agent. https://gist.github.com/supairish/2951524 – ceejayoz Nov 24 '15 at 21:09
  • See also http://webmasters.stackexchange.com/q/50558/17007 – Michael Hampton Nov 24 '15 at 21:15
  • @MichaelHampton if you post an answer I'll accept it. I personally can't use the Apache module, but I'm going to try the Baidu webmaster tools route. I'm just waiting on an SMS verification code (I hope it is just delayed and will come eventually). Post this link in your answer as well: http://www.webnots.com/tools/baidu-webmaster-tool/ (I found it useful given the language barrier) – d.lanza38 Nov 25 '15 at 15:31

1 Answer

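As long as the spider honors robots.txt, you can throttle its requests with the Crawl-delay directive. Note that Crawl-delay is a nonstandard extension, so check that the crawler in question actually supports it: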

User-agent: *
Crawl-delay: 10
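
If you want to sanity-check the file before deploying it, here is a minimal sketch using Python's standard urllib.robotparser to confirm what delay a given user agent would be assigned. The helper name crawl_delay_for and the agent string "Baiduspider" are just illustrative choices, and this only shows how Python's parser reads the rules, not how Baidu's crawler will actually behave.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None.

    Hypothetical helper for illustration: it parses the text locally
    instead of fetching robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    # "Baiduspider" has no dedicated section, so the "*" rules apply to it.
    print(crawl_delay_for(ROBOTS_TXT, "Baiduspider"))
```

If you later add a dedicated "User-agent: Baiduspider" section with its own Crawl-delay, the same helper lets you confirm that the more specific section is the one that applies.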
Mike