
We have ModSecurity set up to log to modsec_audit.log for Apache2. Today 2259 entries were created in this log with a User-Agent of:

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Do these entries mean that bingbot has been stopped from crawling our website? The H record for this log entry states (there is no reference to severity or tag):

Message: Rule execution error - PCRE limits exceeded (-8): (null).

The HTTP result code for this entry is 200 OK.

I am trying to understand what the records in these logs mean so I can create some form of reporting. For example, my understanding is that if the H section of the log entry states:

[severity "CRITICAL"]

then ModSecurity has blocked the page request. Am I correct in this understanding?

Hope someone can help clear this up for me. :)

masegaloeh
Linnay

1 Answer


The first thing to check is whether it was actually Bingbot or not. It's trivially easy to spoof the User-Agent header, and all sorts of malicious bots do it all the time. Genuine Bingbot requests always come from an IP address that has a reverse lookup to <something>.search.msn.com. You should then check that the forward lookup of the returned name gives back the same IP address:

$ dig +short -x 157.55.16.222
msnbot-157-55-16-222.search.msn.com.
$ dig +short msnbot-157-55-16-222.search.msn.com
157.55.16.222
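If you want to run the same check over every IP address in your logs, the two lookups above can be sketched in Python. This is only a rough sketch: the function name `is_verified_bingbot` and the injectable `reverse_lookup` / `forward_lookup` parameters are my own invention (they are there so the logic can be exercised without touching DNS), and the default resolvers only handle IPv4.

```python
import socket


def is_verified_bingbot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes the reverse-then-forward DNS check.

    The lookup parameters are hypothetical hooks for testing; by default
    the standard library resolver is used (IPv4 only).
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: treat as unverified.
        return False

    # Step 1: the reverse lookup must land under search.msn.com.
    if not hostname.endswith(".search.msn.com"):
        return False

    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False

    # Step 2: the forward lookup must return the original IP address.
    return ip in addresses
```

Calling `is_verified_bingbot("157.55.16.222")` would perform the same two queries as the `dig` commands above. A failed lookup is treated as "not verified", which seems the safer default for reporting.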

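There is some good advice over here about your PCRE limits problem. The -8 in that message is PCRE's match-limit error code (PCRE_ERROR_MATCHLIMIT): the regular expression in one of your rules hit the configured limit and was abandoned, which, going by the 200 OK you saw, did not block the request. See if you can track down which rule is causing the problem and go from there.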

[severity "CRITICAL"] is not enough to determine whether a request was blocked or not. Requests can be blocked for [severity "NOTICE"] and allowed through for [severity "CRITICAL"], depending on your configuration and the reason for the severity. The string I see when a request has been blocked is "Access denied with code 403 (phase 2)." (or sometimes "(phase 1)").
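For reporting, that distinction could be captured with something like the sketch below. It assumes you have already split the audit log into entries and have the text of each H section as a string. The function name `classify_h_section` and the sample messages in the usage note are made up for illustration, and it only looks for the "Access denied with code ..." wording quoted above, so any other wording your setup produces for blocked requests would need adding.

```python
import re

# Wording quoted above for blocked requests, e.g.
# "Access denied with code 403 (phase 2)."
ACCESS_DENIED = re.compile(r"Access denied with code (\d{3}) \(phase (\d)\)")
# Severity tags such as [severity "CRITICAL"]
SEVERITY = re.compile(r'\[severity "([A-Z]+)"\]')
# Text of the rule execution error from the question
PCRE_LIMIT = "PCRE limits exceeded"


def classify_h_section(h_text):
    """Summarise one H section: blocked or not, severities, PCRE errors."""
    denied = ACCESS_DENIED.search(h_text)
    return {
        # Blocking is decided by the "Access denied" text, not by severity.
        "blocked": denied is not None,
        "status": int(denied.group(1)) if denied else None,
        "phase": int(denied.group(2)) if denied else None,
        "severities": SEVERITY.findall(h_text),
        "pcre_limit_error": PCRE_LIMIT in h_text,
    }
```

Fed the message from the question, `classify_h_section("Message: Rule execution error - PCRE limits exceeded (-8): (null).")` reports `blocked` as `False` and `pcre_limit_error` as `True`, with an empty severity list. Counting entries by `blocked` rather than by severity should give a more honest picture of what was actually stopped.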

If you can track down the same request in your access logs, you can check the return code there to be absolutely sure. You can either match on the IP address and the timestamp (which is slightly fuzzy, because an IP address can easily make more than one request in a single second) or, if you have mod_unique_id, add the unique ID to your access logs so you can match any line against your ModSecurity logs. To do this, add %{UNIQUE_ID}e to your LogFormat line.
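Once the unique ID is in the access log, the lookup can be scripted. The sketch below assumes the common "combined" format with %{UNIQUE_ID}e appended as the very last field, and that the unique ID is the second field of the audit log's A section line. Both are assumptions about your setup, so adjust the patterns if your LogFormat or audit log differs. The function names and the sample ID in the usage note are made up.

```python
import re

# Assumed access log layout: combined format plus %{UNIQUE_ID}e at the end:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %{UNIQUE_ID}e
ACCESS_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) \S+ .*? (?P<unique_id>\S+)$'
)

# Assumed A section layout: [timestamp] unique_id source_ip source_port ...
AUDIT_A_LINE = re.compile(r'^\[[^\]]+\] (?P<unique_id>\S+) ')


def status_by_unique_id(access_log_lines):
    """Map each unique ID found in the access log to its HTTP status code."""
    statuses = {}
    for line in access_log_lines:
        match = ACCESS_LINE.match(line.rstrip("\n"))
        if match:  # lines in an unexpected format are skipped
            statuses[match.group("unique_id")] = int(match.group("status"))
    return statuses


def audit_unique_id(a_section_line):
    """Extract the unique ID from an audit log A section line, or None."""
    match = AUDIT_A_LINE.match(a_section_line)
    return match.group("unique_id") if match else None
```

With both in place, `status_by_unique_id(open("access.log")).get(audit_unique_id(a_line))` gives the status code Apache actually returned for a given audit entry, or `None` if the ID is not in that access log.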

Ladadadada
  • Thanks for your reply, very clear and easy to understand. You've filled in the blanks I needed. – Linnay May 16 '12 at 02:39