I'm looking to extract interesting information from webserver logs, and I wonder which HTTP status codes I should filter out.

For example, 200 hits can be considered 'regular behavior', whereas lots of 404 hits from a certain IP probably mean someone is up to no good (automated scanning).

So, which of these that I mostly see in logs:

304 - Not Modified
404 - Not found
302 - Found
206 - Partial content
301 - Moved permanently
500 - Internal Server Error
403 - Forbidden
501 - Not implemented
406 - Not acceptable
416 - Requested Range Not Satisfiable
other?

should I filter out, and which ones usually give insightful info? Which ones are known to be used by bad guys for information gathering? Probably the most "interesting" one is 404, but I would like to get more opinions on this if someone else has dealt with this in the past. Thanks.

AviD
tkit

2 Answers

21

HTTP 200s can be awesome for an attacker when he is requesting URIs that should be protected by authorization (http://cwe.mitre.org/data/definitions/862.html).

Attackers pay attention to HTTP 500s because they often lead to offensive success, so observing lots of HTTP 500s can be interesting. If the app likes to redirect (HTTP 302) upon errors, then lots of HTTP 302s can be interesting too.

To catch nefarious activities it may be best to observe numerical changes (min, max, median, mean, standard deviation) of some or all of the HTTP status codes. Observing 50,000 HTTP 200s in a short time span from an IP block can mean your databases are on their way to public fame. Lots of unusual HTTP 200s can also mean an attacker has found a way to send malicious requests successfully that are answered with HTTP 200s.
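As a rough illustration of the statistics idea, here is a minimal Python sketch that counts status codes per client IP and summarizes how one code is distributed across IPs. It assumes the common Apache "combined" log layout; `LOG_RE`, `status_counts_per_ip` and `summarize` are names made up for this example, so adjust the pattern to whatever your server actually writes.

```python
import re
import statistics
from collections import Counter, defaultdict

# Assumption: Apache-style "combined"/"common" log lines, e.g.
# 10.0.0.1 - - [15/Jul/2011:07:12:00 +0000] "GET / HTTP/1.1" 200 512
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')

def status_counts_per_ip(lines):
    """Return {ip: Counter({status: hits})} for every line that parses."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ip, status = m.group(1), m.group(2)
            counts[ip][status] += 1
    return counts

def summarize(counts, status):
    """Min/max/median/mean/stdev of one status code's hit count across IPs."""
    values = [c[status] for c in counts.values()]
    if not values:
        return None
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }
```

An IP whose count for some code sits far above the median is a candidate for a closer look. The same counting works per time window instead of per IP if you also parse the timestamp.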

If you have lots of time to play you could model the activity of "normal" users (i.e. how many of each of the HTTP status codes does a normal user generate during a session, for example) then look for variations that cross tolerance thresholds. Search for anomaly algorithms and try ones you think may work to identify unusual activity by analyzing only HTTP status codes. Could be fun research actually.
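One simple way to try this, sketched in Python under some loud assumptions: you already have per-session status-code counts for sessions you trust to be normal, and a plain "mean plus N standard deviations" rule is good enough as a first tolerance threshold. `build_baseline`, `deviations`, `tolerance` and `min_stdev` are hypothetical names and knobs for this example, not an established algorithm.

```python
import statistics
from collections import Counter

def build_baseline(normal_sessions):
    """normal_sessions: list of Counter objects (status code -> hits in one session).
    Returns {status: (mean, population stdev)} over those sessions."""
    statuses = set().union(*normal_sessions)
    baseline = {}
    for status in statuses:
        values = [session[status] for session in normal_sessions]
        baseline[status] = (statistics.mean(values), statistics.pstdev(values))
    return baseline

def deviations(session, baseline, tolerance=3.0, min_stdev=1.0):
    """Return the status codes whose count in `session` (a Counter) exceeds
    mean + tolerance * stdev. min_stdev is an assumed floor so that codes
    which never vary in the baseline do not trigger on a single extra hit."""
    flagged = []
    for status in sorted(set(baseline) | set(session)):
        mean, stdev = baseline.get(status, (0.0, 0.0))
        limit = mean + tolerance * max(stdev, min_stdev)
        if session[status] > limit:
            flagged.append(status)
    return flagged
```

Codes that never appear in the baseline are treated as having mean 0, so a session that suddenly produces a handful of 500s gets flagged even though no normal session ever did.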

On a side note if you are looking at Apache logs for attacks, check out http://code.google.com/p/apache-scalp/

Scalp! is a log analyzer for the Apache web server that aims to look for security problems. The main idea is to look through huge log files and extract the possible attacks that have been sent through HTTP/GET (By default, Apache does not log the HTTP/POST variable).

Tate Hansen
  • 2
    I find that my error logs have a lot of 404's, usually an automated attempt to find some variation of PHPMyAdmin or administrative pages in common blogging/CMS platforms. – Andrew Lambert Jul 15 '11 at 07:12
  • 3
    The king of 404s is DirBuster https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project – Tate Hansen Jul 15 '11 at 07:35
8

It's dangerous to dismiss a request as not interesting based on something as low-resolution as a response code. A 200 response in /admin/ from an unknown IP address is interesting.
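To make that concrete, here is a small Python sketch that keeps 200 responses only when they hit a sensitive path from an address you do not recognize. The log layout (Apache "combined"), the `/admin/` prefix default and the `known_ips` allowlist are all assumptions for the example; `admin_hits_from_unknown` is a made-up helper name.

```python
import re

# Assumption: Apache-style log lines with a quoted "METHOD PATH PROTOCOL" request.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')

def admin_hits_from_unknown(lines, known_ips, prefix="/admin/"):
    """Return (ip, path) for every 200 response under `prefix`
    that came from an address not listed in `known_ips`."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, path, status = m.groups()
        if status == "200" and path.startswith(prefix) and ip not in known_ips:
            hits.append((ip, path))
    return hits
```

The point is that the filter combines the response code with the URL and the source, rather than discarding all 200s up front.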

In addition to the codes you have listed as interesting, 400 responses can be interesting (they are caused by Slowloris for instance and they are also caused by attackers manually crafting HTTP requests and getting the protocol wrong.)

401 responses can be interesting as they are generated by HTTP authentication (often called htaccess authentication). Lots of these can indicate a brute force attack.

In fact, I would say that everything in the 400 range is interesting. A full list of response codes can be found here.

But what you asked about was which ones are not interesting. For this, I would tentatively suggest the entire 300 range, with the caveat that anything unusual is interesting. In practice, this means that 301, 302 and 304 are probably not interesting. 302 responses are often the result of form submissions, so comment spammers generate a lot of them; whether you consider this interesting or not is an individual decision.

The keyword "unusual" is probably more useful here than any particular list of response codes. This applies to every field in the logs. Unusual request methods such as PUT or CONNECT are very interesting, even if they return one of the non-interesting response codes.
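A minimal Python sketch of the request-method check, assuming Apache-style log lines and assuming that GET, POST and HEAD are the only methods your site normally sees (`COMMON_METHODS` and `unusual_method_requests` are illustrative names; tune the set to your own application):

```python
import re

# Assumption: Apache-style log lines with a quoted "METHOD TARGET PROTOCOL" request.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) ')

# Assumption: methods considered normal for this site.
COMMON_METHODS = {"GET", "POST", "HEAD"}

def unusual_method_requests(lines, common=COMMON_METHODS):
    """Return (ip, method, target, status) for requests whose method is not
    in `common`, regardless of the response code."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(2).upper() not in common:
            hits.append(m.groups())
    return hits
```

Lines whose request field does not parse at all (for example a bare `"-"`) are skipped by this pattern; those are worth collecting separately, since malformed requests are unusual too.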

Once you have found an unusual request that warrants further investigation, the next thing you should do is grab every request that the same IP address made, even if you would normally ignore them as being not interesting due to their response codes.

A series of 401s followed by a 200 could be an indication that the attacker finally guessed the password correctly and got the admin page. If you ignore the 200 responses you might assume that it was an unsuccessful brute force attack.
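That pattern is easy to look for mechanically. The following Python sketch assumes the log lines are in chronological order and in an Apache-style layout; `brute_force_successes` and the `min_failures` threshold of 5 are assumptions for illustration, and the rule that any other response resets the streak is a deliberate simplification.

```python
import re
from collections import defaultdict

# Assumption: Apache-style log lines, already sorted by time.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')

def brute_force_successes(lines, min_failures=5):
    """Return (ip, path, failures) whenever an IP gets a 200 immediately
    after at least `min_failures` consecutive 401 responses."""
    streak = defaultdict(int)  # consecutive 401s seen per IP
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, path, status = m.groups()
        if status == "401":
            streak[ip] += 1
            continue
        if status == "200" and streak[ip] >= min_failures:
            hits.append((ip, path, streak[ip]))
        streak[ip] = 0  # simplification: any non-401 response ends the streak
    return hits
```

Each IP this returns is a good starting point for pulling every other request that address made.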

After seeing a successful attack like the one above, you would then be interested in the URLs of every request the same IP address made rather than the response codes.

Ladadadada
  • thank you for your good answer. of course filtering by status codes is not the only thing I am doing, it is just a part of a script. and of course when you have a few hundred thousand of events, and 80-90% of them are 200s, you want to filter it all out for starters, and later take a look at only some of them when you have a context to put them into. thanks again – tkit Jul 16 '11 at 11:18
  • @Ladadadada welcome to the site, and thanks for the great answer! – AviD Jul 17 '11 at 13:05