
I am new to security.

I want to identify suspicious users on a web application by analyzing its web access log file. For this, I am considering CSRF attacks.

For this purpose, I am drafting some heuristic (possible) rules for identifying suspicious users from the web log. I am not confident about them, but here is what I have guessed so far.

In the web log:

1. The Referer URL is blank or does not match the requested URL's domain name.

For example:

192.168.4.6 - - [10/Oct/2007:13:55:36 -0700] "GET /trx.php?amt=100&toAcct=12345 HTTP/1.0" 200 4926 "http://www.attacker.com/freestuff.php" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

Two fields are important here: the requested URL (/trx.php?amt=100&toAcct=12345) and the referer ("http://www.attacker.com/freestuff.php"). Usually, the referer is a URL from the same site (www.bank.com). Here is a sample Perl snippet showing how this could be detected:

# assuming $referer is set with the, well, referer
if ( ( $referer ne '-' ) && ( $referer !~ m{^https?://www\.bank\.com/(login|overview|trx)\.jsp} ) )
{
    # handle XSRF attack
    print("XSRF attack: $referer\n");
}

2. The HTTP status is 403, i.e. Forbidden (access denied).

(If the CSRF token is missing, or an invalid CSRF token is sent with a request that requires one, the server responds with 403.) So checking for the 403 status is included here, because the token itself cannot be checked from the log file.
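For example, a rough script for this check could be (assuming the standard Apache/Nginx "combined" log format; the file name access.log is just a placeholder):

#!/usr/bin/perl
# Count 403 (Forbidden) responses per client IP.
# Assumes the "combined" log format; access.log is a placeholder name.
use strict;
use warnings;

my %denied;    # IP address => number of 403 responses

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    # %h %l %u [%t] "%r" %>s %b "%{Referer}i" "%{User-agent}i"
    if ($line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) /) {
        my ($ip, $status) = ($1, $2);
        $denied{$ip}++ if $status == 403;
    }
}
close $fh;

# Report the most frequently denied IPs first.
for my $ip (sort { $denied{$b} <=> $denied{$a} } keys %denied) {
    print "$ip\t$denied{$ip} requests answered with 403\n";
}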

3. By measuring the time difference between a user's requests.

If there was no user input for several minutes and then suddenly some transfer requests come in, it could be an indicator that the request was triggered by something/someone else. Here, the time difference between requests from the same IP address will need to be checked against a threshold value. (Along with this, check whether the query string after the ? symbol contains values such as 'pass', 'password', 'amount', 'amt', 'money', or any link, and whether the request status is 200, i.e. successful/OK.)
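A rough sketch of this rule (the 300-second threshold, the keyword list and the file name are placeholder assumptions, and the timezone offset in the timestamp is ignored, assuming the whole log uses one timezone):

#!/usr/bin/perl
# Heuristic 3 sketch: flag a request when the same IP was idle for longer than
# a threshold and then suddenly issues a successful (200) request whose query
# string contains sensitive-looking parameter names.
use strict;
use warnings;
use Time::Piece;

my $threshold = 300;                                        # seconds of inactivity (placeholder)
my $keywords  = qr/\b(pass|password|amount|amt|money)=/i;   # placeholder keyword list

my %last_seen;    # IP => epoch time of that IP's previous request

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    next unless $line =~
        /^(\S+) \S+ \S+ \[(\S+) [^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) /;
    my ($ip, $ts, $method, $url, $status) = ($1, $2, $3, $4, $5);

    # Parse "10/Oct/2007:13:55:36"; the timezone offset is ignored here.
    my $epoch = Time::Piece->strptime($ts, '%d/%b/%Y:%H:%M:%S')->epoch;

    if (exists $last_seen{$ip}) {
        my $gap = $epoch - $last_seen{$ip};
        if ($gap > $threshold && $status == 200 && $url =~ /\?/ && $url =~ $keywords) {
            print "Suspicious: $ip idle ${gap}s, then $method $url\n";
        }
    }
    $last_seen{$ip} = $epoch;
}
close $fh;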

4. Multiple repeated POST requests from a single IP address can also indicate CSRF (a possible log check is sketched after the explanation below).

Methods PUT and DELETE are defined to be idempotent, meaning that multiple identical requests should have the same effect as a single request. (Note that idempotence refers to the state of the system after the request has completed, so while the action the server takes, e.g. deleting a record, or the response code it returns may differ between requests, the system state will be the same every time.) Methods GET, HEAD, OPTIONS and TRACE, being prescribed as safe, should also be idempotent, as HTTP is a stateless protocol.

In contrast, the POST method is not necessarily idempotent, and therefore sending an identical POST request multiple times may further affect state or cause further side effects (such as financial transactions). In some cases this may be desirable, but in other cases this could be due to an accident, such as when a user does not realize that their action will result in sending another request, or they did not receive adequate feedback that their first request was successful. While web browsers may show alert dialog boxes to warn users in some cases where reloading a page may re-submit a POST request, it is generally up to the web application to handle cases where a POST request should not be submitted more than once.
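Here is a rough sketch of such a duplicate-POST check (again assuming the combined log format; it only narrows down candidates for manual review, since a legitimate accidental double submission would be reported too):

#!/usr/bin/perl
# Heuristic 4 sketch: report identical POST request lines repeated from the
# same IP address.
use strict;
use warnings;

my %posts;    # "IP POST /url" => number of occurrences

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    if ($line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "(POST \S+)[^"]*" \d{3} /) {
        $posts{"$1 $2"}++;
    }
}
close $fh;

for my $key (keys %posts) {
    print "$posts{$key}x $key\n" if $posts{$key} > 1;
}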

5. A website might allow deletion of a resource through a URL such as http://example.com/article/1234/delete, which, if arbitrarily fetched, even using GET, would simply delete the article. (I don't know what to do here; the best I could come up with is the rough check below.)
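The idea is to list GET requests that hit URLs which look state-changing and that do not carry an on-site referer. The "delete" pattern and the www.bank.com domain are only illustrative placeholders:

#!/usr/bin/perl
# Heuristic 5 sketch: GET requests to delete-style URLs with a blank or
# off-site referer. The URL pattern and the domain are placeholders.
use strict;
use warnings;

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    next unless $line =~
        /^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/;
    my ($ip, $url, $status, $referer) = ($1, $2, $3, $4);
    if ($url =~ /delete/i && $referer !~ m{^https?://www\.bank\.com/}) {
        print "GET to state-changing URL: $ip $url (status $status, referer '$referer')\n";
    }
}
close $fh;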

I know that identifying CSRF from a log file is difficult, so I am only mentioning possible ways (i.e. heuristics) here. If any of them are wrong, please correct me. Any more rules/help would be appreciated.

Shree

1 Answer


I understand you're trying to count/analyze malicious cross-site requests based on the logs.

Technically speaking, a cross-site request is one issued by the browser of a real user while the user is interacting with a different website. Sometimes these are good, and should be enabled by Cross-Origin Resource Sharing (CORS) HTTP headers. If you don't want that to happen, then make sure your CORS headers are correct; they tell the user's browser which other websites, if any, are allowed to interact with yours.
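For example, a plain Perl CGI response could send such a header like this (just a sketch: the allowed origin https://www.bank.com is a placeholder, and CORS restricts what other origins' scripts may do with your responses, so it complements rather than replaces CSRF tokens):

#!/usr/bin/perl
# Minimal CGI sketch: only the placeholder origin https://www.bank.com is
# allowed to read this response cross-origin; browsers will block scripts on
# other origins from reading it.
use strict;
use warnings;

print "Content-Type: text/html\r\n";
print "Access-Control-Allow-Origin: https://www.bank.com\r\n";
print "\r\n";
print "<html><body>account overview</body></html>\n";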

In practice, a lot of robotic traffic will look like cross-site traffic in your logs. A script running on my cloud server that crawls your site for unprotected resources looks vaguely like a malicious JavaScript ad that tries to hijack cookie-based authentication sessions.

CSRF tokens can hinder both of the above behaviors; it sounds like you already have that set up.

Let's look at your heuristics in turn.

  1. Inspecting the Referer field probably isn't useful. You want people to be able to enter URLs manually (even if only a few people do, it's still important). You also want people to be able to link in from other sites. For pages where these things shouldn't be allowed, your real-time CSRF protections should kick in.
  2. This is a fine search rule if, and only if, you're sure there are no other rules in your system that could cause a 403 response. If counting CSRF denials is a priority, then you could probably give them their own response code. Codes 418 or 420, or any 4xx not listed here, would be fine, but your current analysis might not justify tweaking the server setup in this way.
  3. I'm not sure I understand the details of what you're proposing here, but in general I would not assume that real users don't take long pauses in their interactions with your site.
  4. De-duping POST requests is important, but probably a separate problem from cross-site requests.
  5. Is the system you're working on your system? The behavior you describe in this item is certainly possible, but it's not typical. For purposes of identifying CSRF attacks, I wouldn't worry about this unless you already know it's possible on your system or you already know an attacker is trying it against you.

If you're searching specifically for bad cross-site requests:

Make sure your CORS headers are properly set up to prevent unwanted cross-site requests, and make sure your CSRF tokens are good enough for your application. Then search by HTTP response as described in point 2 above. If 403 isn't specific to CSRF denial and you don't want to use a different code, you may be able to filter out the other instances of 403 using heuristics for those cases.

If you're searching for robot traffic:

You can inspect the User-Agent header. I haven't been able to find any modern-looking lists of common bots, but the open-ended search in this answer is probably good.
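As a starting point, something like this tallies suspicious-looking User-Agent strings (the marker list is only an illustrative guess, not a maintained bot database):

#!/usr/bin/perl
# Tally User-Agent strings containing common bot markers.
use strict;
use warnings;

my $bot_re = qr/bot|crawler|spider|curl|wget|python-requests/i;   # placeholder markers
my %agents;

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    # The User-Agent is the last quoted field in the combined log format.
    if ($line =~ /"([^"]*)"\s*$/) {
        my $ua = $1;
        $agents{$ua}++ if $ua =~ $bot_re;
    }
}
close $fh;

for my $ua (sort { $agents{$b} <=> $agents{$a} } keys %agents) {
    print "$agents{$ua}\t$ua\n";
}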

If you're searching for bad robot traffic:

Welcome to the arms race! A dedicated attacker will, at minimum, be able to get their robots to do anything a human user would be able to do, and if that's all they're doing then you'll never be able to tell the difference. That said, there are some things you can keep an eye out for:

  • Lots of 404, 403, or other errors in quick succession or from the same IPs.
  • Traffic from an IP at a faster rate than a human user could realistically cause. (A rough log check for both of these is sketched below.)
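Something along these lines would do (the thresholds are made-up examples, and the rate check buckets requests by whole seconds, which is crude but cheap):

#!/usr/bin/perl
# Flag IPs with many 4xx responses, and IPs that issue more requests in a
# single second than a human with a browser plausibly would.
use strict;
use warnings;

my $max_errors = 20;   # total 4xx responses before an IP is flagged (placeholder)
my $max_rate   = 5;    # requests within any single second (placeholder)

my (%errors, %per_second);

open my $fh, '<', 'access.log' or die "Cannot open access.log: $!";
while (my $line = <$fh>) {
    next unless $line =~
        /^(\S+) \S+ \S+ \[(\S+) [^\]]*\] "[^"]*" (\d{3}) /;
    my ($ip, $ts, $status) = ($1, $2, $3);
    $errors{$ip}++ if $status =~ /^4/;
    $per_second{"$ip $ts"}++;    # bucket requests by IP and by second
}
close $fh;

for my $ip (grep { $errors{$_} > $max_errors } keys %errors) {
    print "$ip: $errors{$ip} client-error responses\n";
}
for my $bucket (grep { $per_second{$_} > $max_rate } keys %per_second) {
    my ($ip, $ts) = split ' ', $bucket;
    print "$ip: $per_second{$bucket} requests at $ts\n";
}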

You will find traffic from "attackers". Don't panic, maybe don't even worry about it. If you're set up to deny them what they're after, then they're not causing any problems. (Except maybe DOSing you, in which case you could look into a rate limiter.)

ShapeOfMatter