What are the common features to identify XSS attack from Apache log file?

Question

I have tried some XSS attacks on web applications such as WebGoat, OWASP Mutillidae, and bWAPP. I want to know which features, keywords, or footprints of a cross-site scripting attack appear in the Apache log file, and whether these footprints make it possible to tell that an XSS attack has been performed. I know that not every type of XSS attack can be detected through the log file, but I want whatever XSS features do get stored there. A few examples of such footprints are given below.

The footprints in the log file are:

1) <script>

2) <img>

3) <iframe>

Are these features (or footprints) correct? Are there any other features that get stored in the log file and indicate that an XSS attack has been performed? I want to write an algorithm that detects XSS from the access log file.

Although it's possible to detect plain XSS attacks from the Apache log, it's not possible to detect all of them. The exploitation of XSS is context-aware while logs aren't. Further, if you're using the default Apache log format, it doesn't store any request content besides the request line, referrer and user-agent. — 1lastBr3ath, Dec 27 '18 at 10:34
@1lastBr3ath, I know all the XSS attacks can not be detected through analysis of log file. Whatever possible features of XSS get stored into log file, I want those features. — Shree, Dec 27 '18 at 14:01

EdOverflow · Answer 1 · 2018-12-27T13:39:22.753

There are a couple of things I want to establish first before I give you general advice for spotting common characteristics of cross-site scripting (XSS) probing attempts in your logs.

I am assuming you will be manually inspecting your logs.
Without loss of generality, let's also assume you are able to keep up with the number of entries in your logs, i.e. there are not thousands of requests hitting your server at any given moment.

These points should make it easier for me to describe the process by which you can spot XSS payloads in your logs.

The first issue I would like to tackle is the attempt to reduce the common characteristics down to a small handful of payloads. XSS is such an immensely large group of attack vectors and scenarios that even renowned security researchers often struggle to agree on what term is appropriate to describe a particular type of XSS vulnerability. This can actually be seen in your question: you are trying to reduce XSS down to three specific payloads (<script>, <img>, and <iframe>). If you pick any of the many XSS payload lists that you can find out there, you will quickly notice that there is a vast number of payloads (e.g., this payload.txt list).

On top of that, you only listed two categories of XSS vulnerabilities.

There are two types of XSS i.e. reflected and stored.

There are way more than just two types. For example, self-XSS and DOM-based XSS are not included in your list. These two missing types will be a particularly interesting challenge and I will go into detail later on as to why this is the case.

Types of payloads and contexts

So now that we have established that it is very difficult to reduce XSS vectors down to a small list, let's take a look at some common contexts where XSS vulnerabilities arise. The goal here is to demonstrate how from a very small list of contexts one can construct a variety of payloads with very little to no common characteristics.

The most notable contexts are the HTML context and the JavaScript context (once again, I am simplifying things here; there are many more contexts). Assuming there is no firewall or filter interfering with the payload, these contexts will usually require certain characters in the payload to break out of the context or to run client-side code within it. To better explain some of these cases, I will refer to @Filedescriptor's XSS polyglot challenge list, which can be found here. Do not worry, I won't go through every single one; I plan to pick out just a couple to demonstrate how one constructs XSS payloads based on the context.

<div class="{{payload}}"></div>
<div class='{{payload}}'></div>
<title>{{payload}}</title>
<textarea>{{payload}}</textarea>
<style>{{payload}}</style>
<noscript>{{payload}}</noscript>
<noembed>{{payload}}</noembed>
<template>{{payload}}</template>
<frameset>{{payload}}</frameset>
<select><option>{{payload}}</option></select>
<script type="text/template">{{payload}}</script>
<!--{{payload}}-->
<iframe src="{{payload}}"></iframe> (" → &quot;)
<iframe srcdoc="{{payload}}"></iframe> (" → &quot; < → &lt;)
<script>"{{payload}}"</script> (</script → <\/script)
<script>'{{payload}}'</script> (</script → <\/script)
<script>`{{payload}}`</script> (</script → <\/script)
<script>//{{payload}}</script> (</script → <\/script)
<script>/*{{payload}}*/</script> (</script → <\/script)
<script>"{{payload}}"</script> (</script → <\/script " → \")

My resulting payload which covers all the contexts above was:

javascript:"/*\"/*`/*' /*</template></textarea></noembed></noscript></title></style></script>-->&lt;svg onload=/*<html/*/onmouseover=alert()//>

This payload is known as a polyglot; i.e. the payload covers multiple contexts at once. To break out of the first context (<div class="{{payload}}"></div>), I had to use double quotes. "><svg onload=alert(1)> alone would have worked in this context.

Next, let's pick the second-to-last case: <script>/*{{payload}}*/</script>. To keep things simple, we will ignore the filter that was implemented in the challenge. This is a JavaScript-context and */alert(1)/* would break out of the multi-line comment.

For the last case that we will look at, I would like to include the filter. The filter on <iframe srcdoc="{{payload}}"></iframe> replaces double quotes and the < character. To bypass this filter, one can simply HTML encode the < character: &lt;img/src=x onerror=alert(1)>. This results in <iframe srcdoc="&lt;img/src=x onerror=alert(1)>"></iframe>. The browser HTML-decodes the srcdoc attribute value before parsing it as a document, so the encoded < still opens a tag. This case did not require breaking out of the context.

Notice how with just three contexts alone I was able to show three very different payloads. So before attempting to use your logs to find XSS payloads, make sure to familiarise yourself with a few common contexts.

+--------------------------------------------------------------+----------------------------------+
| Context                                                      | Example payload                  |
+--------------------------------------------------------------+----------------------------------+
| <div class="{{payload}}"></div>                              | "><svg onload=alert(1)>          |
+--------------------------------------------------------------+----------------------------------+
| <script>/*{{payload}}*/</script>                             | */alert(1)/*                     |
+--------------------------------------------------------------+----------------------------------+
| <iframe srcdoc="{{payload}}"></iframe> (" → &quot; < → &lt;) | &lt;img/src=x onerror=alert(1)>  |
+--------------------------------------------------------------+----------------------------------+
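To make the point concrete, here is a minimal Python sketch that checks the three example payloads from the table against a naive signature list built from the three tags in the question. The names NAIVE_SIGNATURES and naive_match are made up for this illustration, and real log entries would additionally be URL-encoded, which this sketch ignores.

```python
# Hypothetical naive signature list, taken from the three tags in the question.
NAIVE_SIGNATURES = ("<script", "<img", "<iframe")

# The three example payloads from the table of contexts.
PAYLOADS = [
    '"><svg onload=alert(1)>',
    "*/alert(1)/*",
    "&lt;img/src=x onerror=alert(1)>",
]


def naive_match(payload: str) -> bool:
    """Return True if any naive signature occurs in the payload (case-insensitive)."""
    lowered = payload.lower()
    return any(sig in lowered for sig in NAIVE_SIGNATURES)


for payload in PAYLOADS:
    print(f"{payload!r}: matched={naive_match(payload)}")
```

None of the three payloads contains any of the three signatures, even though each of them works in its own context.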

Probing from an attacker's perspective

This section covers how I as an adversary (more precisely, my personal experience as a bug bounty hunter) might go about probing for XSS vulnerabilities in your application and what this would look like in your logs.

Most notable bug bounty hunters that I can think of use a very basic probing vector for manually determining if user-input is reflected anywhere. So we might use something along the lines of '">foobar or '"><u>foobar to quickly gather various endpoints that reflect these payloads. Note that the foobar bit is actually quite important. We want to be able to quickly search for our payload in the source code, so hunters like to use unique strings in their payload. From your perspective, this means we leave a trail of fingerprints that you can follow to see where we are testing for XSS vulnerabilities.
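If you want to look for this kind of trail programmatically, something along these lines could be a starting point. It is only a sketch under my own assumptions: the PROBE pattern and the find_probe_markers helper are hypothetical, and the pattern only covers breakout sequences shaped like the two probes mentioned above.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical heuristic: a quote-and-bracket breakout such as '"> or ">,
# optionally followed by a simple tag like <u>, then a plain marker word.
PROBE = re.compile(r"""['"]{1,2}>(?:<\w+>)?(\w+)""")


def find_probe_markers(request_target: str) -> list:
    """Return marker words that follow a breakout sequence in a URL-decoded target."""
    return PROBE.findall(unquote_plus(request_target))


print(find_probe_markers("/search?q=%27%22%3Efoobar"))
print(find_probe_markers("/search?q=%27%22%3E%3Cu%3Efoobar"))
print(find_probe_markers("/index.html"))
```

Collecting the marker words per client would then show you which endpoints a given tester has been poking at.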

This brings me to the next probing characteristic: you will very rarely see someone test a single endpoint once and then give up. If an adversary is scanning or manually testing for XSS vulnerabilities, they will usually be very persistent and you should see a long series of consecutive XSS payloads popping up in your logs.
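A rough way to surface that persistence is to count suspicious-looking requests per client address. The sketch below assumes the Apache common or combined log layout; the SAMPLE lines are made up for illustration (they use documentation-only IP ranges), and the "suspicious" test is a deliberately crude assumption of mine that will produce false positives, for example on legitimate apostrophes.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumes common/combined log format: ip ident user [time] "METHOD target PROTO" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<target>\S+)[^"]*" \d{3}'
)
# Crude assumption: quotes or angle brackets in the decoded target look suspicious.
SUSPICIOUS = re.compile(r"[<>\"']")


def persistent_probers(lines, threshold=3):
    """Return {ip: count} for clients with at least `threshold` suspicious requests."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if SUSPICIOUS.search(unquote_plus(match.group("target"))):
            counts[match.group("ip")] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Made-up sample entries, not taken from a real server.
SAMPLE = [
    '203.0.113.7 - - [27/Dec/2018:10:00:01 +0000] "GET /search?q=%27%22%3Efoobar HTTP/1.1" 200 512',
    '203.0.113.7 - - [27/Dec/2018:10:00:03 +0000] "GET /profile?name=%22%3E%3Cu%3Efoobar HTTP/1.1" 200 734',
    '203.0.113.7 - - [27/Dec/2018:10:00:06 +0000] "GET /item?id=%3Csvg%20onload%3Dalert(1)%3E HTTP/1.1" 200 310',
    '198.51.100.4 - - [27/Dec/2018:10:00:07 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.4 - - [27/Dec/2018:10:00:09 +0000] "GET /search?q=%3Cb%3E HTTP/1.1" 200 512',
]

print(persistent_probers(SAMPLE))
```

The threshold of 3 is arbitrary; you would tune it against your own traffic.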

In addition to all of the probing characteristics listed above, there is one last and very important one that must be mentioned: the use of JavaScript methods such as alert, prompt, and confirm. I am sure you have come across these before while reading up about XSS. When testing for XSS vulnerabilities, one might want an immediate indication of a vulnerability, and the easiest way of getting this is to have a big dialog fire right in front of your face. It becomes immediately obvious that you have an XSS vulnerability when the modal shows up. Also, the rush you get from the payload firing in that way never gets old. :)

You could grep your logs for keywords such as alert, prompt, and confirm. That being said, this is definitely not foolproof since, depending on the context, it may be possible to adjust the payload and mask the keyword. This is commonly seen when attempting to evade web-application firewalls and filters.
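As a small sketch of both the keyword search and its weakness, the following Python snippet URL-decodes a request target and looks for calls to the three dialog functions. The DIALOG_CALL pattern and the dialog_calls helper are my own hypothetical names, and the second request is just one example of a masked keyword.

```python
import re
from urllib.parse import unquote_plus

# Matches alert, prompt, or confirm followed by "(" or a backtick (tagged template call).
DIALOG_CALL = re.compile(r"\b(alert|prompt|confirm)\s*[(`]")


def dialog_calls(request_target: str) -> list:
    """Return the dialog function names called in a URL-decoded request target."""
    return DIALOG_CALL.findall(unquote_plus(request_target))


# Plain payload: <svg onload=alert(1)>
plain = "/page?name=%3Csvg%20onload%3Dalert(1)%3E"
# Masked payload: <svg onload=top['al'+'ert'](1)>
masked = "/page?name=%3Csvg%20onload%3Dtop%5B%27al%27%2B%27ert%27%5D(1)%3E"

print(dialog_calls(plain))
print(dialog_calls(masked))
```

The plain payload is reported, while the masked one slips through because the keyword never appears as one contiguous string.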

The problems with spotting self-XSS and DOM-based payloads

The issue can be quite simply summarised as: The payloads do not always show up in your logs. Take this example of a DOM-based XSS vulnerability:

<!-- test.html -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width">
  <title>DOM-based XSS example</title>
</head>
<body>
  <script>
    // Fetch the redirect parameter
    redirect = window.location.hash.substring(1);
    // URL-decode the value
    redirect = decodeURIComponent(redirect);
    if (redirect !== 'UNDEFINED' && redirect !== "") {
        // Redirect to the value
        location.href = redirect;
    }
  </script>
</body>
</html>

When navigating to http://example.com/test.html#javascript:alert(1) the payload should display an alert box, but when I check my logs all I see is the path, because the browser never sends the fragment (everything after the #) to the server.

GET /test.html HTTP/1.1

Hopefully, this example better illustrates some of the issues you might face when trying to spot certain XSS payloads in your logs.
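You can reproduce the split between what reaches the server and what stays in the browser with Python's urllib.parse. The request_target helper below is a hypothetical name of mine; it simply rebuilds the path and query that end up on the request line and leaves the fragment out.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that a browser puts on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "http://example.com/test.html#javascript:alert(1)"
print(request_target(url))     # what the server can log
print(urlsplit(url).fragment)  # what only the browser ever sees
```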

The best advice I can give you for tackling this problem is to implement a Content Security Policy (CSP) with the report-to directive. Now whenever the CSP detects a policy violation, it will notify you at the report-to endpoint (see https://report-uri.com/ for a service that logs these errors via the report-to directive). You can also just implement the Content-Security-Policy-Report-Only header if you are only interested in logging errors.
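For illustration, this is roughly what the response headers for a report-only policy could look like, written as a small Python helper. Everything specific here is an assumption on my part: the csp_reporting_headers name, the csp-endpoint group name, the example policy, and the report URL. It also assumes a browser that supports the Reporting-Endpoints header, so check browser support before relying on it.

```python
def csp_reporting_headers(report_url: str, report_only: bool = True) -> dict:
    """Build response headers that ask the browser to report CSP violations.

    The policy below is only an example; a real policy depends on the application.
    """
    header = (
        "Content-Security-Policy-Report-Only"
        if report_only
        else "Content-Security-Policy"
    )
    # "csp-endpoint" is an arbitrary group name that links the two headers together.
    policy = "default-src 'self'; script-src 'self'; report-to csp-endpoint"
    return {
        "Reporting-Endpoints": f'csp-endpoint="{report_url}"',
        header: policy,
    }


# Hypothetical reporting URL.
for name, value in csp_reporting_headers("https://example.com/csp-reports").items():
    print(f"{name}: {value}")
```

With report_only=True nothing is blocked, so you can watch the reports for a while before enforcing the policy.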

A small idea for getting a feel for manually spotting XSS payloads in your logs

Here is a fun idea that might help you see all of what I have described above in action. Build a small XSS Capture the Flag competition (CTF) and get a group of friends that are security-oriented to try to find the XSS vulnerability in your application. During this time, take a look at your logs. Then do the same thing but with a scanner such as Burp Suite. After a while, you should get a feel for what basic XSS probing (for a certain number of XSS vectors) looks like in your logs.

Although I mentioned only 2 types of XSS in the question, I am expecting all the possible features (footprints) in the Apache access log. I am still reading your answer. I want to write an algorithm which will detect the XSS attack based on the features, so I want to know all the features. — Shree, Dec 27 '18 at 13:48
I know all the types of XSS can't be detected through the log file. I want the features/keywords/footprints of whatever subtypes of XSS can be detected through the log file. — Shree, Dec 27 '18 at 13:59
Thanks for the answer. If possible, please modify it. I want the list of all possible features. — Shree, Dec 27 '18 at 14:05
Thanks for giving the payload.txt file. You can give me more such links. — Shree, Dec 27 '18 at 14:06
There is of course a scenario where you won't see any probing in the logs: if the server has a well-known application installed and the attackers go for specific vulnerabilities that they know already. — Wladimir Palant, Dec 27 '18 at 17:07
@Shree, your initial question did not make what you are now describing clear at all. That being said, my answer still stands and should really illustrate how incredibly difficult of a task it is to build some sort of system that detects XSS payloads in your logs. Of course you could just build a basic list of cases, which is what you seem to be after, but this will only cover a very minute subset of payloads as shown in my answer above. Therefore, I might go even as far as advising against doing that, especially if you are planning on relying on this process for anything security related. — EdOverflow, Dec 27 '18 at 18:42

1lastBr3ath · Answer 2 · 2018-12-28T11:23:25.320

Whatever possible features of XSS get stored into log file, I want those features.

In that case, I'd go for something like:

grep -ahiP '[{}%<>]+' /var/log/httpd/access_log* | php -R 'print(urldecode(urldecode($argn))."\n");' | grep -aiP "(?:<[\w/?]+)|(?:['\"\s/]*\bon[a-z]+?\s*=['\"\s]*)"

As already said, it's missing contexts. Even so, I'm pretty sure it covers a fair amount of XSS payloads.

In addition, you'll want to build regex (or whatever) for function calls and expressions as in Angular.
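If you would rather have this inside your own algorithm than in a shell pipeline, the same three steps (pre-filter, double URL-decode, pattern match) can be sketched in Python. The names PREFILTER, XSS_PATTERN and suspicious_lines are made up here, and unquote_plus only roughly mirrors PHP's urldecode, so treat this as an approximation of the pipeline rather than an exact port.

```python
import re
from urllib.parse import unquote_plus

# Step 1: cheap pre-filter, same character class as the first grep.
PREFILTER = re.compile(r"[{}%<>]+")
# Step 3: same pattern as the second grep (opening tags or on*= event handlers).
XSS_PATTERN = re.compile(
    r"""(?:<[\w/?]+)|(?:['"\s/]*\bon[a-z]+?\s*=['"\s]*)""",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Yield double-URL-decoded log lines that match the XSS pattern."""
    for line in lines:
        if not PREFILTER.search(line):
            continue
        # Step 2: decode twice to catch double-encoded payloads.
        decoded = unquote_plus(unquote_plus(line))
        if XSS_PATTERN.search(decoded):
            yield decoded


# Made-up request lines for illustration.
sample = [
    "GET /search?q=%253Csvg%2520onload%253Dalert(1)%253E HTTP/1.1",
    "GET /docs/my%20file.txt HTTP/1.1",
    "GET /index.html HTTP/1.1",
]
for hit in suspicious_lines(sample):
    print(hit)
```

Only the first sample line is reported; the other two either fail the pre-filter or do not match the pattern after decoding.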
