2

I have theoretical knowledge about WAFs, but I don't know the tools on the market. I wonder if there are any WAFs that base their decision-making on a response (true or false) from an external resource when sorting out anomalous traffic.

My idea was to create, for example, a machine-learning-based service to make this decision, which the WAF would just consult.

Thanks!

  • A WAF is essentially a reverse proxy which not only forwards but also analyzes and maybe blocks the traffic. Creating such a reverse proxy which consults some external service to make the decision is kind of trivial – at least compared to writing a solid machine-learning detection. You could, for example, set up Squid as a reverse proxy and use the ICAP or eCAP API to attach your detection. – Steffen Ullrich Nov 04 '18 at 14:33
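The architecture described in the comment can be sketched in a few lines. Everything below is an assumption for illustration only: the decision service URL, its plain-text true/false response, the backend address, and the fail-open policy on errors are all made up, not any particular product's API.

```python
# Minimal sketch (not production code) of a reverse proxy that asks an
# external true/false service whether to block a request before forwarding.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

DECISION_URL = "http://localhost:9000/classify"  # hypothetical ML service
BACKEND = "http://localhost:8080"                # hypothetical origin server

def is_malicious(method: str, path: str, body: bytes) -> bool:
    """Ask the external service; fail open on any error so the site stays up."""
    try:
        req = urllib.request.Request(
            DECISION_URL, data=body,
            headers={"X-Method": method, "X-Path": path})
        with urllib.request.urlopen(req, timeout=0.05) as resp:
            return resp.read().strip() == b"true"
    except OSError:          # covers URLError, connection refused, timeout
        return False         # fail open; a fail-closed policy would return True

class WafProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_malicious("GET", self.path, b""):
            self.send_error(403, "Blocked by WAF")
            return
        # Otherwise relay the backend's response unchanged.
        with urllib.request.urlopen(BACKEND + self.path) as upstream:
            self.send_response(upstream.status)
            self.end_headers()
            self.wfile.write(upstream.read())

# To actually run it: HTTPServer(("", 8000), WafProxy).serve_forever()
```

Whether to fail open or fail closed when the decision service is slow or down is itself a security decision; the sketch fails open so the ML service never takes the site down with it.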

4 Answers

3

This may not be a popular opinion (cue comments), but I am not a fan of Machine Learning being used in the security industry.

I'm always skeptical when it seems like the approach is "We don't know how to solve this problem. I know! Let's throw ML at it!!". There are of course niches within security where ML seems to be doing ok-ish, for example detecting malware and financial fraud, but even there, it's used with caution.

Remember that ML is part of the field of statistics: the science of detecting average-case behaviour. In ML you worry about telling the average dog apart from the average cat, and don't worry about the 5% that it gets wrong. Meanwhile security is the field of detecting worst-case adversarial behaviour. In security your system needs to continue being strong even if the attacker can reverse-engineer it and provide worst-case input.

Now consider the paper "Explaining and Harnessing Adversarial Examples" by Goodfellow, Shlens, and Szegedy:

Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.

Here is the core graphic from that paper:

[Figure from the paper: a panda image plus an imperceptible worst-case perturbation is confidently classified as a gibbon]

If it's that easy to "hack" an image classifier (arguably the best-researched subfield of ML), then what makes you think you can build an ML-based WAF filter that performs any better against adversarial hackers?
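To make the worst-case-perturbation idea concrete, here is a toy numeric sketch of the paper's fast gradient sign method against a linear classifier. The model, dimensions, and numbers are my own illustration, not from the paper: every feature moves by at most 0.1, yet the decision flips.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
w = rng.normal(size=d)   # weights of a hypothetical trained linear detector
x = rng.normal(size=d)   # an input the detector classifies

def predict(v):
    return 1 if w @ v > 0 else 0

eps = 0.1  # tiny per-feature perturbation budget
# FGSM step: nudge every feature by eps in the worst-case direction,
# i.e. against whichever class the model currently assigns.
x_adv = x - eps * np.sign(w @ x) * np.sign(w)

print(predict(x), predict(x_adv))   # the label flips...
print(np.abs(x_adv - x).max())      # ...yet no feature moved more than eps
```

In high dimensions the tiny per-feature nudges add up across thousands of weights, which is exactly why "small" perturbations are enough; an adversary probing a WAF model gets the same leverage.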


TL;DR: This has been my rant that ML shouldn't have a place in security unless it's by people who really really know what they are doing.

Mike Ounsworth
0

Using ML as a WAF trigger could slow down page and resource load times significantly, and present a significant issue when scaling to a larger user base.

A WAF has to reach a decision before the request is even processed; this means all you really have to go on is the request itself.

Most WAFs already analyze the request and look for patterns that catch many common attacks. How would you train an ML system to recognise an attack? It's difficult to tell an attack apart from new functionality you just added to your site.
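To illustrate that labelling problem, here is a hand-rolled sketch of the kind of features one might extract from a request parameter (the feature set is invented for illustration): an SQL injection and a perfectly legitimate value can produce similar-looking feature vectors.

```python
import re

def request_features(value: str) -> dict:
    """Crude, hypothetical features for one request parameter value."""
    return {
        "length": len(value),
        "quotes": value.count("'") + value.count('"'),
        "sql_keywords": len(re.findall(r"\b(select|union|drop|or)\b", value, re.I)),
        "specials": sum(not c.isalnum() and not c.isspace() for c in value),
    }

# Both values contain quotes and special characters; only labels,
# which you rarely have for your own new features, tell them apart.
print(request_features("' OR 1=1 --"))     # classic SQL injection probe
print(request_features("O'Brien & Sons"))  # legitimate new input field value
```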

Daisetsu
    For a research/class project this could be a fun thing to try, but I don't see it being used in industry. – Daisetsu Nov 04 '18 at 17:36
0

Rodrigo Martinez did a machine-learning PoC a while ago to improve the false-positive rate: https://github.com/SpiderLabs/owasp-modsecurity-crs/issues/1016#issuecomment-409608061

The ML verification should not be a problem for small sites, but as requests scale up it could be a no-go: the latency introduced by waiting for a response may not be acceptable in many cases.
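The latency point can be made concrete with back-of-the-envelope arithmetic (all numbers here are hypothetical): a synchronous check that blocks the worker for the whole round trip caps each worker's throughput.

```python
def max_rps_per_worker(check_ms: float, app_ms: float) -> float:
    """Requests/second one worker can serve when every request waits
    check_ms for the ML verdict plus app_ms of application time."""
    return 1000.0 / (check_ms + app_ms)

print(max_rps_per_worker(0, 5))    # no ML check: a 5 ms handler does 200 rps
print(max_rps_per_worker(20, 5))   # a 20 ms ML round trip cuts it to 40 rps
```

This is why the asynchronous uses mentioned below (tuning configuration after the fact rather than sitting in the request path) are more attractive at scale.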

ML could be used to improve the running configuration so that similar future requests are prevented, and to track down weird asynchronous behaviour caused by an attack. For that, ML should also be fed other data, such as network flows and process/socket creation on the involved servers.

0

I recommend this whitepaper, which has all the technical details related to your idea: https://wallarm.com/files/resources/Wallarm%20AI%20Engine.pdf

Technically, you need to solve three tasks to apply application-specific detection the right way:

  1. Application endpoint profiling (clustering or similar) to find all the web scripts and API calls, like /api/user/create, etc.

  2. Data profiling, to understand which kind of data is legitimate for each parameter of each endpoint, i.e. to understand that the "login" field in /api/user/create should look like an email address.

  3. Attack classification, to understand, for example, whether abnormal data in the "login" field of the /api/user/create endpoint looks like SQL injection, or whether I just have the last name De'Contour, with a quote inside.
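Steps 2 and 3 for the "login" example can be sketched like this (the regexes are deliberately crude illustrations of the idea, not the whitepaper's method): the data profile says the field should be an email address, and when it isn't, a naive attack check still cannot reliably tell injection from a real surname with a quote in it.

```python
import re

# Hypothetical per-field data profile: "login" should look like an email.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")
# Crude attack hints; real classifiers need far more than this.
SQLI_HINT = re.compile(r"('|--|\bunion\b|\bselect\b)", re.I)

def classify_login(value: str) -> str:
    if EMAIL.match(value):
        return "legitimate"
    if SQLI_HINT.search(value):
        return "possible SQL injection"
    return "anomalous but not an obvious attack"

print(classify_login("alice@example.com"))  # matches the profile
print(classify_login("x' OR '1'='1"))       # fails profile, looks like SQLi
print(classify_login("De'Contour"))         # the quote trips the crude check too
```

The last line is exactly the De'Contour problem from point 3: a single quote is both an injection staple and a legitimate character in names, so the classification step needs context, not just pattern matching.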

guntbert