I am trying to build a machine learning model which classifies attacks. My data has a bunch of IP addresses, and I don't know if I should use the IP address as a feature to detect attacks. I found this interesting argument:

"IP can be spoofed by the attacker. Hence, it may be infeasible to use it as a feature for attack classification in intrusion detection systems. Features which are independent and cannot be changed by attacker can be useful in classification problems."

This seems logical to me, but I don't know whether I should completely ignore the IP address in intrusion detection, especially since my data (log files from different devices) contains multi-step attack scenarios. What do you think?

U. User
  • It all depends on what is meant by "spoofing" and whether this feature has any value in the models you are creating. IPs cannot be "spoofed" and still make a connection. Attackers can, though, use other machines in an attack and thereby obfuscate its true source. Which do you mean? – schroeder Jan 24 '19 at 21:56
  • What types of intrusions are you detecting? Are you looking for attacks with more than one step? – schroeder Jan 24 '19 at 21:57
  • Well, from the analysis of the logs, the intrusions include exploiting AWStats, downloading and running an IRC bot, backdoors, etc. – U. User Jan 25 '19 at 21:32

1 Answer

I believe the IP address provides useful information.

IP addresses can be "spoofed" in connectionless protocols such as UDP and ICMP. If you're worried about DDoS attacks via traffic amplification, the spoofed address is close to essential for discovering what's going on, since in a reflection attack the forged source is the victim's address.

Connection-oriented protocols like TCP or SCTP are far harder to spoof (though BGP hijacking means it happens surprisingly often).

Attacks I've seen from botnets often include a large number of compromised machines from some specific ISPs. In some attacks it was just easier to block entire /16s from the attacked services. Once I found the specific ISP it was easy to look up other netblocks they own and block those too, on grounds that either their CPE was hacked, or their users were insecure as a group. Tracking the IPs lets you find out if specific networks have poor security.
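
The grouping described above (looking at whole /16s rather than single addresses) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `prefix_of` and `count_by_prefix` are hypothetical, the sample addresses are made-up values from documentation ranges, and a /16 is just the prefix length mentioned above, not a recommended setting.

```python
import ipaddress
from collections import Counter


def prefix_of(ip: str, prefix_len: int = 16) -> str:
    """Return the IPv4 network of the given length that contains `ip`."""
    # strict=False accepts a host address and masks off the host bits.
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def count_by_prefix(ips, prefix_len: int = 16) -> Counter:
    """Count how many source addresses fall into each network prefix."""
    return Counter(prefix_of(ip, prefix_len) for ip in ips)


# Hypothetical source addresses, not real attack data.
sample_ips = ["198.51.100.7", "198.51.100.200", "198.51.3.9", "203.0.113.5"]
counts = count_by_prefix(sample_ips)

# Print the prefixes that contribute the most sources first.
for prefix, n in counts.most_common():
    print(prefix, n)
```

A coarse prefix like this could also serve as a categorical feature in place of the raw address, though whether it helps a given model is something to check on the actual logs.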

sarnold
  • This means that IPs are useful once you start to suspect malicious activity on the network, so you start blacklisting and blocking as you have described. But I am more interested in the features that the machine learning model can use to profile a scenario. For example, to detect a brute force attack, I would be more interested in the number of attempts as a feature, the delays between connection attempts, etc., but NOT particularly in the IP address, since at that point I don't yet know what type of activity is going on. The usefulness of the IP address comes later, when I need to block, as you said. – U. User Jan 25 '19 at 21:47
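
The behavioural features mentioned in the comment above (number of attempts and delays between attempts) could be computed roughly as follows. This is a minimal sketch under assumptions: the function name `attempt_features`, the `(key, timestamp)` event layout, and the sample events are all hypothetical, and the grouping key is left open (it could be a target account, a session, or a source address).

```python
from collections import defaultdict
from statistics import mean


def attempt_features(events):
    """Summarise failed-login events per grouping key.

    `events` is an iterable of (key, timestamp_in_seconds) pairs.
    Returns {key: {"attempts", "mean_gap", "min_gap"}}.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    features = {}
    for key, times in by_key.items():
        times.sort()
        # Delays between consecutive attempts for this key.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[key] = {
            "attempts": len(times),
            # A single attempt has no gaps, so the gap features are undefined.
            "mean_gap": mean(gaps) if gaps else None,
            "min_gap": min(gaps) if gaps else None,
        }
    return features


# Hypothetical events: (target account, seconds since start of log).
events = [("alice", 0.0), ("alice", 2.0), ("alice", 4.0), ("bob", 10.0)]
feats = attempt_features(events)
```

Note that counting attempts still requires grouping by something; the address can act as that grouping key without being fed to the model as a feature itself.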