Can Machine Learning be utilized to identify and track IP Spoofing?

Question

"IP Spoofing" refers to changing source IP addresses so that the attack appears to be coming from someone else. When the victim replies to the address, it goes back to the spoofed address and not to the attacker’s real address.

"Where can I find datasets?" is off-topic here, so I removed it. — schroeder, Aug 05 '20 at 18:29
IP spoofing is much, MUCH easier to identify further away from the target. Even a couple of hops away. No machine learning required. Are you trying to identify IP spoofing by inspecting data on the target? If so, why? — schroeder, Aug 05 '20 at 18:31
Have you googled "ip spoofing machine learning"? I'm getting lots of research papers ... — schroeder, Aug 05 '20 at 19:25

score 2 · Answer 1 · answered Aug 05 '20 at 19:10


AI is not needed at all to identify IP spoofing, and the solution is trivial. The ISP only needs to employ egress filtering on its side, and none of its clients will be able to spoof an IP address belonging to another ISP. If every ISP did this, spoofing would no longer exist.

If Avocado Networks has the 123.123.123.0/24 block and employs egress filtering, and one of its customers sends a packet with a source address of 124.123.122.121, for example, the filter will see a source address from outside its own network and drop the packet. The misbehaving client would only be able to spoof addresses within the Avocado Networks block, and Avocado would find that client pretty fast.
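The check in that example can be sketched in a few lines of Python. This is only an illustration of the rule, not how a router implements it; the function name `egress_allows` and the constant `ISP_BLOCK` are made up for this sketch.

```python
import ipaddress

# Hypothetical address block assigned to the ISP in the example.
ISP_BLOCK = ipaddress.ip_network("123.123.123.0/24")

def egress_allows(source_ip, allowed=ISP_BLOCK):
    """Return True if an outgoing customer packet may leave the network.

    A packet is allowed only when its source address belongs to the
    ISP's own block; anything else is treated as spoofed and dropped.
    """
    return ipaddress.ip_address(source_ip) in allowed

print(egress_allows("123.123.123.45"))   # source inside the block
print(egress_allows("124.123.122.121"))  # source outside the block
```

The rule needs no training data and no per-packet state, which is why a plain filter is enough here.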

Even if the "last mile" ISP does not employ it, the upstream ISP could, and it could alert Avocado Networks about the IP spoofing coming from its network and threaten to throttle its connections until the spoofing ends. Avocado would then have a big incentive to deploy the filtering.

IP spoofing on a TCP connection is possible on paper but impossible in practice: SYN flooding works, but establishing a connection does not, because the SYN-ACK goes to the spoofed address and the attacker never sees it. UDP is easier because no connection is established; you just fire packets down the line. Egress filtering solves IP spoofing over both TCP and UDP.

ThoriumBR

I worked for a DDoS appliance start-up in 2000. We knew that our product was sunk if ISPs simply clicked the checkbox on their core routers to do this filtering. And yet, no one did ... – schroeder Aug 05 '20 at 19:25
If the upstream ISPs start penalizing those below them, that could change. But I doubt they would do that either... – ThoriumBR Aug 05 '20 at 19:33
Well, I understand there are traditional ways to solve the problem, such as egress filtering; I was wondering whether ML is applicable. – Jakob Aug 06 '20 at 18:51
@Jakob: Given that egress filtering is much more reliable, faster, and easier than ML for this use case, ML should only be used if these easier options are not available. – Steffen Ullrich Aug 06 '20 at 19:34
You can either use lots of data and processing power to train an AI and put a powerful computer to work checking every single packet that enters or leaves the network (and end up with latency, false positives, and false negatives), or put an underpowered computer running iptables in place and be faster and cheaper, with neither false positives nor false negatives... – ThoriumBR Aug 06 '20 at 20:01

Steffen Ullrich · Answer 2 · 2020-08-05T19:09:16.137

Machine learning is no magic bullet. An IP packet itself carries no intrinsic sign that its source IP address was spoofed, so one needs to look at the context. For example, if the packet is part of an ongoing bidirectional communication, it was likely not spoofed. If it was a single packet only, or if the communication is unidirectional, it might have been spoofed, but that might also be normal behavior for the communication protocol.
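As a rough illustration of that kind of context check, the sketch below looks at packets seen at one host and marks a remote peer as "likely genuine" only if it sent something after the local host had already answered it. The function name, the labels, and the `(src, dst)` packet representation are assumptions made for this example; a peer left as "unverified" is not necessarily spoofed.

```python
def classify_sources(packets, local):
    """Label remote peers by whether the exchange looks bidirectional.

    packets: ordered iterable of (src, dst) address pairs seen at `local`.
    A peer that sends again after we answered it presumably received
    our answer; every other peer stays "unverified".
    """
    replied_to = set()  # peers the local host has already answered
    confirmed = set()   # peers that kept talking after our answer
    seen = set()        # every remote peer that sent us something
    for src, dst in packets:
        if src == local:
            replied_to.add(dst)
        elif dst == local:
            seen.add(src)
            if src in replied_to:
                confirmed.add(src)
    return {peer: "likely genuine" if peer in confirmed else "unverified"
            for peer in seen}

# Hypothetical capture: one peer continues the exchange, one never follows up.
capture = [
    ("10.0.0.5", "192.0.2.1"),
    ("192.0.2.1", "10.0.0.5"),
    ("10.0.0.5", "192.0.2.1"),
    ("203.0.113.9", "192.0.2.1"),
    ("192.0.2.1", "203.0.113.9"),
]
print(classify_sources(capture, local="192.0.2.1"))
```

A one-way protocol would leave every peer "unverified" under this rule, which is exactly the ambiguity described above.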

In other words, with enough domain expertise in networks and network protocols, one can probably extract and generate useful features from the network traffic and create and tune an ML model to detect spoofing. The better the features are, the better the model can be, but it will likely still produce a fair number of false positives and false negatives.
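To make "extracting features" a little more concrete, here is a minimal sketch that turns a list of observed packets into one numeric vector per source address. The three features (packet count, number of distinct TTL values, mean gap between packets) are illustrative assumptions only; whether they separate spoofed from genuine traffic on a given network would have to be tested.

```python
from collections import defaultdict
from statistics import mean

def source_features(records):
    """Build a per-source feature vector from packet records.

    records: iterable of (timestamp, src, ttl) tuples ordered by time.
    Returns {src: [packet_count, distinct_ttls, mean_gap_seconds]}.
    """
    times = defaultdict(list)
    ttls = defaultdict(set)
    for ts, src, ttl in records:
        times[src].append(ts)
        ttls[src].add(ttl)
    features = {}
    for src, ts_list in times.items():
        # Gaps between consecutive packets from the same source.
        gaps = [later - earlier for earlier, later in zip(ts_list, ts_list[1:])]
        features[src] = [len(ts_list), len(ttls[src]), mean(gaps) if gaps else 0.0]
    return features

# Hypothetical records: (seconds, source address, observed TTL).
sample = [(0.0, "a", 64), (0.5, "b", 50), (1.0, "a", 64), (3.0, "a", 57)]
print(source_features(sample))
```

Vectors like these could then be fed to whatever model is chosen; the model itself is left out of the sketch.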

The performance (i.e. quality of detection) will also depend on the network where the model was trained versus the network where it gets used, since expected behavior is also network-specific. It might even differ within the same network at different times.

Apart from this, a simple Google search will turn up many academic papers and other publications that discuss the topic in far more detail.

Thank you for your answer. I assume it would be worth trying to measure the time between each hop (pinging, etc.) for an IP address, to determine its actual location from the hop timing. What would a useful feature be? — Jakob, Aug 06 '20 at 18:47
@Jakob: *"What would a useful feature be?"* - this is far too broad and out of scope of your original question. Please read the available literature (as can be retrieved from the google search) for the details. — Steffen Ullrich, Aug 06 '20 at 19:32
