Relationship between http time response and geo-location

Question

I have a lot of pcap files of traffic generated by malware. They contain many kinds of packets, such as TCP, UDP and, most importantly, HTTP. Where possible, I also have the geo-locations of the IP addresses. The question is: is there a way to find out whether a location is fake by comparing it against the server response time?

score 1 · Accepted Answer · edited Apr 13 '17 at 12:13

Not in a reliable way. The reasons:

Many other circumstances can increase the response time. For example, the remote system may be virtualized or overloaded, or its network may be overloaded.
TCP uses a windowing scheme, so the ACKs you see usually do not acknowledge the packet that was just received, but data sent around 8 packets earlier, and their timing depends on how fast the remote system processes those packets.
At the highest level (roughly country-level routing), the network topology is graph-like, while below that it is tree-like. Delay times may be reasonably reliable at the upper level, but become unreliable below it.
Even if it worked, you would only get "distance" information, which is still very far from an actual location.
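To illustrate why a delay gives only "distance" information, here is a minimal sketch of the one hard check it does allow: a reply cannot travel faster than light in fiber, so a small RTT puts an upper bound on how far away the host can be. The constant of about 200 km per millisecond (roughly 2/3 of c) and the function names are assumptions for illustration. A `False` result proves the claimed location is impossible; a `True` result proves nothing.

```python
import math

# Assumption: light in optical fiber covers roughly 200 km per millisecond (about 2/3 c).
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # Clamp to 1.0 so rounding near antipodal points cannot break asin().
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def max_distance_km(rtt_ms):
    """Farthest the host can be: half the round trip, at fiber speed."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def claim_is_physically_possible(vantage, claimed, min_rtt_ms):
    """False only if the claimed location is farther away than the RTT allows."""
    return great_circle_km(*vantage, *claimed) <= max_distance_km(min_rtt_ms)
```

For example, with a vantage point in Frankfurt `(50.11, 8.68)`, a 40 ms minimum RTT cannot come from a host in Wellington `(-41.29, 174.78)`. Use the minimum RTT over many samples, since every source of extra delay listed above only makes the bound looser.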

But:

When the first TCP packet is sent, the TCP window is still nearly empty, so the delay of the first ACK (for a client-initiated connection, the SYN to SYN-ACK interval) is practically reliable delay information. It can essentially be used as a TCP-based ping.
If you have access to large server logs, you can build a database of the typical delay time for every registered IP network. It is worth storing not only the mean packet delay, but also its standard deviation.
If you have access to multiple servers in widely separated places and networks around the world (for example, one in central Europe and another in New Zealand), you can build such a timing database for each of them.
These databases should be keyed not only by the remote IP network, but also by the time of day and the day of the week, because network load depends mainly on these.
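The steps above could be sketched roughly as follows. `handshake_rtts` treats the SYN to SYN-ACK interval as a TCP-based ping, and `DelayDatabase` stores mean and standard deviation per vantage point, remote network, weekday and hour. The packet tuple format is a hypothetical one (you would first extract these fields from the pcap files with a tool of your choice), and grouping IPv4 hosts by /24 is an assumption, not something the answer specifies.

```python
import ipaddress
import statistics
from collections import defaultdict
from datetime import datetime, timezone


def handshake_rtts(packets):
    """Yield (server_ip, syn_ts, rtt_ms) for every SYN answered by a SYN-ACK.

    packets: iterable of hypothetical (ts, src, sport, dst, dport, flags) tuples,
    ts in seconds, flags "S" for a bare SYN and "SA" for a SYN-ACK.
    """
    pending = {}  # (client, cport, server, sport) -> timestamp of the latest SYN
    for ts, src, sport, dst, dport, flags in sorted(packets):
        if flags == "S":
            # A retransmitted SYN overwrites the earlier one, so the RTT is
            # measured from the last attempt.
            pending[(src, sport, dst, dport)] = ts
        elif flags == "SA":
            # The SYN-ACK travels in the reverse direction of the SYN.
            syn_ts = pending.pop((dst, dport, src, sport), None)
            if syn_ts is not None:
                yield src, syn_ts, (ts - syn_ts) * 1000.0


class DelayDatabase:
    """Delay samples keyed by (vantage, network, weekday, hour in UTC)."""

    def __init__(self, prefix_len=24):
        self.prefix_len = prefix_len  # assumption: group IPv4 hosts by /24
        self.samples = defaultdict(list)

    def _key(self, vantage, ip, ts):
        net = ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        return (vantage, str(net), t.weekday(), t.hour)

    def add(self, vantage, ip, ts, rtt_ms):
        self.samples[self._key(vantage, ip, ts)].append(rtt_ms)

    def stats(self, vantage, ip, ts):
        """Return (mean, stdev) in ms, or None with fewer than two samples."""
        s = self.samples.get(self._key(vantage, ip, ts), [])
        if len(s) < 2:
            return None
        return statistics.mean(s), statistics.stdev(s)
```

Each `(server_ip, syn_ts, rtt_ms)` produced by `handshake_rtts` would be fed into `db.add(vantage, server_ip, syn_ts, rtt_ms)`, with one `vantage` label per measuring server.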

After that, using a continuous (floating-point) extension of Bayes' theorem, perhaps with a little fuzzy logic added, you could compute a "believability factor" for the delays of an IP. In most cases, however, the result will be misleading, because unrealistic delays are usually caused by the last mile, and only relatively rarely by obvious forgery.
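One way to read the "continuous Bayes" idea is sketched below, under loud assumptions: the RTT for a genuine location is modelled as Gaussian with the mean and standard deviation taken from the delay database, and the "fake" alternative as a uniform RTT on `[0, max_rtt_ms]`. The prior, the uniform alternative and the names are all hypothetical choices; real RTT distributions are right-skewed, so a Gaussian is only a rough first approximation.

```python
import math


def normal_pdf(x, mean, stdev):
    """Gaussian probability density at x."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))


def believability(observed_rtt_ms, expected, prior_genuine=0.95, max_rtt_ms=1000.0):
    """Posterior probability that the claimed location is genuine.

    expected: (mean, stdev) in ms for the claimed location's network.
    Assumption: under the "fake" hypothesis the RTT is uniform on [0, max_rtt_ms].
    """
    mean, stdev = expected
    like_genuine = normal_pdf(observed_rtt_ms, mean, stdev)
    like_fake = 1.0 / max_rtt_ms if 0.0 <= observed_rtt_ms <= max_rtt_ms else 0.0
    num = prior_genuine * like_genuine
    den = num + (1.0 - prior_genuine) * like_fake
    # If neither hypothesis explains the value, treat it as suspicious.
    return num / den if den > 0.0 else 0.0
```

As the answer warns, a low score here usually points to a slow last mile rather than a forged location, so it is a weak signal on its own.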

But if you correlate these suspiciously behaving IPs with other security logs, they may serve as a useful source of "soft data". For example, they could be used to increase the sensitivity of spam filters or alerting firewalls.
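As a minimal sketch of using the score as soft data, a spam filter's threshold could be tightened in proportion to how unbelievable a sender's delays are. The function, the linear scaling and the `max_tightening` value are hypothetical illustrations, not features of any particular spam filter.

```python
def adjusted_spam_threshold(base_threshold, belief, max_tightening=0.3):
    """Lower the spam-score threshold for senders with low delay believability.

    belief in [0, 1]: 1.0 leaves the threshold unchanged, 0.0 lowers it
    by max_tightening (30 % by default, an arbitrary illustrative value).
    """
    if not 0.0 <= belief <= 1.0:
        raise ValueError("belief must be in [0, 1]")
    return base_threshold * (1.0 - max_tightening * (1.0 - belief))
```

Keeping the adjustment bounded means a misleading delay score (for example, from a congested last mile) can only make the filter somewhat stricter, never decide on its own.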

It requires roughly third- or fourth-semester university mathematics, and at least one experienced senior programmer.

The things to be developed:

Building such a database and its hardware/software integration
Building the network monitoring scripts
Building the IP localization script, based on the network delay statistics of an individual IP.

I would estimate between a week and half a year of development, depending on the details.

If you have enough machines, you could even build a delay database between nearly all networks on Earth, which might be usable for finding their real locations. But that would be much harder.
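A crude version of localization from many vantage points could look like the sketch below. It swaps in a well-known technique, constraint-based geolocation: each vantage point's minimum RTT bounds the distance to the host, and the feasible region is the set of grid points that satisfy every bound. The fiber-speed constant, the 1-degree grid and the names are assumptions; a real system would also calibrate the bounds per vantage point.

```python
import math

# Assumption: light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def feasible_region(measurements, step_deg=1.0):
    """Grid points (lat, lon) consistent with every RTT-derived distance bound.

    measurements: list of (vantage_lat, vantage_lon, min_rtt_ms).
    """
    bounds = [(lat, lon, (rtt / 2.0) * FIBER_KM_PER_MS) for lat, lon, rtt in measurements]
    points = []
    for i in range(int(round(180 / step_deg)) + 1):
        lat = -90.0 + i * step_deg
        for j in range(int(round(360 / step_deg))):
            lon = -180.0 + j * step_deg
            if all(great_circle_km(lat, lon, vlat, vlon) <= r for vlat, vlon, r in bounds):
                points.append((lat, lon))
    return points
```

An empty result means the measurements are mutually inconsistent (for example, a vantage point's clock or position is wrong), and a claimed location far outside the region is suspect.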
