What are the methods used by GeoIP services besides WHOIS info?

I was just wondering how GeoIP services collect geolocation data for IP addresses besides checking WHOIS information. For example, I stumbled upon this website, which says that IP 74.207.244.221 is located in Fremont, California: https://ipinfo.io/74.207.244.221

But I cannot find this information in the WHOIS record for this IP. ipinfo.io states that:

Originally our API used MaxMind data, but we've been very busy working on creating our own geolocation data. We've made a lot of progress, and we now use our own data to service around half of all requests. We do still fallback to MaxMind data though

This got me interested: what are the ways in which services like ipinfo.io and MaxMind collect GeoIP data?

Learner

Posted 2018-10-27T09:52:02.610

Reputation: 295

Question was closed 2018-11-01T21:30:45.603

Answers

Such services usually use three methods to geolocate an IP address:

  1. Searching WHOIS databases for a registered address;
  2. Looking up reverse DNS records to find clues in domain names, or tracing the path packets take to the destination (for example, with traceroute), which can also give clues;
  3. Using RTT triangulation.
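The first method can be illustrated with a minimal WHOIS client. WHOIS (RFC 3912) is a plain-text protocol over TCP port 43: the client sends the query followed by CRLF and reads until the server closes the connection. Note that WHOIS only returns the registered address of whoever holds the block, which need not be where the addresses are actually used. This is a sketch assuming ARIN's server `whois.arin.net` and ARIN-style field names (`City`, `StateProv`, `Country`). Other registries (RIPE, APNIC, ...) use different servers and fields, so the field set below is an assumption, not a complete list.

```python
import socket


def whois_query(query: str, server: str = "whois.arin.net", port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912) and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((query + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed field names: ARIN-style ("City", "StateProv", "Country") and
# RIPE-style ("country"); matching is case-insensitive.
LOCATION_FIELDS = {"address", "city", "stateprov", "postalcode", "country"}


def extract_location_fields(response: str) -> dict:
    """Collect location-looking 'key: value' lines from a WHOIS response."""
    found = {}
    for line in response.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue  # skip ARIN (#) / RIPE (%) comments and non key/value lines
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in LOCATION_FIELDS and value:
            found.setdefault(key, value)  # keep only the first occurrence
    return found
```

Usage (needs network access): `extract_location_fields(whois_query("74.207.244.221"))`.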

Round-trip time (RTT) triangulation is a method for estimating the approximate geolocation of an IP address by measuring the ping latency to it from three different locations.

For example, suppose you have three servers spread across the world in the shape of a triangle. If you ping an IP address from all three and get the same latency from each, the IP address is roughly equidistant from them, i.e. near the centre of that triangle. This is how triangulation works in general; here it is simply applied to ICMP ping latencies.
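The idea above can be sketched in code. Light in optical fibre travels at roughly 2/3 of c, about 200 km per millisecond, so an RTT of t ms corresponds to at most about (t / 2) × 200 = 100·t km of one-way distance. The sketch below treats that figure as a distance estimate, although in reality it is only an upper bound because queuing and indirect routes add delay. It then grid-searches the globe for the point whose great-circle distances to the landmark servers best fit those estimates in the least-squares sense. (Strictly speaking, working from distances rather than angles is multilateration.) The 100 km/ms constant, the grid search and the function names are simplifying assumptions for illustration, not how any particular GeoIP vendor does it.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: ~200 km per ms in fibre, so 1 ms of RTT ~ 100 km one way.
KM_PER_MS_RTT = 100.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def estimate_location(landmarks, step_deg: float = 1.0):
    """landmarks: list of (lat, lon, rtt_ms) measured from known servers.

    Returns the grid point (lat, lon) whose distances to the landmarks best
    match, in the least-squares sense, the distances implied by the RTTs.
    """
    targets = [(lat, lon, rtt * KM_PER_MS_RTT) for lat, lon, rtt in landmarks]
    best, best_err = None, float("inf")
    for i in range(int(180 / step_deg) + 1):
        lat = -90.0 + i * step_deg
        for j in range(int(360 / step_deg)):
            lon = -180.0 + j * step_deg
            err = sum((haversine_km(lat, lon, la, lo) - d) ** 2
                      for la, lo, d in targets)
            if err < best_err:
                best, best_err = (lat, lon), err
    return best
```

Usage: `estimate_location([(lat1, lon1, rtt1_ms), (lat2, lon2, rtt2_ms), (lat3, lon3, rtt3_ms)])` with coordinates of the ping servers and the RTTs they measured. A 1° grid keeps the example simple; a real system would use finer search or a proper solver.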

Resources you can read:
What is ping? @ Wikipedia
SIGCOMM paper about RTT triangulation

Fanatique

Posted 2018-10-27T09:52:02.610

Reputation: 3 475

Wow, I would never have thought there was triangulation based on RTT. Interesting. So if a server wanted to avoid being geolocated, it could introduce random lag into its ICMP responses. – Learner – 2018-10-27T12:13:48.267

I would like to add that it might be helpful to look at tracert/traceroute, as written on iplocation.net: "You may use 'traceroute' command to find clues to the location of the IP address. The names of the routers through which packets flow from your host to the destination host might hint at the geographical path of the final location." – Learner – 2018-10-28T09:48:10.487

@Learner that's a nice addition; however, it is already in my answer in the form of "tracking reverse DNS queries". Although traceroute doesn't really do that, it shows you all the domains/addresses through which a query travels. I'll add a note to make it clearer nonetheless :) – Fanatique – 2018-10-30T13:50:25.723
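The router-name clue from the traceroute comments can be sketched as follows. Operators often embed location codes, such as IATA airport codes, in the reverse DNS names of their routers. This is a minimal sketch assuming a small hand-made table of codes; real services maintain much larger, operator-specific rules. The table, the hostnames in the examples and the helper names are hypothetical. `socket.gethostbyaddr` performs the PTR lookup.

```python
import re
import socket

# Hypothetical table of a few IATA airport codes seen in router hostnames.
CODE_TO_CITY = {
    "sjc": "San Jose, US",
    "sfo": "San Francisco, US",
    "lax": "Los Angeles, US",
    "jfk": "New York, US",
    "ord": "Chicago, US",
    "fra": "Frankfurt, DE",
    "lhr": "London, GB",
}


def reverse_dns(ip: str):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


def location_hints(hostname: str) -> list:
    """Return cities whose code forms a hostname label token on its own or
    followed only by digits (e.g. 'sjc' or 'sjc01'), in order of appearance."""
    hints = []
    for token in re.split(r"[.\-_]", hostname.lower()):
        m = re.match(r"([a-z]{3})\d*$", token)
        if m and m.group(1) in CODE_TO_CITY:
            city = CODE_TO_CITY[m.group(1)]
            if city not in hints:
                hints.append(city)
    return hints
```

Usage: for each hop address printed by traceroute, call `location_hints(reverse_dns(hop_ip) or "")`. For example, a hypothetical hop named `ae-1.r01.sjc01.example.net` yields `["San Jose, US"]`.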

I'm the founder of IPinfo, so I can definitely offer some details on this! There's no single method or single data source that we use to produce our own geolocation database (or any of our other data sets, like IP to company or IP to carrier). It's a mix of many different data sets, data processing techniques, and lessons learned from doing this for several years now!

Some data sources and techniques not often mentioned include:

  • Direct feeds from ISPs. Our service handles around 500 million API requests a day and is used on many popular, high-profile websites. ISPs are therefore incentivized to provide us with accurate, up-to-date geolocation data so that their customers get a great experience on the web. We're working directly with more and more ISPs all the time.

  • GPS location data. It's possible to collect precise location information with GPS on mobile devices. You can pair that with the IP address and some network topology inference to work out the location for IP ranges given just a few measurements.

  • User submitted corrections. When we do get the location wrong (or it hasn't been updated after a change) we'll often quickly get feedback from users, and can manually fix the location, or tweak our algorithm to ensure it's correctly located on the next run of our data processing pipeline.
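The GPS bullet (pairing precise device locations with the IP address, then generalizing to whole ranges) could look roughly like this in its simplest form. GPS-tagged observations are grouped by their enclosing /24 (IPv4) or /48 (IPv6) prefix, the median latitude and longitude are taken, and prefixes with too few samples are skipped. The prefix lengths, the minimum sample count and the use of a median are illustrative assumptions, and taking a plain median of longitudes ignores wrap-around at ±180°. The answer does not describe IPinfo's actual method, and real network-topology inference is far more involved.

```python
import ipaddress
import statistics
from collections import defaultdict


def prefix_of(ip: str, v4_len: int = 24, v6_len: int = 48):
    """Return the enclosing network of an address (assumed /24 or /48)."""
    addr = ipaddress.ip_address(ip)
    length = v4_len if addr.version == 4 else v6_len
    return ipaddress.ip_network(f"{addr}/{length}", strict=False)


def locate_prefixes(observations, min_samples: int = 3) -> dict:
    """observations: iterable of (ip, lat, lon) from GPS-enabled devices.

    Returns {network: (median_lat, median_lon)} for prefixes that have at
    least min_samples observations.
    """
    buckets = defaultdict(list)
    for ip, lat, lon in observations:
        buckets[prefix_of(ip)].append((lat, lon))
    result = {}
    for net, points in buckets.items():
        if len(points) < min_samples:
            continue  # too few measurements to trust this prefix
        result[net] = (statistics.median(p[0] for p in points),
                       statistics.median(p[1] for p in points))
    return result
```

The median is used rather than the mean so that a single device reporting a stale or spoofed GPS fix does not drag the whole prefix's location.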

For our IP-to-company data set, we actually scrape every single domain name every month and cross-reference the data we extract with IP ownership information, rwhois records, and more. We also use the domain scraping data to show which domains are hosted on which IP addresses. Along with many other data sources, it feeds our IP type classifier, which determines the probability that an IP address is primarily used by a residential ISP, a business, or a hosting provider. We also analyze the link structure of those pages and show some of this data on host.io.
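Showing which domains are hosted on which IP addresses amounts to resolving each scraped domain and inverting the result. This is a minimal sketch, assuming the domain list already exists. The resolver is passed in as a parameter so it can be replaced by a stub. The "many domains on one IP suggests hosting" heuristic and its threshold are hypothetical illustrations, not IPinfo's classifier.

```python
import socket
from collections import defaultdict


def resolve_ipv4(domain: str) -> set:
    """Resolve a domain to its IPv4 addresses with the system resolver."""
    try:
        infos = socket.getaddrinfo(domain, None, socket.AF_INET, socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return set()
    return {info[4][0] for info in infos}  # info[4] is the (ip, port) sockaddr


def build_ip_index(domains, resolve=resolve_ipv4) -> dict:
    """Invert domain -> IPs into IP -> sorted list of domains hosted there."""
    index = defaultdict(set)
    for domain in domains:
        for ip in resolve(domain):
            index[ip].add(domain)
    return {ip: sorted(names) for ip, names in index.items()}


def looks_like_hosting(index: dict, ip: str, threshold: int = 50) -> bool:
    """Hypothetical heuristic: an IP serving many domains is more likely to
    belong to a hosting provider than to a residential connection."""
    return len(index.get(ip, [])) >= threshold
```

In practice such a count would be only one feature among many in a classifier, as the answer notes.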

Ben Dowling

Posted 2018-10-27T09:52:02.610

Reputation: 362

Thanks! I didn't expect the founder of this website to reply to my question :) It was very interesting. – Learner – 2018-10-29T09:04:33.827