Extract data from an online atlas


There is an online atlas that I would like to extract values from. The atlas provides a tool ('Query') that returns values when you click a location or enclose a region on the map, or when you specify the latitude/longitude of a point of interest. Instead of extracting values manually, I would like to automate the extraction from the command line: either a script that pulls out the value for an input latitude/longitude, or one that pulls out the values for all locations, as long as I also get the latitude/longitude of the returned points. What utility could scrape the data from the atlas and be used in a command-line script? scrapy looks promising, but maybe there are better tools for this. Alternatively, if you could tell me what language the 'Query' tool uses, that would help me get started.

KAE

Posted 2012-08-08T12:52:55.597

Reputation: 1 467

Answers

2

This site relies heavily on JavaScript (jQuery).

I suggest the following as a first step:

  1. Install Firefox
  2. Install the Tamper Data addon
  3. Load the site, start tampering
  4. Play around with the query tool and look at the XMLHttpRequests it generates automatically (see the Tamper Data logs)

Perhaps this is already enough to identify the relevant requests and how they are built. If it isn't, you will have to read the JavaScript sources.

Since all requests are most probably just HTTP GET and HTTP POST requests with specific parameters, you could start automating them with a tool like curl. scrapy also looks promising and seems to bring a lot of nice features (I haven't tested it myself, though).
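Once you have identified a request in the logs, replaying it from a script could look roughly like the sketch below. It uses only the Python standard library. The endpoint URL, the parameter names `lat` and `lon`, and the JSON response format are all assumptions made up for illustration; replace them with whatever the captured requests actually show.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: the real URL must be taken from the captured requests.
QUERY_URL = "https://atlas.example.org/query"


def build_query_url(lat, lon, base_url=QUERY_URL):
    """Return the GET URL for one point, mimicking a captured request.

    The parameter names "lat" and "lon" are assumptions, not the atlas's API.
    """
    params = urllib.parse.urlencode({"lat": lat, "lon": lon})
    return f"{base_url}?{params}"


def fetch_value(lat, lon, base_url=QUERY_URL, timeout=30):
    """Send the request and decode the body, assuming the server returns JSON."""
    url = build_query_url(lat, lon, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_value` in a loop over a list of coordinates would then give you the batch extraction you asked about. If the tool turns out to send POST requests instead, the encoded parameters would go into the `data` argument of `urllib.request.urlopen` rather than into the URL.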


Instead of using Firefox with the Tamper Data addon, you can also use any other browser and capture the HTTP traffic with a tool such as Wireshark.

speakr

Posted 2012-08-08T12:52:55.597

Reputation: 3 379