Downloading files from a site with JavaScript links


I sometimes come across sites that post content (files) as JavaScript links. When the links use the traditional <a href="..."> construct, one can easily parse the HTML, find each link and download the content. Even applications like Acrobat can handle this and generate a PDF of the relevant area of a site.

Not so with JavaScript links.
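A minimal Python sketch (standard library only) of that contrast: it collects the `href` of every `<a>` tag and separates ordinary URLs, which a downloader can fetch directly, from `javascript:` pseudo-URLs, which point at no file at all. The function names and the sample markup (including `openFile`) are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html):
    """Return (direct, scripted): fetchable URLs vs javascript: pseudo-URLs."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    direct, scripted = [], []
    for href in collector.hrefs:
        # A javascript: href runs code in the browser; there is no file
        # at that "address" for wget or urllib to fetch.
        if href.strip().lower().startswith("javascript:"):
            scripted.append(href)
        else:
            direct.append(href)
    return direct, scripted


# Hypothetical markup: one plain link, one script-driven link.
sample = (
    '<a href="files/report.pdf">Report</a>'
    "<a href=\"javascript:openFile('report2')\">Open</a>"
)
direct, scripted = split_links(sample)
print(direct)  # ['files/report.pdf']
```

Only the first list can be handed straight to a downloader; for the second, you have to work out what the script actually sends to the server.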

Here is an example of a site that has public content (no login or password required) but uses JavaScript links:

http://www.oml.ago.state.ma.us/

How does one go about downloading the PDF files here programmatically?

There is a tab for each year; take the one for 2013:

http://www.oml.ago.state.ma.us/Default.aspx?sectionYear=1&year=2013

There are several hundred links here, but short of clicking on each one, I can't figure out any way to find the targets and download them.

amrith

Posted 2014-02-04T12:02:43.327

Reputation: 111

@Leonid, I didn't have a particular language in mind. I was hoping for a shell script, but if it means programming in some language, then I'd assume Java would be option #1. Thx. – amrith – 2014-02-04T12:08:07.760

Answers


Two options spring to mind (neither of them Java):

  1. Write a JavaScript bookmarklet that you click in your browser to scrape the DOM elements once the page you want has loaded and its JS has executed. This works, but won't scale to a large number of pages.

  2. Use a headless browser like http://casperjs.org/, http://phantomjs.org/ or http://slimerjs.org/

Steve Claridge

Posted 2014-02-04T12:02:43.327

Reputation: 111


You can find it with the browser's developer console by watching the network requests.

The URL is http://www.oml.ago.state.ma.us/default.aspx, with some POST parameters:

Host: www.oml.ago.state.ma.us
User-Agent: [...]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://www.oml.ago.state.ma.us/
Cookie: [...]
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 5713

__EVENTTARGET=ctl00%24ContentPlaceHolder1%24grdOML%24ctl02%24lnkOpenFile&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKLTI3MjY2NDEzNg9kFgJmD2QWAgIDD2QWAgIBD2QWCAIBD2QWAmYPFgIeBFRleHQFgQM8dGFibGUgd2lkdGg9JzcwJScgY2VsbHBhZGRpbmc9JzInIGNlbGxzcGFjaW5nPScyJyBib3JkZXI9JzAnPjx0cj48dGQgYmdjb2xvcj0nI2RjZGNkMCdhbGlnbj0nbGVmdCcgdmFsaWduPSdtaWRkbGUnY2xhc3M9J25hdmlnYXRpb25UZXh0J3dpZHRoPSc0MCUnPjxiPjxhIGhyZWY9J0RlZmF1bHQuYXNweD9zZWN0aW9uPTAnPkJyb3dzZSBPTUwgRGV0ZXJtaW5hdGlvbnM8L2E%2BPC9iPjwvdGQ%2BPHRkIGJnY29sb3I9JyNmMGYwZTgnYWxpZ249J2xlZnQnIHZhbGlnbj0nbWlkZGxlJ2NsYXNzPSduYXZpZ2F0aW9uVGV4dCd3aWR0aD0nMzUlJz48YSBocmVmPSdTZWFyY2guYXNweD9zZWN0aW9uPTEnPlNlYXJjaCBPTUwgRGV0ZXJtaW5hdGlvbnM8L2E%2BPC90ZD48L3RyPjwvdGFibGU%2BZAIDD2QWAmYPFgIfAAWbBjx0YWJsZSB3aWR0aD0nMTAwJScgY2VsbHBhZGRpbmc9JzInIGNlbGxzcGFjaW5nPScyJyBib3JkZXI9JzAnPjx0cj48dGQgYmdjb2xvcj0nI2RjZGNkMCdhbGlnbj0nbGVmdCcgdmFsaWduPSd0b3AnY2xhc3M9J25hdmlnYXRpb25UZXh0J3dpZHRoPScyMiUnPjxiPjxhIGhyZWY9J0RlZmF1bHQuYXNweD9zZWN0aW9uWWVhcj0wJnllYXI9MjAxNCc%2BMjAxNDwvYT48L2I%2BPC90ZD48dGQgYmdjb2xvcj0nI2YwZjBlOCdhbGlnbj0nbGVmdCcgdmFsaWduPSd0b3AnY2xhc3M9J25hdmlnYXRpb25UZXh0J3dpZHRoPScxOS41JSc%2BPGEgaHJlZj0nRGVmYXVsdC5hc3B4P3NlY3Rpb25ZZWFyPTEmeWVhcj0yMDEzJz4yMDEzPC9hPjwvdGQ%2BPHRkIGJnY29sb3I9JyNmMGYwZTgnYWxpZ249J2xlZnQnIHZhbGlnbj0ndG9wJ2NsYXNzPSduYXZpZ2F0aW9uVGV4dCd3aWR0aD0nMTkuNSUnPjxhIGhyZWY9J0RlZmF1bHQuYXNweD9zZWN0aW9uWWVhcj0yJnllYXI9MjAxMic%2BMjAxMjwvYT48L3RkPjx0ZCBiZ2NvbG9yPScjZjBmMGU4J2FsaWduPSdsZWZ0JyB2YWxpZ249J3RvcCdjbGFzcz0nbmF2aWdhdGlvblRleHQnd2lkdGg9JzE5LjUlJz48YSBocmVmPSdEZWZhdWx0LmFzcHg%2Fc2VjdGlvblllYXI9MyZ5ZWFyPTIwMTEnPjIwMTE8L2E%2BPC90ZD48dGQgYmdjb2xvcj0nI2YwZjBlOCdhbGlnbj0nbGVmdCcgdmFsaWduPSd0b3AnY2xhc3M9J25hdmlnYXRpb25UZXh0J3dpZHRoPScxOS41JSc%2BPGEgaHJlZj0nRGVmYXVsdC5hc3B4P3NlY3Rpb25ZZWFyPTQmeWVhcj0yMDEwJz4yMDEwPC9hPjwvdGQ%2BPC90cj48L3RhYmxlPmQCBQ8QZA8WAWYWARAFDy0tUHJpb3IgWWVhcnMtLQUPLS1QcmlvciBZZWFycy0tZxYBZmQCBw88KwANAQAPFgQeC18hRGF0YUJvdW5kZx4LXyFJdGVtQ291bnQCDWQWAmYPZBYcAgEPZBYKZg9kFgICAQ8PFgQfAAUKMDEvMzEvMjAxNB4PQ29tbWFuZEFyZ3VtZW50BV
5PTUwtMjAxNC03LVNlZWtvbmstQW5pbWFsLVNoZWx0ZXItQnVpbGRpbmctQ29tbWl0dGVlLWFuZC1TZWVrb25rLUJvYXJkLW9mLVNlbGVjdG1lbi5wZGY7Mzg0NzM3ZGQCAQ8PFgIfAAUKT01MIDIwMTQtN2RkAgIPDxYCHwAFKVNlZWtvbmsgQW5pbWFsIFNoZWx0ZXIgQnVpbGRpbmcgQ29tbWl0dGVlZGQCAw8PFgIfAAUFTG9jYWxkZAIEDw8WAh8ABQYmbmJzcDtkZAICD2QWCmYPZBYCAgEPDxYEHwAFCjAxLzI3LzIwMTQfAwUwT01MLTIwMTQtNi1Ib2xsYW5kLUJvYXJkLW9mLVNlbGVjdG1lbi5wZGY7Mzg2Njg2ZGQCAQ8PFgIfAAUKT01MIDIwMTQtNmRkAgIPDxYCHwAFGkhvbGxhbmQgQm9hcmQgb2YgU2VsZWN0bWVuZGQCAw8PFgIfAAUFTG9jYWxkZAIEDw8WAh8ABQdIb2xsYW5kZGQCAw9kFgpmD2QWAgIBDw8WBB8ABQowMS8yNy8yMDE0HwMFLU9NTC0yMDE0LTUtTG9uZ21lYWRvdy1TZWxlY3QtQm9hcmQucGRmOzM4MDc4OGRkAgEPDxYCHwAFCk9NTCAyMDE0LTVkZAICDw8WAh8ABRdMb25nbWVhZG93IFNlbGVjdCBCb2FyZGRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUKTG9uZ21lYWRvd2RkAgQPZBYKZg9kFgICAQ8PFgQfAAUKMDEvMjcvMjAxNB8DBTQxLTI3LTE0LUVzc2V4LUJvYXJkLW9mLVNlbGVjdG1lbl9SZWRhY3RlZC5wZGY7MzkxMTg4ZGQCAQ8PFgIfAAUHMS0yNy0xNGRkAgIPDxYCHwAFGEVzc2V4IEJvYXJkIG9mIFNlbGVjdG1lbmRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUFRXNzZXhkZAIFD2QWCmYPZBYCAgEPDxYEHwAFCjAxLzI3LzIwMTQfAwU%2BMS0yNy0xNC1TdHVyYnJpZGdlLUNvbnNlcnZhdGlvbi1Db21taXNzaW9uX1JlZGFjdGVkLnBkZjszODk1MzdkZAIBDw8WAh8ABQcxLTI3LTE0ZGQCAg8PFgIfAAUiU3R1cmJyaWRnZSBDb25zZXJ2YXRpb24gQ29tbWlzc2lvbmRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUKU3R1cmJyaWRnZWRkAgYPZBYKZg9kFgICAQ8PFgQfAAUKMDEvMjEvMjAxNB8DBTlPTUwtMjAxNC00LU1hc3NhY2h1c2V0dHMtQm9hcmQtb2YtQm9pbGVyLVJ1bGVzLnBkZjszODA4MTVkZAIBDw8WAh8ABQpPTUwgMjAxNC00ZGQCAg8PFgIfAAUVQm9hcmQgb2YgQm9pbGVyIFJ1bGVzZGQCAw8PFgIfAAUFU3RhdGVkZAIEDw8WAh8ABQZCb3N0b25kZAIHD2QWCmYPZBYCAgEPDxYEHwAFCjAxLzIxLzIwMTQfAwUyMS0yMS0xNC1DYW1icmlkZ2UtQ2l0eS1Db3VuY2lsX1JlZGFjdGVkLnBkZjszODY4MjhkZAIBDw8WAh8ABQcxLTIxLTE0ZGQCAg8PFgIfAAUWQ2FtYnJpZGdlIENpdHkgQ291bmNpbGRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUJQ2FtYnJpZGdlZGQCCA9kFgpmD2QWAgIBDw8WBB8ABQowMS8yMS8yMDE0HwMFOTEtMjEtMTQtU3R1cmJyaWRnZS1Cb2FyZC1vZi1TZWxlY3RtZW5fUmVkYWN0ZWQucGRmOzM5NDIzOWRkAgEPDxYCHwAFBzEtMjEtMTRkZAICDw8WAh8ABR1TdHVyYnJpZGdlIEJvYXJkIG9mIFNlbGVjdG1lbmRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUKU3R1cmJyaWRn
ZWRkAgkPZBYKZg9kFgICAQ8PFgQfAAUKMDEvMjEvMjAxNB8DBUcxLTIxLTE0LVByb3ZpbmNldG93bi1IaXN0b3JpY2FsLURpc3RyaWN0LUNvbW1pc3Npb25fUmVkYWN0ZWQucGRmOzM3NTgxNGRkAgEPDxYCHwAFBzEtMjEtMTRkZAICDw8WAh8ABSlQcm92aW5jZXRvd24gSGlzdG9yaWMgRGlzdHJpY3QgQ29tbWlzc2lvbmRkAgMPDxYCHwAFBUxvY2FsZGQCBA8PFgIfAAUMUHJvdmluY2V0b3duZGQCCg9kFgpmD2QWAgIBDw8WBB8ABQowMS8xMy8yMDE0HwMFMU9NTC0yMDE0LTMtRWdyZW1vbnQtQm9hcmQtb2YtU2VsZWN0bWVuLnBkZjszNzgyMTdkZAIBDw8WAh8ABQpPTUwgMjAxNC0zZGQCAg8PFgIfAAUbRWdyZW1vbnQgQm9hcmQgb2YgU2VsZWN0bWVuZGQCAw8PFgIfAAUFTG9jYWxkZAIEDw8WAh8ABQhFZ3JlbW9udGRkAgsPZBYKZg9kFgICAQ8PFgQfAAUKMDEvMTMvMjAxNB8DBUxPTUwtMjAxNC0yLU1pbnV0ZW1hbi1SZWdpb25hbC1UZWNobmljYWwtU2Nob29sLURpc3RyaWN0LUNvbW1pdHRlZS5wZGY7MzcwMzcxZGQCAQ8PFgIfAAUKT01MIDIwMTQtMmRkAgIPDxYCHwAFOE1pbnV0ZW1hbiBSZWdpb25hbCBWb2NhdGlvbmFsIFRlY2huaWNhbCBTY2hvb2wgQ29tbWl0dGVlZGQCAw8PFgIfAAURUmVnaW9uYWwvRGlzdHJpY3RkZAIEDw8WAh8ABQYmbmJzcDtkZAIMD2QWCmYPZBYCAgEPDxYEHwAFCjAxLzEzLzIwMTQfAwU3MS0xMy0xNC1Bc2hmaWVsZC1Cb2FyZC1vZi1TZWxlY3RtZW5fUmVkYWN0ZWQucGRmOzM3MDI2NmRkAgEPDxYCHwAFBzEtMTMtMTRkZAICDw8WAh8ABRVBc2hmaWVsZCBTZWxlY3QgQm9hcmRkZAIDDw8WAh8ABQVMb2NhbGRkAgQPDxYCHwAFCEFzaGZpZWxkZGQCDQ9kFgpmD2QWAgIBDw8WBB8ABQowMS8wMi8yMDE0HwMFNU9NTC0yMDE0LTEtQm94Zm9yZC1ab25pbmctQm9hcmQtb2YtQXBwZWFscy5wZGY7MzY1NTAzZGQCAQ8PFgIfAAUKT01MIDIwMTQtMWRkAgIPDxYCHwAFH0JveGZvcmQgWm9uaW5nIEJvYXJkIG9mIEFwcGVhbHNkZAIDDw8WAh8ABQVMb2NhbGRkAgQPDxYCHwAFB0JveGZvcmRkZAIODw8WAh4HVmlzaWJsZWhkZBgBBSBjdGwwMCRDb250ZW50UGxhY2VIb2xkZXIxJGdyZE9NTA88KwAKAQgCAWQqRlzk94heDgb756WGG3iXbo2UvA%3D%3D&__EVENTVALIDATION=%2FwEWFAKH5NrcBAKbtOHzBQKO7pzyCQKY2J3zAwLlxIrvAwK9oZrLDQKN6YqwCgLFgqFvAsSCpY4JAq2B6YkBAsSC7agLAsuCkcwDAsqClesJArOBmZ0MAq6BncQIAuXatqUOAoDbuswKAoDbnm8C59qijgkCxNrmiQFU8mZCmbVka60Kj%2BqgzpL%2Fbfuz8A%3D%3D
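The body above is an ordinary ASP.NET WebForms postback: each "JavaScript" link just submits the page's form, with `__EVENTTARGET` naming the control that was clicked (here `ctl00$ContentPlaceHolder1$grdOML$ctl02$lnkOpenFile`; `%24` is an encoded `$`) and the hidden `__VIEWSTATE`/`__EVENTVALIDATION` fields copied back from the page. A minimal Python sketch of rebuilding that body, assuming the links use the usual `javascript:__doPostBack('target','argument')` href (not checked against this site's markup). The class and function names are hypothetical, and the sample HTML and `__VIEWSTATE` value are placeholders.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode

# javascript:__doPostBack('target','argument'), the href ASP.NET LinkButtons
# usually render (assumed here, not verified for this site).
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class PostbackPageParser(HTMLParser):
    """Collects hidden <input> fields and the targets of __doPostBack links."""

    def __init__(self):
        super().__init__()
        self.hidden = {}
        self.targets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden":
            if a.get("name"):
                self.hidden[a["name"]] = a.get("value") or ""
        elif tag == "a":
            m = POSTBACK_RE.search(a.get("href") or "")
            if m:
                self.targets.append(m.group(1))


def build_postback_body(hidden, target, argument=""):
    """Form-encode what the browser POSTs when one link is clicked."""
    fields = dict(hidden)  # __VIEWSTATE, __EVENTVALIDATION, ... as found
    fields["__EVENTTARGET"] = target
    fields["__EVENTARGUMENT"] = argument
    return urlencode(fields)


# Made-up page fragment; the __VIEWSTATE value is a placeholder.
sample = (
    '<input type="hidden" name="__EVENTTARGET" value="" />'
    '<input type="hidden" name="__EVENTARGUMENT" value="" />'
    '<input type="hidden" name="__VIEWSTATE" value="/wEPdummy=" />'
    "<a href=\"javascript:__doPostBack("
    "'ctl00$ContentPlaceHolder1$grdOML$ctl02$lnkOpenFile','')\">Open</a>"
)
parser = PostbackPageParser()
parser.feed(sample)
body = build_postback_body(parser.hidden, parser.targets[0])
print(body)
# __EVENTTARGET=ctl00%24ContentPlaceHolder1%24grdOML%24ctl02%24lnkOpenFile&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPdummy%3D
```

`urlencode` escapes `$` as `%24` and `/` as `%2F`, which matches the encoding in the captured request. Only `__EVENTTARGET` changes from one link to the next; the hidden fields are copied back as the page delivered them.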

Trying to hide the URL of a public document is always stupid and useless. It also breaks navigation (for example, you can't just open the link in a new tab).

sebcap26

Posted 2014-02-04T12:02:43.327

Reputation: 101

I got that far, but running wget against http://www.oml.ago.state.ma.us/default.aspx with the parameters tacked on to the end of the URL, as

http://www.oml.ago.state.ma.us/default.aspx?<complete parameter string>

did nothing for me. Hence the question.

– amrith – 2014-02-04T12:19:49.617

That can't work: these are POST parameters, not GET, so you can't put them in the URL. – None – 2014-02-04T12:30:12.443
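The difference shows up clearly in Python's `urllib.request`, which is used here only to build the two requests, without sending anything. Appending the encoded parameters after `?` produces a GET with no body. Passing them as `data` produces a POST whose body carries them, which is what the captured browser request looked like. The parameter values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://www.oml.ago.state.ma.us/default.aspx"

# Placeholder values; a real request also needs the page's current
# __VIEWSTATE and __EVENTVALIDATION fields.
params = {
    "__EVENTTARGET": "ctl00$ContentPlaceHolder1$grdOML$ctl02$lnkOpenFile",
    "__EVENTARGUMENT": "",
}
encoded = urlencode(params)

# Parameters appended to the URL: a GET with an empty body.
get_req = Request(URL + "?" + encoded)

# Parameters in the body: the form-encoded POST the browser actually sends.
post_req = Request(
    URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.data)  # GET None
print(post_req.get_method())               # POST
```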


Thanks to @sebcap26 for pointing me in the right direction.

I guess the solution is to send the parameters as POST data instead:

wget --post-data="<complete parameter string>" http://www.oml.ago.state.ma.us/default.aspx
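With several hundred links, the POST has to be repeated once per link, each time with the page's current hidden fields. With plain wget that means fetching the page first, URL-encoding those values into `--post-data`, and possibly carrying the session cookie across with `--save-cookies`/`--keep-session-cookies` and `--load-cookies`. A minimal Python sketch of the same loop follows. Several assumptions here are unverified against the live site: the links use the usual `__doPostBack('target','argument')` href, the form posts back to the page's own URL, and each postback answers with the file's bytes. The function names and the fallback file naming are hypothetical.

```python
import http.cookiejar
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlencode

# Standard ASP.NET LinkButton href pattern (assumed for this site).
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


class FormState(HTMLParser):
    """Hidden <input> fields and __doPostBack (target, argument) pairs."""

    def __init__(self):
        super().__init__()
        self.hidden = {}
        self.targets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden":
            if a.get("name"):
                self.hidden[a["name"]] = a.get("value") or ""
        elif tag == "a":
            m = POSTBACK_RE.search(a.get("href") or "")
            if m:
                self.targets.append((m.group(1), m.group(2)))


def parse_page(html):
    """Parse one page and return its FormState."""
    state = FormState()
    state.feed(html)
    state.close()
    return state


def download_all(page_url, out_dir):
    """Fetch page_url, then replay one postback per link and save each reply."""
    # The cookie jar keeps any session cookie between the GET and the POSTs.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        state = parse_page(resp.read().decode(charset, errors="replace"))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, (target, argument) in enumerate(state.targets, start=1):
        fields = dict(state.hidden)
        fields["__EVENTTARGET"] = target
        fields["__EVENTARGUMENT"] = argument
        body = urlencode(fields).encode("ascii")
        # Passing data= makes urllib send a POST with a form-encoded body.
        with opener.open(page_url, data=body) as resp:
            # Prefer the server's Content-Disposition file name; the
            # "oml-0001.pdf" fallback naming is hypothetical.
            name = Path(resp.headers.get_filename() or f"oml-{i:04d}.pdf").name
            path = out / name
            path.write_bytes(resp.read())
        saved.append(path)
    return saved
```

Usage would be something like `download_all("http://www.oml.ago.state.ma.us/Default.aspx?sectionYear=1&year=2013", "oml-2013")`, once per year tab. This has not been run against the live site. If the server rejects reused `__VIEWSTATE`/`__EVENTVALIDATION` values, re-fetch the page before each POST. If it returns an HTML page instead of a PDF, check the saved files before trusting them.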

amrith

Posted 2014-02-04T12:02:43.327

Reputation: 111