I’m trying to figure out a way to use Wget or a similar tool so that I can schedule a web page to be downloaded regularly as a sort of updating log. The problem is that the page requires that I be logged in; otherwise I get a different, generic page.

Further, the page does not take login information as GET parameters in the URL; it uses POST to log in on the login page and cookies to save the login information that’s read by the regular page.

I’m currently using GNU Wget 1.10.2 for Windows. I’ve tried using Wget’s cookie functionality, but the results have been mixed, and most of the time it does not work.

Is there a way to accomplish this?


Another solution, if you are already logged in in the browser and don't want to use the Firefox cookie extractor in Python, is to open your web inspector and check which session headers are sent.

For example, in Chrome:

Remote Address:
Request URL:http://example.com
Request Method:GET
Status Code:200 OK
Request Headers
Accept-Encoding:gzip, deflate, sdch
Cookie:_ga=GA1.2.228078207.1409667791; mp_d6ebe82547b18c335122656ad5df6c0e_mixpanel=%7B%22distinct_id%22%3A%20%221492964fd1e75-0b7e66217-39740157-15f900-1492964fd1f1b8%22%2C%22%24initial_referrer%22%3A%20%22%24direct%22%2C%22%24initial_referring_domain%22%3A%20%22%24direct%22%7D; rack.session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVEkiRTMyZGMwMTc0OWMwNmE2YzBjYWQ4%0AMjM1ODdjNGZlNzY4NDdmZjNkY2ZhYWIzNWNiYmYxZWM1MjkwMGM0YTNhYzQG%0AOwBGSSIcd2FyZGVuLnVzZXIuZGVmYXVsdC5rZXkGOwBUVToZV2FyZGVuOjpH%0AaXRIdWI6OlVzZXJ7BzoMYXR0cmlic3sNSSIKbG9naW4GOwBGSSISYXVnLXJp%0AZWRpbmdlcgY7AFRJIgdpZAY7AEZpA%2BwPHkkiD2F2YXRhcl91cmwGOwBGSSI4%0AaHR0cHM6Ly9hdmF0YXJzLmdpdGh1YnVzZXJjb250ZW50LmNvbS91LzE5NzAx%0ANTY%2Fdj0zBjsAVEkiEGdyYXZhdGFyX2lkBjsARkkiAAY7AFRJIg9zaXRlX2Fk%0AbWluBjsARkZJIgluYW1lBjsARkkiF0F1Z3VzdGluIFJpZWRpbmdlcgY7AFRJ%0AIgxjb21wYW55BjsARkkiC0NvcGFzcwY7AFRJIgplbWFpbAY7AEZJIgAGOwBU%0AOgp0b2tlbkkiLTExMzg4NDkzNGIzZDkxNTMzOGJlOTU3YjcxZTA3OTU3ZDhh%0AYWQ2YjEGOwBU%0A--cf66a01faadf81e2cf2997a9e01c7dccdc5c67ba
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36

Here, the rack.session cookie is the one that carries the login (the others are analytics cookies), so the following command will work:

wget --header "Cookie: rack.session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVEkiRTMyZGMwMTc0OWMwNmE2YzBjYWQ4%0AMjM1ODdjNGZlNzY4NDdmZjNkY2ZhYWIzNWNiYmYxZWM1MjkwMGM0YTNhYzQG%0AOwBGSSIcd2FyZGVuLnVzZXIuZGVmYXVsdC5rZXkGOwBUVToZV2FyZGVuOjpH%0AaXRIdWI6OlVzZXJ7BzoMYXR0cmlic3sNSSIKbG9naW4GOwBGSSISYXVnLXJp%0AZWRpbmdlcgY7AFRJIgdpZAY7AEZpA%2BwPHkkiD2F2YXRhcl91cmwGOwBGSSI4%0AaHR0cHM6Ly9hdmF0YXJzLmdpdGh1YnVzZXJjb250ZW50LmNvbS91LzE5NzAx%0ANTY%2Fdj0zBjsAVEkiEGdyYXZhdGFyX2lkBjsARkkiAAY7AFRJIg9zaXRlX2Fk%0AbWluBjsARkZJIgluYW1lBjsARkkiF0F1Z3VzdGluIFJpZWRpbmdlcgY7AFRJ%0AIgxjb21wYW55BjsARkkiC0NvcGFzcwY7AFRJIgplbWFpbAY7AEZJIgAGOwBU%0AOgp0b2tlbkkiLTExMzg4NDkzNGIzZDkxNTMzOGJlOTU3YjcxZTA3OTU3ZDhh%0AYWQ2YjEGOwBU%0A--cf66a01faadf81e2cf2997a9e01c7dccdc5c67ba"  http://example.com
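If you would rather schedule a script than a raw command, the same request can be sketched with Python's standard library. This is only a minimal sketch, not something from the original answer: the function names `build_request` and `fetch` are made up for illustration, and the cookie value you pass in is assumed to be the `name=value` text copied from the inspector.

```python
import urllib.request


def build_request(url, cookie_header):
    """Build a GET request that carries the browser's session cookie.

    cookie_header is the "name=value" text copied from the Cookie
    request header shown in the web inspector.
    """
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


def fetch(url, cookie_header, timeout=30):
    """Download the page body as bytes, sending the session cookie along."""
    request = build_request(url, cookie_header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch("http://example.com", "rack.session=...")` with the real cookie value and writing the result to a file would then mirror the wget command above. As with wget, this stops working once the server expires the session.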

Does the page have a "Remember me" option? If so, you can export the browser's cookie file (see http://blog.mithis.net/archives/python/90-firefox3-cookies-in-python) and pass it to wget with --load-cookies.
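If a script is more convenient than wget, an exported cookie file can also be read from Python. The sketch below assumes the export is in the Netscape cookies.txt format and starts with the `# Netscape HTTP Cookie File` header line, which is what the standard library's `MozillaCookieJar` expects. The names `load_cookie_jar` and `fetch_with_cookies` are hypothetical helpers, not part of any tool mentioned here.

```python
import http.cookiejar
import urllib.request


def load_cookie_jar(path):
    """Load a Netscape-format cookies.txt file into a cookie jar."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and already-expired entries so that nothing
    # from the export is silently dropped while loading.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def fetch_with_cookies(url, jar, timeout=30):
    """Download a page, sending whichever cookies in the jar match the URL."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A scheduled job could call `load_cookie_jar("cookies.txt")` once and then `fetch_with_cookies(...)` for the page to log. The cookie file needs to be re-exported whenever the login expires.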


It sounds like you want some kind of web automation tool rather than a straight downloader like wget.

The one that comes to mind is WatiN, but there are many others like it.

Edit: Actually, Selenium is probably a better fit. If you're not a programmer, it has a simple point-and-click "macro" mode in Firefox.

