Using wget to download a file behind a login form

I am trying to periodically download my LaTeX project from sharelatex.com so that I can archive it myself. Unfortunately, I cannot get past the login form. Here is my attempt; what am I doing wrong?

#!/bin/bash
# Log in to the server.  This can be done only once.                   
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data='email=my.lo@in.com&password=myFancyPw' \
     --delete-after \
     --auth-no-challenge \
     https://www.sharelatex.com/project/SOME_PROJECT_NUMBER/download/zip
echo "------------------------------------------------------------------------------------------------------------------------------"
# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     https://www.sharelatex.com/project/SOME_PROJECT_NUMBER/download/zip

Edit:

The cookie is stored, but the downloaded file zip.* contains only the HTML login page.

If I add the --verbose and --debug flags to the wget commands, the output looks like this:

Setting --auth-no-challenge (authnochallenge) to 1
Setting --method (method) to POST
Setting --body-data (bodydata) to email=my.lo@in.com&password=myFancyPw
DEBUG output created by Wget 1.17.1 on linux-gnu.

Reading HSTS entries from /home/USER/.wget-hsts
URI encoding = ‘UTF-8’
--2017-07-13 07:18:23--  https://www.sharelatex.com/project/SOME_PROJECT_NUMBER/download/zip
Resolving www.sharelatex.com (www.sharelatex.com)... 45.79.151.246
Caching www.sharelatex.com => 45.79.151.246
Connecting to www.sharelatex.com (www.sharelatex.com)|45.79.151.246|:443... connected.
Created socket 3.
Releasing 0x00005640d5b63b30 (new refcount 1).
Initiating SSL handshake.
Handshake successful; connected socket 3 to SSL handle 0x00005640d5b649e0
certificate:
  subject: CN=*.sharelatex.com,OU=PositiveSSL Wildcard,OU=Domain Control Validated
  issuer:  CN=COMODO RSA Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB
X509 certificate successfully verified and matches host www.sharelatex.com

---request begin---
POST /project/SOME_PROJECT_NUMBER/download/zip HTTP/1.1
User-Agent: Wget/1.17.1 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: www.sharelatex.com
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 47

---request end---
[BODY data: email=my.lo@in.com&password=myFancyPw]
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 403 Forbidden
Server: nginx
Date: Thu, 13 Jul 2017 05:18:24 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 9
Connection: keep-alive
X-Powered-By: Express
Vary: X-HTTP-Method-Override
ETag: W/"9-cilpV3qWyjlT6E49lJ3ugQ"
set-cookie: sharelatex_session=s%3A5y54WSx5DWpS2xwlGIQgQrZswlQgkYbu.u3gjqNtKhK%2BTQIrrG15QaWQHsEDNc%2BSI6sgOi%2BPpwsY; Domain=.sharelatex.com; Path=/; Expires=Tue, 18 Jul 2017 05:18:24 GMT; HttpOnly; Secure
X-Server-Group: green
Set-Cookie: SERVERID=sl-lin-prod-web-3; path=/

---response end---
403 Forbidden
cdm: 2 3 4 5 6 7 8
Stored cookie sharelatex.com -1 (ANY) / <permanent> <secure> [expiry 2017-07-18 07:18:24] sharelatex_session s%3A5y54WSx5DWpS2xwlGIQgQrZswlQgkYbu.u3gjqNtKhK%2BTQIrrG15QaWQHsEDNc%2BSI6sgOi%2BPpwsY

Stored cookie www.sharelatex.com -1 (ANY) / <session> <insecure> [expiry none] SERVERID sl-lin-prod-web-3
Registered socket 3 for persistent reuse.
URI content encoding = ‘utf-8’
Skipping 9 bytes of body: [Forbidden] done.
2017-07-13 07:18:24 ERROR 403: Forbidden.

Saving cookies to cookies.txt.
Done saving cookies.
Saving HSTS entries to /home/USER/.wget-hsts
------------------------------------------------------------------------------------------------------------------------------
--2017-07-13 07:18:24--  https://www.sharelatex.com/project/SOME_PROJECT_NUMBER/download/zip
Resolving www.sharelatex.com (www.sharelatex.com)... 45.79.151.246
Connecting to www.sharelatex.com (www.sharelatex.com)|45.79.151.246|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /restricted?from=%2Fproject%SOME_PROJECT_NUMBER%2Fdownload%2Fzip [following]
--2017-07-13 07:18:24--  https://www.sharelatex.com/restricted?from=%2Fproject%SOME_PROJECT_NUMBER%2Fdownload%2Fzip
Reusing existing connection to www.sharelatex.com:443.
HTTP request sent, awaiting response... 302 Found
Location: /login [following]
--2017-07-13 07:18:24--  https://www.sharelatex.com/login
Reusing existing connection to www.sharelatex.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 15737 (15K) [text/html]
Saving to: ‘zip.15’

zip.15                             100%[==============================================================>]  15.37K  --.-KB/s    in 0s      

2017-07-13 07:18:24 (362 MB/s) - ‘zip.15’ saved [15737/15737]

Tik0

Posted 2017-07-12T20:11:50.697

Reputation: 195

At which point does which error occur? Does the cookies.txt have content? Did you try with --auth-no-challenge? – Jaleks – 2017-07-12T21:28:13.420

I tried adding it to the first command but no success. I've also added the debug output to the question. – Tik0 – 2017-07-13T05:27:43.613

As you can see, you're getting a 403 for your login attempt, so it's likely that you're not successfully logging in. You could check this by requesting another "restricted" page (maybe your profile options or something similar), which would likely also return the login page. For the actual download, you can see that you're getting a 302 which redirects you to the login page. Did you check whether your POST parameters are sufficient? – Seth – 2017-07-13T06:06:13.050

I've tried various sites. I got the POST parameters from inspecting the login page. username: <input class="form-control ng-pristine ng-isolate-scope ng-empty ng-valid-email ng-invalid ng-invalid-required ng-touched" type="email" name="email" required="" placeholder="email@example.com" ng-model="email" ng-model-options="{ updateOn: 'blur' }" ng-init="email = undefined" focus="true"> and password: <input class="form-control ng-not-empty ng-dirty ng-valid-parse ng-valid ng-valid-required ng-touched" type="password" name="password" required="" placeholder="********" ng-model="password"> – Tik0 – 2017-07-13T08:40:36.173

Hidden fields are often used for the login too; you may have more success if you add the value of the _csrf hidden form input to your POST data. (This value will probably change on each refresh of the page.) – Jaleks – 2017-07-13T17:47:28.523

@Jaleks Your hint got me on the right path. I found a solution, not with wget but with a simple Python script. I'll post it as an answer. – Tik0 – 2017-07-13T21:52:11.060

Answers

I found a solution using Python and the requests library. I simply followed the tutorial "Logging in With Requests", and both the login and the subsequent downloads worked like a charm.
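
For reference, here is a minimal sketch of that approach, not the exact script from the tutorial. It assumes the login form is served at and posted to /login (the page wget was redirected to), that the form accepts a form-encoded POST with the fields email, password and a hidden _csrf input (as suggested in the comments), and that requests is installed. The names extract_csrf_token and download_project are my own, chosen for illustration.

```python
from html.parser import HTMLParser

BASE_URL = "https://www.sharelatex.com"


class _CsrfParser(HTMLParser):
    """Remember the value of the first <input name="_csrf"> element."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.token is None
            and tag == "input"
            and attributes.get("name") == "_csrf"
        ):
            self.token = attributes.get("value")


def extract_csrf_token(html):
    """Return the hidden _csrf value found in the login page HTML."""
    parser = _CsrfParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("no _csrf input found in the page")
    return parser.token


def download_project(email, password, project_id, out_path):
    """Log in with a session, then save the project zip to out_path."""
    # Imported here so the token helper above works without requests installed.
    import requests  # third-party: pip install requests

    with requests.Session() as session:
        # Assumption: the form lives at /login and is posted back to /login.
        login_url = BASE_URL + "/login"
        login_page = session.get(login_url)
        login_page.raise_for_status()

        payload = {
            "email": email,
            "password": password,
            # The token changes per page load, so read it from this response.
            "_csrf": extract_csrf_token(login_page.text),
        }
        session.post(login_url, data=payload).raise_for_status()

        # The session keeps the login cookie for the actual download.
        archive = session.get(f"{BASE_URL}/project/{project_id}/download/zip")
        archive.raise_for_status()
        with open(out_path, "wb") as handle:
            handle.write(archive.content)


if __name__ == "__main__":
    download_project("my.lo@in.com", "myFancyPw",
                     "SOME_PROJECT_NUMBER", "project.zip")
```

The important difference from my wget attempt is that the login page is fetched first, so the fresh _csrf value and the session cookie both come from the same session that then sends the credentials. If the server expects a JSON body instead of form data, the post call would need json=payload instead of data=payload.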

Tik0

Posted 2017-07-12T20:11:50.697

Reputation: 195