How to get past the login page with wget?

7

4

I am trying to use Wget to download my private GitHub pages, but I cannot get past the login screen.

How do I send the login and password as POST data to the login page, and then download the actual page as an authenticated user?

Here is the command I am trying to run with the output:

 wget --save-cookies cookies.txt \
 --post-data 'login=myUserName&password=myPassword' \
 https://github.com/login

Wget output:

Resolving github.com... 207.97.227.239
Connecting to github.com|207.97.227.239|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-23 19:58:13 ERROR 403: Forbidden.

I tried the following command too:

 wget --save-cookies cookies.txt \
 --post-data 'authenticity_token=sPV07gM2/OHYDAT99WmawItd8R7hiTaJnBAs/b3zN9Y=&login=myUserName&password=myPassword' \
 https://github.com/login

Here is the HTML of the form on the login page, https://github.com/login:

<form accept-charset="UTF-8" action="/session" method="post"><div style="margin:0;padding:0;display:inline"><input name="authenticity_token" type="hidden" value="sPV07gM2/OHYDAT99WmawItd8R7hiTaJnBAs/b3zN9Y=" /></div> 
    <h1>Sign in <a href="https://github.com/plans">(Pricing and Signup)</a> </h1>
    <div class="formbody">

        <label for="login_field">
            Username or Email<br />
            <input autocapitalize="off" autofocus="autofocus" class="text" id="login_field" name="login" style="width: 21em;" tabindex="1" type="text" />
        </label>

        <label for="password">
            Password <a href="/sessions/forgot_password">(forgot password)</a>
            <br />
            <input autocomplete="disabled" class="text" id="password" name="password" style="width: 21em;" tabindex="2" type="password" />
        </label>

        <label class='submit_btn'>
            <input name="commit" tabindex="3" type="submit" value="Sign in" />
        </label>
    </div>
</form>

Lorraine Bernard

Posted 2012-11-23T19:11:08.837

Reputation: 183

Answers

9

You're making several mistakes by not doing what your browser would do:

  • You need to send the POST request with login credentials to the form action, i.e. https://github.com/session.
  • You need to provide all form parameters, including the percent-encoded hidden form parameter authenticity_token.
  • You need to provide the session cookies set by /login.

The only thing I'd have expected to be required, but which isn't, is setting the referer.


What you need to do:

$ wget --keep-session-cookies --save-cookies cookies.txt -O login.rsp https://github.com/login
$ grep authenticity_token login.rsp

This will request the login page, store the session cookie in cookies.txt, and print the hidden CSRF token form value (plus some surrounding HTML).
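To avoid copying the token out of the grep output by hand, the value can be cut out with sed. This is only a sketch: extract_token is a made-up helper name, and the pattern assumes the markup shown in the question, where the name attribute comes before value on the same line. If the page layout differs, the pattern needs adjusting.

```shell
# Hypothetical helper: print the value of the hidden authenticity_token
# field found in a saved login page (e.g. login.rsp).
# Assumes name="authenticity_token" precedes value="..." on one line.
extract_token() {
    sed -n 's/.*name="authenticity_token"[^>]*value="\([^"]*\)".*/\1/p' "$1" |
        head -n 1
}

# Example, after the wget call above has written login.rsp:
#   token=$(extract_token login.rsp)
```

The result is the raw token. It still has to be percent-encoded before it goes into --post-data.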

Now log in after percent-encoding all parameters, especially the value of the hidden form parameter authenticity_token, which often contains punctuation:

$ wget --load-cookies cookies.txt --keep-session-cookies --save-cookies cookies.txt \
    --post-data='login=USERNAME&password=PASSWORD&authenticity_token=TOKEN_VALUE_PRINTED_BY_GREP_THEN_PERCENT_ENCODED' \
    https://github.com/session
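The percent-encoding itself can be done in plain shell. The function below is a minimal sketch, not something wget provides: urlencode is a hypothetical name, and it is only meant for ASCII input such as the Base64-style token above. It leaves letters, digits and the characters - . _ ~ alone and turns everything else into %XX.

```shell
# Hypothetical helper: percent-encode an ASCII string.
# The body runs in a subshell ( ... ) so its variables do not leak.
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            # "'$c" makes printf use the character's numeric code
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Example: urlencode 'a/b=c+d' prints a%2Fb%3Dc%2Bd
```

The username and password should go through the same function if they contain anything other than letters and digits.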

You'll be redirected a few times and end up on https://github.com, just as when logging in with a browser.
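Once the login has succeeded, cookies.txt holds the authenticated session, so later requests only need to load it. A small sketch, assuming the cookie file name used above; fetch_as_user is a made-up wrapper and the URL in the example is a placeholder:

```shell
# Hypothetical wrapper: run wget with the cookie jar written during login,
# and keep the jar up to date in case the site rotates session cookies.
fetch_as_user() {
    wget --load-cookies cookies.txt \
         --keep-session-cookies --save-cookies cookies.txt \
         "$@"
}

# Example (placeholder URL):
#   fetch_as_user -O page.html https://github.com/USERNAME/PRIVATE_REPO
```

If the response is the login page again, the session was not accepted and the login step needs another look.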

Daniel Beck

Posted 2012-11-23T19:11:08.837

Reputation: 98 421

This doesn't work anymore; I'm getting "ERROR 422: Unprocessable Entity." – BBJ3 – 2015-06-02T12:01:56.990

@LucaG.Soave It's to be expected that the specifics of a non-API like this change after more than 2.5 years. But the basic advice should still hold -- do what your browser would do. – Daniel Beck – 2015-06-02T14:32:08.443

As of 2019, wget seems to have problems sending the POST request when CSRF tokens are involved, whereas curl has no problem. Or maybe I'm also missing some fields... – Sandburg – 2019-10-18T15:54:50.067