Problems with Wget to a CloudFlare hosted site: 503 Service Unavailable

4

3

I have seen other reports of 503 errors with Wget, but to no avail: I cannot solve this one.

When I try to download a certain website, I get a 503 Service Unavailable error. This happens only with the site in question.

This is what is happening. I enter:

wget -r --no-parent -U Mozilla http://www.teamspeak.com/

And this is the error I get back:

--2015-03-12 11:57:08--  http://www.teamspeak.com/
Resolving www.teamspeak.com... 104.28.27.53, 104.28.26.53
Connecting to www.teamspeak.com|104.28.27.53|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-03-12 11:57:09 ERROR 503: Service Unavailable.

This site does use CloudFlare protection (when opening the site you have to wait 5 seconds while it “checks your browser”).

Zac Webb

Posted 2015-03-11T23:05:29.403

Reputation: 141

Answers

3

CloudFlare protection is based on JavaScript, cookies and HTTP header filtering. If you want to crawl a CloudFlare-protected site using wget, you first have to open it in a browser with a debugger (e.g. Firefox with Firebug) and copy the Cookie request header.

Now the hardest part: this cookie is valid for 1 hour only, so you will have to refresh it manually each hour.

Here is the complete command you can use to crawl the site:

wget -U "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0" --header="Accept: text/html" --header="Cookie: __cfduid=xpzezr54v5qnaoet5v2dx1ias5xx8m4faj7d5mfg4og; cf_clearance=0n01f6dkcd31en6v4b234a6d1jhoaqgxa7lklwbj-1438079290-3600" -np -r http://www.teamspeak.com/

Note that the __cfduid cookie value is constant, and you only have to change the cf_clearance cookie value each hour.
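The cf_clearance value in the command above ends in "-1438079290-3600", which looks like a Unix timestamp followed by a lifetime in seconds and would match the one-hour validity mentioned earlier. That reading is an assumption based only on this one value, not on documented CloudFlare behaviour. Under that assumption, a rough sketch for checking whether a copied cookie is probably stale (the function names are hypothetical):

```python
from datetime import datetime, timezone


def clearance_window(cf_clearance):
    """Split a cf_clearance value into (issued_at, lifetime_seconds).

    ASSUMPTION: the value has the shape <token>-<unix timestamp>-<seconds>,
    as the example in this answer appears to. This layout is a guess.
    """
    _token, issued, lifetime = cf_clearance.rsplit("-", 2)
    return int(issued), int(lifetime)


def is_probably_expired(cf_clearance, now=None):
    """Return True if the assumed validity window has already passed."""
    issued, lifetime = clearance_window(cf_clearance)
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return now >= issued + lifetime


# Cookie value copied from the wget command in this answer.
example = "0n01f6dkcd31en6v4b234a6d1jhoaqgxa7lklwbj-1438079290-3600"
issued_at, lifetime = clearance_window(example)
```

If is_probably_expired() returns True, the cookie most likely needs to be copied from the browser again before rerunning wget.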

Tomasz Klim

Posted 2015-03-11T23:05:29.403

Reputation: 782

I was using the Python requests library to do this instead of wget, and this worked for me. Note that you first have to open the page in a proper browser, and when you copy the cookie data, you also have to make sure your script uses the same user agent as that browser. – Deep-B – 2016-05-05T20:41:29.063
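As a minimal sketch of the scripted approach described in this comment, the request below carries the same User-Agent and Cookie headers as the wget command above. It uses the standard library's urllib.request rather than requests so that it has no dependencies; that swap, and the name build_request, are choices made for this example only. The cookie values are the ones quoted in the answer and would have to be replaced with values freshly copied from your own browser.

```python
import urllib.request


def build_request(url, user_agent, cfduid, cf_clearance):
    """Build (but do not send) a request carrying the browser's cookies.

    The User-Agent must match the browser the cookies were copied from.
    """
    cookie = f"__cfduid={cfduid}; cf_clearance={cf_clearance}"
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html",
        "Cookie": cookie,
    }
    return urllib.request.Request(url, headers=headers)


# Values taken from the wget command in the answer; replace with your own.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) "
              "Gecko/20100101 Firefox/39.0")
req = build_request(
    "http://www.teamspeak.com/",
    USER_AGENT,
    cfduid="xpzezr54v5qnaoet5v2dx1ias5xx8m4faj7d5mfg4og",
    cf_clearance="0n01f6dkcd31en6v4b234a6d1jhoaqgxa7lklwbj-1438079290-3600",
)

# To actually fetch the page (network access required):
# html = urllib.request.urlopen(req).read()
```

The request is only constructed here, not sent, so the headers can be inspected before any traffic reaches the site.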

1

Sounds like CloudFlare might have blocked your request to the site because you were accessing it through a command-line tool. Since they have “I’m under attack” mode enabled on their account, you can only access the site with a normal web browser.

Marc Woodyard

Posted 2015-03-11T23:05:29.403

Reputation: 31

0

The issue seems to be that TeamSpeak is using CloudFlare’s DDoS protection. See the screenshot at the bottom of this answer. More details on what this protection is and what it means can be found on this official Amazon page on CloudFlare’s security features:

CloudFlare leverages the knowledge of a diverse community of websites to power a new type of security service. Online threats range from nuisances like comment spam and excessive bot crawling to malicious attacks like SQL injection and denial of service (DOS) attacks. CloudFlare provides security protection against all of these types of threats and more to keep your website safe.

More specifics on their advanced DDoS protection methods can be found here:

CloudFlare's advanced DDoS protection, provisioned as a service at the network edge, matches the sophistication and scale of such threats, and can be used to mitigate DDoS attacks of all forms and sizes including those that target the UDP and ICMP protocols, as well as SYN/ACK, DNS amplification and Layer 7 attacks. This document explains the anatomy of each attack method and how the CloudFlare network is designed to protect your web presence from such threats.

Now how does this factor into the “503 Service Unavailable” you are seeing? Well, it means that the site you are trying to access is under such a high level of protection from CloudFlare’s DDoS detection/mitigation services that non-standard access via a command-line tool like wget or curl is just not possible at this point.

FWIW, I have made a few different curl attempts from the command line, and I believe what happens is that CloudFlare’s DDoS protection acts as a huge web proxy for sites that opt to use it, while the “real” website exists somewhere other than the IP address the hostname resolves to. Sites like this claim to give you the “real” IP address behind a CloudFlare hostname, but that doesn’t seem to work at all. Or maybe the IP address given is valid, but the way the service is set up simply denies you direct access to the real site without jumping through CloudFlare’s hoops.

This simply means that the best you can do is sit and wait; maybe in a few hours or possibly days the security issues the site faced will fade away and standard wget or curl calls can be made again. But the reality is that if this security protection is in place, is solid, and the website owner does not disable it, then you can’t do much to get around it.

enter image description here

JakeGould

Posted 2015-03-11T23:05:29.403

Reputation: 38 217