To check the HTTP response headers for a set of URLs, I send the following request headers with curl:
foreach ( $urls as $url )
{
    // Setup headers - I used the same headers from Firefox version 2.0.0.6
    $header[ ] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[ ] = "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[ ] = "Cache-Control: max-age=0";
    $header[ ] = "Connection: keep-alive";
    $header[ ] = "Keep-Alive: 300";
    $header[ ] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[ ] = "Accept-Language: en-us,en;q=0.5";
    $header[ ] = "Pragma: "; // browsers keep this blank.

    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt( $ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt( $ch, CURLOPT_HEADER, true );
    curl_setopt( $ch, CURLOPT_NOBODY, true );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY );
    curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); // timeout 10 seconds
}
Sometimes I receive 200 OK, which is good. Other times I receive 301, 302, or 307, which I consider good as well. But sometimes I receive unexpected statuses such as 406, 500, or 504, which should identify an invalid URL, yet the same URLs open fine in a browser.
For example, the script returns:
http://www.awe.co.uk/ => HTTP/1.1 406 Not Acceptable
and wget returns:
wget http://www.awe.co.uk/
--2011-06-23 15:26:26-- http://www.awe.co.uk/
Resolving www.awe.co.uk... 77.73.123.140
Connecting to www.awe.co.uk|77.73.123.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Does anyone know which request header I am missing, or which one I am sending unnecessarily?
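Two details of the loop above may be worth ruling out. The `$header` array is appended to on every iteration and never reset, so the list grows with each URL. The Accept value is also split across two array entries, and the second entry has no header name. How libcurl treats such an entry is an assumption here, but if it is dropped, the server sees only the truncated Accept value (without `text/html` or `*/*`), which is one plausible cause of a 406. The following is a minimal sketch, not a confirmed fix. The helper names `build_request_headers` and `fetch_status` are hypothetical, and the user agent, referer, auth, and blank Pragma settings from the question are left out for brevity.

```php
<?php
// Hypothetical helper: builds the header list once. The Accept value is kept
// in a single entry so that every element is a complete "Name: value" line.
function build_request_headers(): array
{
    return [
        'Accept: text/xml,application/xml,application/xhtml+xml,'
            . 'text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'Keep-Alive: 300',
        'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Language: en-us,en;q=0.5',
    ];
}

// Hypothetical helper: sends a body-less request, follows redirects, and
// returns the final HTTP status code, or 0 if the transfer itself failed.
function fetch_status(string $url, array $headers): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // seconds

    $result = curl_exec($ch);
    if ($result === false) {
        return 0;
    }
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

In the `foreach`, the header list would be built once before the loop and passed in, for example `fetch_status($url, $headers)`, so that each URL gets the same fixed set of headers and a fresh handle.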
As the header is mandatory, curl includes it automatically (and if it didn't, the response would be 400 Bad request). – user1686 – 2011-06-23T17:33:53.203
Not true in practice. It only sometimes invents one if one hasn't been provided, depending on what is passed to other options and to curl_init(), which we haven't been told. And, as should be obvious from the data in the question even if one has never encountered it in practice, not everyone gets the error responses for incorrect protocol right. – JdeBP – 2011-06-24T09:20:04.607