Accessing subpage of website by IP

2

I'm writing a PHP application that uses cURL to scrape data off websites, but loading a page with cURL is very, very slow. It is a lot slower than loading the same page in Chrome, even though Chrome also loads a lot of other things like stylesheets and images while the PHP application does not.

Anyway, I read that curl might have problems with DNS lookup so accessing by IP could be a lot faster.

But I'm not sure how to do that.

Let's take Google as an example. I can open my command prompt and run "ping www.google.com". It answers with:

Pinging www.google.com [74.125.232.114] with 32 bytes of data...

So I can use that IP address then, which works, but what if I'd like to access, for instance, www.google.com/doodles?

If I try entering that address when pinging it says it couldn't find the host, and doing http://74.125.232.114/doodles does not work either.

(Error: Not Found The requested URL /doodles was not found on this server.)

So how do I access that by IP?

Clox

Posted 2014-06-11T13:48:58.687

Reputation: 119

Answers

2

You are trying to access name-based virtual host (VirtualHost) websites by IP. The problem is that, given only an IP, curl doesn't send the hostname it's trying to reach to the web server, so the web server doesn't know which site to serve (the same server might also host gmail.com, but it can't tell which one to give you because curl doesn't ask).

To let curl use a hostname, you could modify your /etc/hosts file with the following information:

74.125.232.114 google.com

(On Windows you can find this file in C:\Windows\System32\Drivers\etc\hosts)

If you let curl do a request to google.com, your OS will find google.com in the hosts file and not even try a DNS lookup, which would be much faster.


That being said, it would be much better to fix your DNS settings. Have you tried modifying the /etc/resolv.conf file to use the nameservers of your provider (or Google Public DNS)?

nameserver 8.8.8.8
nameserver 8.8.4.4
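
Before changing anything, it may be worth checking whether DNS really is the bottleneck. A rough sketch, assuming a POSIX shell with curl installed; the function name time_lookup and the format string are my own, while time_namelookup and time_total are curl's --write-out variables (both in seconds):

```shell
#!/bin/sh
# Hypothetical format string; %{time_namelookup} and %{time_total} are
# curl --write-out variables, reported in seconds.
TIMING_FORMAT='dns=%{time_namelookup}s total=%{time_total}s\n'

# time_lookup URL: fetch URL, discard the body, print only the timings.
time_lookup() {
    curl -s -o /dev/null -w "$TIMING_FORMAT" "$1"
}

# Example (needs network access):
# time_lookup http://www.google.com/
```

If the dns= figure accounts for most of total=, the resolver is the problem; if it doesn't, the slowness is somewhere else and accessing by IP won't help much.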

mtak

Posted 2014-06-11T13:48:58.687

Reputation: 11 805

1

If the DNS response time is that large, you should fix the DNS settings in your network. Have a look at /etc/resolv.conf and check whether the nameserver(s) listed there are still available. If not, add a working DNS server at the top of the list. You could use Google's DNS service, for example:

nameserver 8.8.8.8

If for some reason you need the slow DNS servers (for example, because your application uses internal DNS names that are not available on the Internet), you can still modify your /etc/hosts file and add the hostname for 74.125.232.114 there:

74.125.232.114 www.google.com

With the usual settings in /etc/nsswitch.conf, the system consults /etc/hosts before performing a DNS request.
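
To see which order your system actually uses, you can read the "hosts:" line of that file. A minimal sketch, assuming a Linux system with the standard nsswitch.conf layout; the function name lookup_order is hypothetical:

```shell
#!/bin/sh
# lookup_order [FILE]: print the sources on the "hosts:" line of an
# nsswitch.conf-style file (defaults to /etc/nsswitch.conf).
lookup_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "${1:-/etc/nsswitch.conf}"
}

# Example:
# lookup_order                   # "files" listed before "dns" means /etc/hosts wins
# getent hosts www.google.com    # resolve through that same lookup order
```

If "files" comes before "dns" in the output, the /etc/hosts entry is used and no DNS query is made for that name.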

hek2mgl

Posted 2014-06-11T13:48:58.687

Reputation: 703

0

Use -L to follow the redirect (curl www.google.com reports that the page has moved).

As has been mentioned, when you request by IP the Host header doesn't carry the hostname.

So specify the Host header yourself:

curl -L -H "Host: www.google.com" 173.194.34.115/doodles
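
A variation on the same idea, sketched under the assumption that your curl supports --resolve: instead of overriding the Host header by hand, pin the hostname to the IP, so curl still builds the URL, the Host header and (for HTTPS) the TLS server name from the real hostname. The helper names below are hypothetical, and the IP is the one from the question, which may well be stale:

```shell
#!/bin/sh
# resolve_arg HOST PORT IP: build the HOST:PORT:ADDRESS string that
# curl's --resolve option expects.
resolve_arg() {
    printf '%s:%s:%s' "$1" "$2" "$3"
}

# fetch_pinned HOST IP PATH: request http://HOST/PATH without a DNS lookup.
# Both ports are pinned so a redirect to https also skips DNS.
fetch_pinned() {
    host=$1 ip=$2 path=$3
    curl -sL \
        --resolve "$(resolve_arg "$host" 80 "$ip")" \
        --resolve "$(resolve_arg "$host" 443 "$ip")" \
        "http://$host$path"
}

# Example (needs network access):
# fetch_pinned www.google.com 74.125.232.114 /doodles
```

Since the question is about PHP: the corresponding libcurl option is exposed there as CURLOPT_RESOLVE, which takes an array of strings in the same HOST:PORT:ADDRESS form.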

barlop

Posted 2014-06-11T13:48:58.687

Reputation: 18 677