Web scraping using PHP and cURL from behind a corporate proxy / firewall

0

I am behind a corporate proxy / firewall. I want to extract info from another website and am trying to do so using php and curl.

My script is as follows:

===================== start of script ================

$url = "www.guptaed.com";
$proxy_ip = "12.34.56.78"; // ip changed from real company proxy
$proxy_port = "81";
$proxy_user_pass = "my_user_name:my_password"; // user&pass changed

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 5000);
curl_setopt($ch, CURLOPT_PROXYTYPE, 'HTTP');
curl_setopt($ch, CURLOPT_PROXY, $proxy_ip);
curl_setopt($ch, CURLOPT_PROXYPORT, $proxy_port);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxy_user_pass);

$data = curl_exec($ch);
curl_close($ch);
echo $data;

===================== end of script ================

The following is displayed on the screen when I call this script (via a locally installed Apache server):

===================== start of output ================

Found

The document has moved here.

1

===================== end of output ================

"here" in the above sentence is a link with the url as: "http://www.guptaed.com/proxy.cgi?proxy.pac"

Any help will be appreciated.

Thanks! Ashish

guptaed

Posted 2016-09-06T01:18:51.373

Reputation: 1

Answers

0

The target URL returned an HTTP 302 response, which redirects the client to another URL. That is why you see the "Found" page with a link to a different URL instead of the content you expected.

Try configuring curl to follow redirects:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
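For context, here is a minimal sketch of how that option might fit into the script from the question. The helper names `build_proxy_curl_options` and `fetch_via_proxy` are hypothetical, the redirect cap of 5 is an assumed value, and the sketch has not been tested against the proxy in question. It also sets `CURLOPT_RETURNTRANSFER`, because without it `curl_exec()` prints the body itself and returns `true`, which `echo` then shows as the stray "1" in the output above. It passes the integer constant `CURLPROXY_HTTP` rather than the string 'HTTP' as the proxy type.

```php
<?php
// Hypothetical helper: collects the cURL options in one array so they can be
// inspected before any request is made.
function build_proxy_curl_options(string $url, string $proxyHost, int $proxyPort, string $proxyUserPass): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,           // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,           // follow 3xx redirects such as the 302
        CURLOPT_MAXREDIRS      => 5,              // assumed cap to avoid redirect loops
        CURLOPT_TIMEOUT_MS     => 5000,
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP, // integer constant, not the string 'HTTP'
        CURLOPT_PROXY          => $proxyHost,
        CURLOPT_PROXYPORT      => $proxyPort,
        CURLOPT_PROXYUSERPWD   => $proxyUserPass,
    ];
}

// Hypothetical helper: runs the request and returns the response body,
// throwing if cURL reports a failure.
function fetch_via_proxy(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: " . $error);
    }
    curl_close($ch);
    return $body;
}

// Example usage with the placeholder values from the question:
// $options = build_proxy_curl_options("http://www.guptaed.com", "12.34.56.78", 81, "my_user_name:my_password");
// echo fetch_via_proxy($options);
```

Keeping the options in a separate array makes it easy to dump them while debugging, which can help when it is unclear whether the proxy or the target site is producing the redirect.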

Atzmon

Posted 2016-09-06T01:18:51.373

Reputation: 2 639

Now I get a screenful of output, including lines like the following:

// Dynamic Automatic Proxy Config - PLEASE DO NOT MODIFY
// Configuration Generated at Fri Sep 9 04:03:14 2016 UTC - proxy.pac : default
// Client IP: 48.66.80.33 | BROWSER: | Region: default
ftpProxyAll = "PROXY " + "48.64.218.100:8080" + "; PROXY " + "48.65.218.100:8080" ;
gopherProxyAll = "PROXY " + "48.64.218.100:8080" + "; PROXY " + "48.65.218.100:8080" ;
httpProxyAll = "PROXY " + "48.64.218.100:8080" + "; .....

Any suggestions? Thanks! – guptaed – 2016-09-09T04:04:57.053