Wget getting response 403

0

I am using an API that has a limit on the number of requests per hour, but my script sends them all at once, so I lose about 1/3 of the requests because I get a 403.

Is there any way to check wget's response and, if I get a 403, wait 5 minutes and retry?

And here is my test code (for now):

system ("wget \"http://test-link.com/403/\" -O  {$dir}/{$in_dir_counter}.xml");
$test = system ("wget \"http://test-link.com/403/\" -O  {$dir}/{$in_dir_counter}.xml");

echo "responsed - ".$test;      

Both return the same thing.

user270181

Posted 2013-11-07T10:41:41.547

Reputation: 1

What does your own research suggest? – Dave – 2013-11-07T10:48:24.377

My research? All the forums I've read suggest adding a timeout for EVERY request. But I can't do that, because under these conditions (403s) it already takes 1-2 days to complete. So if I add something like a 10-second timeout it would be at least 4-5 days at best. – user270181 – 2013-11-07T10:52:09.323

It would be helpful if you posted your script or the relevant part of it – Tog – 2013-11-07T10:58:36.053

Just added part of the code. Hope it helps. – user270181 – 2013-11-07T11:04:25.607

Answers

0

How about using a simple script for that:

  • Run the script once every 5 minutes unless it's already running (see the lock-file sketch after this list).
  • Check the age of the local file. If it's older than a specific threshold, download it again.
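The "unless it's already running" part can be handled with a simple lock file. A minimal sketch, with a hypothetical lock path:

$lock = fopen('/tmp/fetch.lock', 'c');        // hypothetical lock file path
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit;                                     // a previous run is still active, do nothing
}
// ... the download loop shown below goes here ...
flock($lock, LOCK_UN);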

So if everything goes smoothly, nothing happens unless a file is outdated. If a file is outdated and its download failed, you can retry on the next iteration.

I'm not sure why you tagged this with php, but if you're actually running a PHP script this approach is rather easy to do (given you've got URL fopen wrappers enabled, i.e. allow_url_fopen):

foreach ($files as $file)
    // filemtime() gives the file's last modification time; re-fetch once it's older than the cache duration
    if (@filemtime($local_path . $file) + $cache_duration < time())
        @copy($remote_path . $file, $local_path . $file);

Note that $remote_path can indeed be an HTTP or FTP URL, so there's no need to invoke wget. The @ prevents error messages from being printed.
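Since copy() returns false when the download fails (for example on a 403), a small variation of the loop above could also log which files were skipped so they can be picked up on the next run; the variable names are the same hypothetical ones used above:

foreach ($files as $file) {
    if (@filemtime($local_path . $file) + $cache_duration < time()) {
        if (!@copy($remote_path . $file, $local_path . $file)) {
            // copy() returned false, so the file stays outdated and will be retried on the next run
            echo "failed: " . $file . "\n";
        }
    }
}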

To prove that this won't cause unneeded waiting:

  • Assume you've got 1000 files to download, but you can only download up to 250 files per hour.
  • Set cache_duration to a safe time span within which you'll get all files, like 24h (24 * 60 * 60).
  • Rerun the script above once every hour (a sample cron entry follows this list).
  • On the first iteration, the first 250 files will be updated; the others will fail.
  • On the second iteration, the first 250 files will be skipped (they're recent enough) and the next 250 files will be downloaded.
  • After the fourth iteration you'll have all 1000 files updated/downloaded.
  • Of course you can set a shorter interval, like 5 minutes, but this will create a lot more requests/traffic (it depends on whether that's acceptable).
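For reference, the hourly rerun could be a cron entry like this (the script path is just a placeholder):

0 * * * * /usr/bin/php /path/to/update_files.php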

Alternative script idea:

  • Try to download a file.
  • If it fails, you should be able to determine that from wget's return value/exit code. In that case, wait 5 minutes, then repeat (see the sketch after this list).
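A minimal sketch of that alternative in PHP, reusing the URL and variables from the question. Note that system()'s return value is the last line of the command's output, which is why echoing it doesn't show the status; the exit code lands in the optional second argument. The cap of 100 attempts is just an arbitrary safeguard against an endless loop:

$max_attempts = 100;                 // arbitrary cap so a permanently failing download can't loop forever
for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
    system("wget \"http://test-link.com/403/\" -O {$dir}/{$in_dir_counter}.xml", $exit_code);
    if ($exit_code === 0)
        break;                       // wget exited cleanly, the file was downloaded
    sleep(300);                      // wait 5 minutes before the next attempt
}

On wget 1.12 and newer, a server error response such as the 403 is reported as exit code 8, in case you want to distinguish it from network problems.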

Mario

Posted 2013-11-07T10:41:41.547

Reputation: 3 685

Like I said in the comments - it already takes 1-2 days. If I add a 5-minute timeout or start another run it won't help, it will make things worse. I need to get all requests done with minimal time wasted. – user270181 – 2013-11-07T11:06:54.527

My first approach won't use any timeout. It will try to download all outdated files at once. 5 minutes later you retry, only downloading the files that failed the first attempt. The only waiting is between attempts, not between individual downloads. – Mario – 2013-11-07T11:08:20.447

As I said, the whole script takes 1-2 days. It depends on how much info they give me. Sometimes it's 15k, sometimes 45k. And I have 2 more scripts. All work in chronological order. So I can't waste time rechecking all the files and retrying. I have to do it in the process. – user270181 – 2013-11-07T11:13:32.117

How about parallelizing it then? You could also store some pointer or index to the last file you managed to retrieve on the previous attempt. Just get a bit creative here. This way you don't have to recheck all files every time; just reset the index once everything is done. – Mario – 2013-11-07T11:15:16.617
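One possible shape of that index idea, with hypothetical file and variable names; the index file simply remembers how far the previous run got, so the next run continues from there:

$index_file = $dir . '/last_index.txt';           // hypothetical progress file
$i = (int)@file_get_contents($index_file);        // 0 if the file doesn't exist yet
for (; $i < count($files); $i++) {
    if (!@copy($remote_path . $files[$i], $local_path . $files[$i]))
        break;                                    // stop at the first failure (e.g. a 403)
    file_put_contents($index_file, $i + 1);       // remember how far we got
}
if ($i >= count($files))
    @unlink($index_file);                         // everything done, reset the index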

Right now I am thinking of checking the file (where I write the info), because every request creates a new file. If the file size is 0 (a 403), then sleep 5 sec and try again. What do you think about that? – user270181 – 2013-11-07T11:21:49.210

Check my code above. It checks the last time the local file was modified. If the download fails (the call to copy()), the file won't be touched/modified. What you're suggesting now would essentially be my alternative idea above. Just keep in mind that you might get stuck in an endless loop if, for some reason, the file can never be written. – Mario – 2013-11-07T11:23:40.090

So I need to add an additional counter... like trying 100 times – user270181 – 2013-11-07T11:26:44.357

Yes, that would be best. Just keep in mind that this will add up as well, e.g. waiting 100 times for 5 minutes. – Mario – 2013-11-07T11:27:49.343

I'll try. On my test site it helped. Thank you for the help. – user270181 – 2013-11-07T13:46:30.657