
I'm considering writing an application that aggregates information from a fairly popular website. This application would request information from this website at a set interval. I know this is a really hard question to even "ballpark" an answer to, but what might be a good safe interval to stay mostly "under the radar"? I'm a programmer first, a human being second, and a server admin a distant third, so my knowledge of what a server software like Apache can handle as far as server load with dynamic content is pretty basic.

I know this question is EXTREMELY open ended and the answer depends on many variables, but any related experiential knowledge being shared would be very much appreciated.

Pierre.Vriens
lotri

3 Answers


First, second and third, I would see if the site has an API. Fourth, I would see if the site has a Terms of Use Policy. Lastly, random numbers are your friend.
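To illustrate the "random numbers" point, here is a minimal sketch of a polling loop with a randomized delay, so requests don't land on a perfectly regular schedule. The base interval and jitter values are illustrative, not recommendations:

```python
import random
import time

BASE_INTERVAL = 300   # hypothetical base of 5 minutes between requests
JITTER = 60           # spread each request by up to +/- 1 minute

def next_delay():
    """Return a randomized delay so requests don't follow a fixed cadence."""
    return BASE_INTERVAL + random.uniform(-JITTER, JITTER)

# main loop sketch:
# while True:
#     fetch_page()
#     time.sleep(next_delay())
```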

Dennis Williamson

If your interval is measured in seconds and it's a high-traffic site, the impact shouldn't be significant. More important than the exact interval is making sure your client behaves well, e.g. that it accepts compressed responses properly.

Although if you're really trying to be polite you should ask them for permission or for a copy of the data you want.
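As a sketch of the "accept compressed responses" point, the fetch helper below advertises gzip support and identifies the client honestly via a User-Agent string (the URL and agent name are placeholders):

```python
import gzip
import urllib.request

def fetch(url):
    """Fetch a URL, advertising gzip support and decompressing if needed."""
    req = urllib.request.Request(
        url,
        headers={
            "Accept-Encoding": "gzip",          # tell the server we accept compression
            "User-Agent": "my-aggregator/0.1",  # identify your client honestly
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        # decompress only if the server actually compressed the response
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
        return body
```

Note that higher-level libraries such as `requests` handle this negotiation for you automatically.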

sclarson

My advice is to have a look at similar services. Services with open APIs usually publish their rate limits; Twitter, for example, does.

Accessing an API is different from what you are up to, and this certainly does not guarantee that you "stay under the radar", but it might give you an idea.
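If you do model your polling on a published limit, a simple client-side throttle keeps you under it. The sketch below uses a sliding-window limiter; the 15-requests-per-15-minutes figure is illustrative (a Twitter-style limit), not a real quota:

```python
import time

class RateLimiter:
    """Client-side throttle modeled on a published limit,
    e.g. 15 requests per 15-minute window (numbers are illustrative)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = []

    def wait(self):
        """Block until issuing another request would stay within the limit."""
        now = time.monotonic()
        # drop timestamps that have aged out of the window
        self.timestamps = [t for t in self.timestamps if now - t < self.window]
        if len(self.timestamps) >= self.max_requests:
            # sleep until the oldest request leaves the window
            time.sleep(self.window - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

limiter = RateLimiter(max_requests=15, window_seconds=900)
# call limiter.wait() before each request
```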

Ludwig Weinzierl