0

We are a price comparison portal and crawl certain websites on a regular basis. I am looking for a way to save bandwidth: instead of downloading the same page over and over when its content has not changed, I would like to make only a HEAD request. Because the websites we care about are dynamic by nature, neither the Last-Modified nor the Content-Length header is a reliable indicator of whether a page has changed. The ETag header or, even better, a Content-MD5 header in the response would probably work fine. However, most servers do not generate a Content-MD5 for each response, probably because computing it costs CPU time on every response and would slow them down rather than speed them up by saving bandwidth.

My question: is there a universally accepted way to craft an HTTP request that prompts the server to generate an ETag or Content-MD5 header for the response?

Radu M.

2 Answers

1

There is no way to do that. You cannot ask the server to return anything extra unless the website provides a special API for it.
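For the servers that do happen to send an ETag or Last-Modified header on their own, the crawler can at least reuse those values in a conditional GET and skip the body when the server answers 304 Not Modified. The following is a minimal sketch using only the Python standard library; the function names `conditional_headers` and `fetch_if_changed` are made up for illustration, and whether a given site honours these headers is an assumption you would have to check per site.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validator headers from values saved on a previous crawl."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Page changed (or the server ignores validators): keep new values.
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Server confirmed the cached copy is current; no body was sent.
            return None, etag, last_modified
        raise
```

On the first crawl you would call `fetch_if_changed(url)` and store the returned validators; later crawls pass them back in. A server that sends no validators simply returns the full page every time, which matches the point of this answer.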

Roman
0

An HTTP HEAD request?

It returns the same headers as a GET, but without the body.
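As a rough illustration, a HEAD request can be sent with the Python standard library and the response inspected for whatever validator headers the server chose to include. The names `VALIDATOR_HEADERS`, `pick_validators`, and `head_validators` are hypothetical helpers for this sketch; nothing here makes the server add a header it does not already send.

```python
import urllib.request

# Headers that could hint at a content change, if the server sends them.
VALIDATOR_HEADERS = ("ETag", "Content-MD5", "Last-Modified", "Content-Length")


def pick_validators(headers):
    """Keep only the validator headers that are actually present."""
    return {
        name: headers[name]
        for name in VALIDATOR_HEADERS
        if headers.get(name) is not None
    }


def head_validators(url, timeout=10):
    """Send a HEAD request and return the validator headers of the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return pick_validators(response.headers)
```

If the returned dictionary is empty, or holds only values that change on every request, the HEAD response gives the crawler nothing to compare against.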

Korjavin Ivan
  • My point exactly: how do I make the server return a Content-MD5 in the response to the HEAD request? Please read the question. – Radu M. Oct 17 '11 at 06:04