0

We are trying to further improve the speed of some sites with older HTML, partly to obtain better SEO results. We have already applied some minification and combined HTML, CSS, etc. We run a small virtualized infrastructure and have always wanted a light plus a standard HTTP server configuration, so that the first one serves images and static content while the other handles PHP, rewrites, etc. We can easily do that now with a VM that uses the same files and vhost configuration (via bind mounts) on Apache but with hardly any modules loaded. The light httpd would then have a smaller memory footprint, which would let us serve more requests faster, keep more MinSpareServers running, etc.

Since browsers also benefit from loading static content from different hostnames, we have thought about adding a rewrite rule on our main server (main.com) to "redirect" all images and CSS (*.jpg, *.gif, *.css, etc.) to the same path on, say, cdn.main.com, so that the browser can open more connections.

The question is: given that we already have a very complex rewrite ruleset (we manually manipulate many old URLs for SEO), will it be worth it?

I mean, will the additional load on main's Apache of redirecting main.com/image.jpg (I understand we'll have to do a 301) to cdn.main.com/image.jpg, plus cdn.main.com then having to serve it, outweigh the gain we would be achieving in the browser?

Could the excess of 301s for all the images on a page be penalized by Google?

How do large companies handle this? Does their original code already link images from the CDN with absolute paths?

EDIT: Just to clarify, our concern is not so much about server performance or bandwidth. We could obviously use an external CDN, but we have plenty of CPU and bandwidth.

Our concern is how to let "old" sites with plenty of semi-static HTML content benefit from splitting connections for images and static content via Apache, without having to change the HTML to absolute paths (i.e. image.jpg becoming cdn.main.com/image.jpg on the server, not in the code).

luison

3 Answers

1

to redirect main.com/image.jpg (I understand we'll have to do a 301) to cdn.main.com/image.jpg

This is a bad idea, don't do it.

Your long question is kind of hard to follow. As I understand it, you're worried about the server load, not how fast the site loads for your end users.

If you're worried about server capacity, either:

  • Get a bigger server or
  • Use an in-RAM proxy server in front of your HTTP server (Squid, Varnish, Apache Traffic Server) or
  • Use an inexpensive Content Delivery Network to cache your static content closer to the end users, reducing both page load time and load on your servers.

thus the browser being able to have more connections.

The browser parallel download limit is a little bit of a red herring -- modern browsers download 6-8 files in parallel from the same hostname. It's only old browsers (IE6, IE7) which really suffer from this limitation.

If you're worried about the end user's page load speed, then use a CDN. That's good advice in all situations where your users are spread out over a large geographical area (i.e. you have a global audience).

301s of all images on a page be penalized by google?

Possibly. But more likely, it will be hated by your users. For each image request you're first serving a 301, and then the correct image from another URL. That means 2 round trips to the servers for every single image, and hence significantly longer page load time for your users.
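To put a rough number on the extra round trip, here is a minimal back-of-the-envelope sketch. It assumes a deliberately crude model: every image costs one round trip (two when it first gets a 301), the browser fetches a fixed number of files in parallel, and transfer time, DNS and connection setup are ignored. The function name and the figures (30 images, 80 ms round trip, 6 parallel downloads) are illustrative assumptions, not measurements.

```python
import math

def estimated_image_time_ms(num_images, rtt_ms, parallel=6, redirected=False):
    """Very rough latency estimate for fetching all images on a page.

    Assumption: each image costs one round trip, or two when the first
    response is a 301 that points at another URL. Transfer time, DNS and
    TCP setup are ignored on purpose.
    """
    round_trips_per_image = 2 if redirected else 1
    # Images are fetched in "waves" of `parallel` downloads at a time.
    waves = math.ceil(num_images / parallel)
    return waves * round_trips_per_image * rtt_ms

direct = estimated_image_time_ms(30, rtt_ms=80)
via_301 = estimated_image_time_ms(30, rtt_ms=80, redirected=True)
print(direct, via_301)  # prints: 400 800
```

Under these assumptions the redirect doubles the latency spent on images, which is the point made above; real pages will differ, but the direction of the effect is the same.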

  • Thanks, and sorry if my question is slightly confusing. I am NOT concerned about either server load or bandwidth, as we have plenty available and, as you say, server costs are so low nowadays. We are just trying to improve page load speed on old sites, for the benefit of users and for better search engine crawling. Our main concern/doubt is how to achieve that without changing all the images in the code to absolute URL paths, i.e. image.jpg to cdn.main.com/image.jpg – luison Jan 18 '11 at 09:20
  • @luison: OK, that's clear. And the answer sadly is you can't, really. The good solution is to change the code so that all images are linked with a full path that goes through a CDN or static file server, as you say. There are bad, kludgy alternatives, like running *the entire site* through a CDN, or setting the HTML BASE tag to a CDN URL and then changing the navigation code to full non-CDN paths. But these are nasty kludges, really, and will greatly complicate future maintainability. HTH. Since you worry about end users' speed, certainly don't do an extra 301 response for every image. –  Jan 18 '11 at 09:56
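If the HTML does have to change, a one-off batch rewrite of the old pages may be less painful than editing them by hand. The following is only a sketch under stated assumptions: the function name `to_cdn`, the extension list and the `http://cdn.main.com` base are illustrative, and the regular expression is a crude stand-in for a real HTML parser (it looks at any `src` or `href` attribute, including ones on links, and will miss unquoted attributes).

```python
import re
from urllib.parse import urljoin, urlsplit

# Assumed list of extensions that should be served from the static host.
STATIC_EXT = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js")

# Matches quoted src="..." / href="..." attributes (single or double quotes).
ATTR_RE = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)

def to_cdn(html, page_path, cdn_base="http://cdn.main.com"):
    """Rewrite relative static URLs in `html` to absolute URLs on `cdn_base`.

    `page_path` is the path of the page on the main site (for example
    "/shop/index.html"); it is needed to resolve relative references.
    """
    def repl(match):
        prefix, quote, url = match.groups()
        parts = urlsplit(url)
        # Leave URLs that already name a scheme or a host untouched.
        if parts.scheme or parts.netloc:
            return match.group(0)
        # Leave anything that is not a static asset (pages, scripts, ...).
        if not parts.path.lower().endswith(STATIC_EXT):
            return match.group(0)
        absolute = urljoin(cdn_base + page_path, url)
        return f"{prefix}{quote}{absolute}{quote}"

    return ATTR_RE.sub(repl, html)

sample = '<img src="img/a.jpg"><a href="page.php">next</a>'
print(to_cdn(sample, "/shop/index.html"))
```

Because the static host in the question serves the same files under the same paths (bind mounts), resolving each reference against the page's own path should keep the rewritten URLs valid; that is an assumption worth checking on a copy of the site first.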
0

I've been playing with a similar question at my company for the past weeks. A couple things to keep in mind:

  1. Multiple domains for browsers: This is a fine line to tune. On the one hand, you have browser-imposed limits of how many resources can be loaded concurrently from the same domain (in this respect, more domains to load from is better). On the other hand, each domain you call out to requires a DNS lookup, which can hinder load speeds. I've been using a rule-of-thumb of 5 different domains, myself.
  2. Next, if I'm reading your description right (and I'm still drinking coffee, so I might not be), you're still going to be making the initial request to 'main.com' and having the 'heavy' apache redirect to the 'light' apache. This isn't going to help you at all. If you went this route, it would be better to set up 'main.com' as the light apache and have it Reverse Proxy any non-static content to the 'heavy' apache.
  3. Number 2 doesn't take full advantage of the 'multiple domain' improvement though, since all requests are pushed through the domain main.com. So what we did was set up a domain (following your example names) cdn.main.com as the 'light' apache and have it internally redirect to the appropriate resources. You might check out mod_cache also.
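When more than one static hostname is used, it usually helps to map each file to the same hostname every time, so that browser and proxy caches are not defeated by the same image appearing under two URLs. A minimal sketch of that idea follows; the hostnames `cdn1.main.com` and `cdn2.main.com` and the function names are hypothetical, and CRC32 is just one convenient stable hash (Python's built-in `hash()` is not stable across processes for strings).

```python
import zlib

# Hypothetical static hostnames; keep the list short to limit DNS lookups.
SHARDS = ["cdn1.main.com", "cdn2.main.com"]

def shard_for(path, shards=SHARDS):
    """Pick a hostname for `path` deterministically.

    The same path always maps to the same host, so its URL never changes
    between page views.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(shards)
    return shards[index]

def shard_url(path, shards=SHARDS):
    """Build the absolute URL for a static asset path such as "/img/a.jpg"."""
    return f"http://{shard_for(path, shards)}{path}"

print(shard_url("/img/logo.gif"))
print(shard_url("/css/site.css"))
```

Note that changing the number of hostnames later will move some files to a different host and therefore change their URLs.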
Derek Downey
  • Thanks DTest, coffee is indeed required for the question. Your point 2 makes sense but, as you say yourself, it does not work, because I can never redirect the URL that is actually indexed to another domain, so it has to be the main one responding. Whether the httpd server is light or not is not really the question; it is whether it's worth doing a *.gif or *.jpg redirect to cdn.main.com, regardless of how big that server is. Regarding DNS and the number of servers, I've read about considering IP redirects and a rule of even fewer, something like a 1/15 ratio: 30 connections is good with 2 CNAMEs – luison Jan 14 '11 at 16:17
0

Usually, getting another server with, say, 20TB of available bandwidth per month costs much less than a CDN will charge you for the same amount of traffic (okay, the CDN is probably faster because it is geographically spread out, but it is not cheaper in any case).

Anyway, I am experimenting with the following setup (please comment on this if you think I may have something wrong here)

Have www.server.com run all the php/mysql stuff and serve all the static files (images, css etc) from static.server.com. (I have set all base URL addresses in variables so I can change them at will.) On static.server.com there will be a rewrite rule to check whether the image exists locally and, if not, bring it over from www.server.com through a shell script/scp/rsync command. The next time it is requested, it will already be there, so no second rewrite will be needed for that specific image/file. You will always have the option to just change the one variable that holds the path to your static content and serve everything from your main server again. The images will be there too.

This way, image calls go directly to my static server and all the other stuff goes to the php/mysql one. If you don't run any php on the static server, then the processes won't become very big, so you can have many more server processes running in parallel to serve your images, and it won't eat up your main computer's bandwidth.
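The "copy it over on the first miss" step could look roughly like the sketch below. It is not the setup described above, only an illustration: it assumes the origin's document root is reachable as a local or mounted directory (`origin_root`), whereas the answer mentions scp/rsync, and the function name `ensure_local` is hypothetical.

```python
import shutil
from pathlib import Path

def ensure_local(rel_path, static_root, origin_root):
    """Return the local path of a static file, copying it on the first miss.

    Assumption: `origin_root` is a directory we can read directly (for
    example a mount of the main server's document root).
    """
    static_root = Path(static_root).resolve()
    target = (static_root / rel_path.lstrip("/")).resolve()
    # Refuse paths such as "../../etc/passwd" that escape the static root.
    if static_root not in target.parents:
        raise ValueError(f"path escapes static root: {rel_path}")
    if target.exists():
        return target  # cache hit: nothing to fetch
    source = Path(origin_root) / rel_path.lstrip("/")
    if not source.is_file():
        raise FileNotFoundError(rel_path)
    # Cache miss: create the directory tree and copy the file once.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
```

A request handler on the static host would call `ensure_local("/img/photo.jpg", "/var/www/static", "/mnt/www")` (paths are placeholders) before serving the file; later requests find the copy and skip the fetch. Nothing in this sketch invalidates a cached copy when the original changes, so that would have to be handled separately.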

If you see that a lot of CPU goes to waste on your static server, you can always move your mysql there too, so you can balance the load between the two.

For an example of the above, you can check this out (it is almost the same):

http://mrphp.com.au/code/image-cache-using-phpthumb-and-modrewrite

Edit: I am talking about using two separate servers here (physically located close enough to each other, in case the mysql goes on the other server).


Another thing I should add to the above is that the DNS entry for static.server.com can point at whichever server you want (even the same one as the rest of the application). So you can hard-code the static content's address as e.g. static.server.com/image1.jpg. Applying the above solution then just requires setting the DNS entry for static.server.com to point to your cache machine.

pataroulis
  • Thanks pataroulis. Again, our issue is more about how large sites deal with linking CDN images. It's just hard to believe that all the images in their templates/HTML code are linked with absolute paths (image.domain.com/image.jpg). Obviously, if everything comes from an application that generates the image links, it is easier. As for the server, that is more or less what we plan, but we have had very good experiences with "bind mounts" and virtual machines. That is, 2 VMs mount the same path for the virtual sites on Apache. The content is exactly the same, and the servers can have different configurations. – luison Jan 18 '11 at 09:27