
I've had a few cases in which the homepage of one of my sites wouldn't load in the browser. The site is running on django/fastcgi/nginx.

The problem was difficult to reproduce, so to measure how often it happens we added a 1x1px image to the homepage body, served in the same way as the homepage HTML. Then we wrote a script that scans the nginx logs and checks, for each homepage request, whether a request for the 1px image arrived from the same IP within 10 seconds of it.
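For anyone wanting to repeat the check, the log scan described above could look roughly like the sketch below. It is not the original script: it assumes nginx's default "combined" log format, and `HOMEPAGE_PATH` and `PIXEL_PATH` are hypothetical placeholders for the real URLs.

```python
import re
from datetime import datetime, timedelta

# Assumes the nginx "combined" log format; adjust if log_format differs.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)

HOMEPAGE_PATH = "/"        # hypothetical: URL of the homepage
PIXEL_PATH = "/pixel.gif"  # hypothetical: URL of the 1x1px image
WINDOW = timedelta(seconds=10)


def parse(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path")


def unmatched_homepage_hits(lines, window=WINDOW):
    """List (ip, timestamp) of homepage requests with no pixel request
    from the same IP within `window` after them."""
    hits = [h for h in map(parse, lines) if h is not None]

    # Collect pixel request times per IP.
    pixel_times = {}
    for ip, ts, path in hits:
        if path == PIXEL_PATH:
            pixel_times.setdefault(ip, []).append(ts)

    missing = []
    for ip, ts, path in hits:
        if path != HOMEPAGE_PATH:
            continue
        if not any(ts <= p <= ts + window for p in pixel_times.get(ip, [])):
            missing.append((ip, ts))
    return missing
```

Passing a larger `window` makes it easy to see how sensitive the 30% figure is to the 10-second cutoff. Note that matching on IP alone will misattribute requests from visitors behind a shared NAT or proxy.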

The results were shocking - about 30% (!!) of the homepage requests had no image request in close proximity, and that's after filtering out requests from obvious search bots etc. Many such requests even come from high-quality sources, i.e. visitors who are very likely to at least want to see the rendered homepage before leaving the site...

Therefore, I strongly suspect there is some sort of technical problem that's causing many requests to fail.

How should I go about troubleshooting this?

GJ.

2 Answers


It's difficult to give specific advice without more details but here are some general comments that may be useful:

  • Try using a window longer than 10 sec in your check. Your page may sometimes take longer than 10 sec to load and render, which would cause false positives. A page load longer than 10 sec is a separate issue you should address, though.
  • Try using a benchmarking tool like ApacheBench (installs with Apache) or Siege to see if you can replicate the issue. With ApacheBench, for example, I would look at the "Failed requests" and "Write errors" fields, which should be 0 for a well-behaved server/application. Try testing locally on the server and from a remote client, and at different concurrency levels.
  • The previous step should also give you the approximate serving capacity of your system. Check to make sure your regular traffic is not approaching this level. If you can only serve 10 requests/sec anything past this will likely result in a dropped request or error.
  • Check the various logs for any obvious error or warning messages (nginx, database, application, system, etc...). Enable them if they are not being used. If you don't see any relevant messages try increasing the logging level temporarily for a few days.
  • Look into system monitoring with something like Zabbix or Nagios. There are many systems to choose from. See this question or this question for a few good examples. These usually won't tell you where your problem is, but they are invaluable for debugging and, once you've found the problem, for alerting you if it occurs again.
  • If you are sure there is a problem but can't find it try changing parameters and retesting. Try different pages that load or don't load different things. Try dynamic/static pages. Try lighttpd/Apache instead of nginx (for testing at least).
  • If you still can't find anything, make sure there is actually a problem to find. Your method of testing may indicate a different issue than you think it does (slow-loading pages or clients that disable images, for example).
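If ApacheBench or Siege is not at hand, the load-testing step above can be approximated with a short script. This is only a rough sketch, not a substitute for a real benchmarking tool; `fetch` and `load_test` are names made up for this example, and the URL in the usage note is a placeholder.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Return (ok, seconds). Any exception or non-200 status counts as a failure."""
    start = time.monotonic()
    try:
        with urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = resp.status == 200
    except Exception:
        ok = False
    return ok, time.monotonic() - start


def load_test(url, total=100, concurrency=10, fetch=fetch):
    """Issue `total` requests with up to `concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    failed = sum(1 for ok, _ in results if not ok)
    slowest = max(seconds for _, seconds in results)
    return {"requests": total, "failed": failed, "slowest_seconds": slowest}
```

Something like `load_test("http://example.com/", total=200, concurrency=20)` could be run both on the server and from a remote machine. A non-zero `failed` count, or a `slowest_seconds` above the 10-second window, would each explain some of the missing image requests.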
uesp

I am not familiar with nginx, but this sounds like it could be a max connections issue.

A quick Google search told me that "worker_connections" sets how many simultaneous connections each worker process allows. You could always try doubling or tripling whatever the current number is.
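To see what the current ceiling is before changing it, a rough estimate is worker_processes multiplied by worker_connections. The sketch below reads both directives from the text of an nginx.conf; `connection_ceiling` is a made-up helper, the parsing is deliberately naive (it ignores include files), and the fallbacks are the documented nginx defaults of 1 and 512.

```python
import os
import re


def connection_ceiling(conf_text):
    """Estimate max simultaneous connections from nginx.conf text."""

    def directive(name, default):
        # Find an uncommented "name value;" line.
        m = re.search(r"^\s*" + name + r"\s+(\S+?)\s*;", conf_text, re.MULTILINE)
        return m.group(1) if m else default

    processes = directive("worker_processes", "1")
    # "auto" makes nginx use one worker per CPU core.
    n_proc = (os.cpu_count() or 1) if processes == "auto" else int(processes)
    n_conn = int(directive("worker_connections", "512"))
    return n_proc * n_conn
```

Keep in mind that worker_connections counts every connection a worker holds, including those to the FastCGI backend, so the number of clients that can be served at once is lower than this figure.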

Like I said, I am completely unfamiliar with nginx so I could be way off on this, but it's worth a shot.

AdamP