I have a pair of servers hosting a single Magento ecommerce site with moderate traffic (about 60k page views per day according to Google Analytics, and I think about 80k according to the server itself). The database server runs smoothly and quickly, aside from a rare hiccough, but the Apache server has been falling over every so often.
I have set up Magento to use the recommended PHP caching (APC), and it holds its own cache files in a 1.5 gig tmpfs (this tmpfs regularly gets pretty full, and I have a script running to clear cache files when the tmpfs is more than 80% full). I serve most imagery from Amazon CloudFront. I recently set up nginx as a reverse proxy to Apache (nginx also serves the static files). I have configured Apache to the best of my ability: KeepAlive and HostnameLookups are off, and the prefork MPM is configured as follows:
<IfModule prefork.c>
StartServers 50
MinSpareServers 50
MaxSpareServers 100
ServerLimit 512
MaxClients 256
MaxRequestsPerChild 400
</IfModule>
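For context, the tmpfs cleanup mentioned above works roughly like the sketch below. This is not the actual script: the `CACHE_DIR` path, the oldest-first deletion order, and the function names are all assumptions made for illustration. Only the 80% threshold comes from my setup.

```python
import os
import shutil

CACHE_DIR = "/path/to/magento/var/cache"  # hypothetical tmpfs mount point
THRESHOLD = 0.80  # start clearing when the tmpfs is more than 80% full


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def files_oldest_first(root):
    """List regular files under root, least recently modified first."""
    paths = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths, key=os.path.getmtime)


def clear_cache(root, threshold=THRESHOLD, usage=usage_fraction):
    """Delete the oldest files until usage is at or below threshold.

    Returns the number of files removed. Oldest-first is an assumed
    policy; a real script might simply wipe the whole cache instead.
    """
    removed = 0
    for path in files_oldest_first(root):
        if usage(root) <= threshold:
            break
        os.remove(path)
        removed += 1
    return removed


if __name__ == "__main__":
    print(clear_cache(CACHE_DIR))
```

Run from cron every minute or so, something like this keeps the tmpfs from filling completely, at the cost of Magento having to regenerate whatever was deleted.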
I've not turned off .htaccess files, and access logging is on. I know there are some modules I can turn off. I'm not sure what effect, if any, those three changes (disabling .htaccess files, disabling access logging, and turning off unused modules) would have.
The Apache server is a VPS with 6 gig of RAM. As of the time of writing the server is reporting load average: 17.77, 18.27, 49.76, but there's about 2 gig of RAM free. When it goes really bad, the load goes to 120+ and stays there - restarting Apache brings the site back up and the load back down.
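One thing I can check on paper is whether MaxClients 256 even fits in 6 gig. The sketch below does that arithmetic. The RAM, tmpfs and MaxClients figures are from above, but `AVG_CHILD_MB` and `RESERVED_MB` are guesses I have not measured (the real per-child figure would have to come from something like the RSS column in ps), so treat the results as illustrative only.

```python
RAM_MB = 6 * 1024      # the VPS has 6 gig of RAM
CACHE_TMPFS_MB = 1536  # the 1.5 gig tmpfs also lives in RAM when full
MAX_CLIENTS = 256      # from the prefork config
AVG_CHILD_MB = 40      # ASSUMPTION: average memory per Apache child, not measured
RESERVED_MB = 512      # ASSUMPTION: headroom for the OS, nginx, APC, etc.


def worst_case_mb(max_clients, avg_child_mb):
    """Memory needed if every allowed Apache child is running at once."""
    return max_clients * avg_child_mb


def safe_max_clients(ram_mb, reserved_mb, avg_child_mb):
    """Largest MaxClients that fits in RAM after subtracting reserved memory."""
    return (ram_mb - reserved_mb) // avg_child_mb


if __name__ == "__main__":
    print(worst_case_mb(MAX_CLIENTS, AVG_CHILD_MB))  # 10240
    print(safe_max_clients(RAM_MB, CACHE_TMPFS_MB + RESERVED_MB, AVG_CHILD_MB))  # 102
```

With those guessed numbers, 256 children would want about 10 gig, well over what the box has, and roughly 100 would fit. If the real per-child size is much smaller, the conclusion changes.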
vmstat is (while the server is reporting the load above), I think, showing a CPU idle value fluctuating between 0 and 70 or so. iostat is showing an iowait value between 0 and 0.2%.
I'm a bit stuck. What little I know tells me that the CPU is overloaded as a result of the combination of the code being run and the number of users. But I'm not experienced enough to be certain that that is the problem. If it is, I think the solutions are either to improve the code or to split the site hosting over two VPSes with a load balancer.
So, I guess my questions are:
- What else can I do to find problems or bottlenecks on the server?
- Are there any obvious changes I can make to the server config to improve this?
- Is it a good idea to set up an automated system to restart Apache when the load goes above a certain level?
- From the above, how likely is it that the site has outgrown the server?
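To make the third question concrete, the kind of automated restart I have in mind is something like the sketch below, run from cron. The `LOAD_LIMIT` value, the choice to require both the 1-minute and 5-minute averages to be high, and the `apachectl restart` command are all assumptions for illustration; I have not deployed this.

```python
import os
import subprocess

LOAD_LIMIT = 60.0                       # hypothetical threshold, not tuned
RESTART_CMD = ["apachectl", "restart"]  # assumed restart command for this box


def should_restart(load1, load5, limit=LOAD_LIMIT):
    """True only when both the 1- and 5-minute load averages exceed limit.

    Requiring both is meant to ignore short spikes and react only to
    sustained overload.
    """
    return load1 > limit and load5 > limit


def check_and_restart(limit=LOAD_LIMIT, run=subprocess.call):
    """Read the load averages and restart Apache if they are too high."""
    load1, load5, _load15 = os.getloadavg()
    if should_restart(load1, load5, limit):
        run(RESTART_CMD)
        return True
    return False


if __name__ == "__main__":
    check_and_restart()
```

With the load figures quoted above (17.77, 18.27) this would do nothing, while a sustained 120+ would trigger a restart. My worry is that it only hides whatever is actually causing the load.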
Edit:
I found something weird - /var/spool/mail/root was large ... 38 gig. That sounds ... unhealthy. Could that be the problem?