
WHAT I'M TRYING TO DO

Server resources sometimes run tight, and to help prevent memory exhaustion I've had to limit server processes. I could use a little expert help to know whether I'm on the right track, and perhaps to spot any obvious settings changes that would help the system run more stably.

HISTORY

My company recently upgraded from shared hosting to a VPS. We had outgrown the shared hosting and began having problems: the host kept suspending our site for excessive CPU usage on weekends. Our traffic roughly doubles or triples every Friday and Saturday, which is expected in our case. (About 5000 visits [~2500 visitors] per day during the week, about 9500 visits [~4500 visitors] per day on weekends.)

Now that we are on a VPS, we have no CPU problems. (In fact, the CentOS WHM control panel says we are at ".000201% CPU load".) However, we are having out-of-memory problems, leading to crashes.

SUMMARY OF ISSUE

Our website is WordPress-based. However, aside from comments there is very little "write" activity; mostly users are viewing fairly static pages that we've created.

When we first upgraded to a VPS several months ago, in October 2012, the website ran well during the week, but choked on memory every weekend. Often it would crash repeatedly (5-20 times during a 24-hour period, sporadically), usually starting Friday evening, and continuing through Saturday afternoon.

During the week, the server ran consistently at 65-90% memory usage, and on the weekend it would hit 100%, causing crashes.

STEPS TAKEN TO CORRECT IT

Since I was new to VPS hosting, I started with all the default settings. I later started tweaking, following advice on solving memory issues that I read here on Server Fault and on other sites.

I've made adjustments to MySQL, PHP, and Apache, summarized below in "Current Configuration". I also recompiled Apache and PHP to remove unwanted modules. I installed a better caching plugin for WordPress (W3 Total Cache), and added APC opcode caching. I also started using gzip compression, and moved a lot of static files to a separate subdomain.
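For reference, the gzip compression is done with a mod_deflate block along these lines (a simplified sketch; the MIME-type list here is illustrative rather than my exact configuration):

# Compress text responses only; images and archives are already compressed
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml
    AddOutputFilterByType DEFLATE application/javascript application/x-javascript
</IfModule>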

I wrote a nifty little script that checks the server status on a schedule and restarts it as needed; it also sends me a transcript of the server error log to help with troubleshooting. (I know it's just a band-aid, if that, but it was important to keep the website online, since no one wants to sit around monitoring it on the weekend.)
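Stripped to its essentials, the script works like this (a sketch; the real version also mails the error-log excerpt, and the URL and paths here are placeholders):

#!/bin/bash
# Run from cron every few minutes: restart Apache if the site stops responding.
URL="http://www.example.com/"            # placeholder, not the real address
LOG="/usr/local/apache/logs/error_log"   # placeholder path

# Fetch the homepage with a short timeout; any failure or timeout makes
# curl exit non-zero, which we treat as "server is down".
if ! curl --silent --max-time 15 --output /dev/null "$URL"; then
    /etc/init.d/httpd restart
    # Mail the tail of the error log to help troubleshoot afterward.
    tail -n 50 "$LOG" | mail -s "httpd restarted $(date)" admin@example.com
fi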

Just recently, a week or so ago (January 2013), I upgraded the server RAM from 1 GB (2 GB burstable) to 2 GB (3 GB burstable). This seems to have fixed the majority of the problem, but I still get an occasional notice (once a week or so) that the server is hanging, along with "can't apply process slot" PHP errors.

CURRENT CONFIGURATION

It's an Apache server running CentOS 6, with Apache 2 (Worker MPM), PHP 5.3.20 (FastCGI via mod_fcgid), and MySQL 5.5.28, on 2 GB RAM (3 GB burstable) and 24 CPUs.

MySQL currently uses about 618 MB, about 20.1% of RAM. PHP uses up to 89 MB per process. Apache uses up to 14 MB per process.
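A rough worst-case sum against the limits shown below (the Apache process count is an estimate, since worker spawns a handful of processes rather than one per client):

  MySQL                      ~618 MB
  PHP    15 procs x 89 MB   ~1335 MB
  Apache ~10 procs x 14 MB   ~140 MB
  Total                     ~2093 MB

So at full PHP concurrency we could exceed the 2 GB guaranteed allocation before the OS itself is counted, which fits the weekend crashes.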

Typical weekday top output:

top - 15:31:13 up 89 days,  5:26,  1 user,  load average: 1.54, 1.00, 0.70
Tasks:  49 total,   1 running,  48 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.7%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3145728k total,  1046444k used,  2099284k free,        0k buffers
Swap:        0k total,        0k used,        0k free,        0k cached

Unfortunately I do not have a current example of weekend/busiest time top output.

Apache config:

StartServers: 5
MinSpareThreads: 5
MaxSpareServers: 10
ServerLimit: 80
MaxClients: 56
MaxRequestsPerChild: 5000
KeepAlive: Off

PHP config:

MaxRequestsPerProcess 500
FcgidMaxProcesses 15
FcgidMinProcessesPerClass 0
FcgidMaxProcessesPerClass 8
FcgidIdleTimeout 30
FcgidIdleScanInterval 15
FcgidProcessLifeTime 60
FcgidIOTimeout 300
FcgidMaxRequestLen 268435456
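One related knob: the 89 MB per-process figure above is ultimately capped by memory_limit in php.ini, so lowering it would bound PHP's worst case. The value below is a guess that would need testing against our heaviest pages:

; php.ini - cap per-process memory (64M is an untested assumption)
memory_limit = 64M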

MySQL config:

[mysqld]
max_user_connections            = 75
net_buffer_length               = 8K
read_buffer_size                = 256K
read_rnd_buffer_size            = 512K
skip-external-locking
sort_buffer_size                = 512K

# MyISAM #
key_buffer_size                 = 32M
myisam_sort_buffer_size         = 16M
#myisam_recover                 = FORCE,BACKUP

# SAFETY #
max_allowed_packet              = 8M
#max_connect_errors             = 1000000

# CACHES AND LIMITS #
tmp_table_size                  = 104M
max_heap_table_size             = 104M
join_buffer_size                = 208K
#query_cache_type               = 0
query_cache_size                = 32M
max_connections                 = 150
thread_cache_size               = 4
#open_files_limit               = 65535
table_cache                     = 512
#table_definition_cache         = 1024
table_open_cache                = 2048
wait_timeout                    = 300

# INNODB #
#innodb_flush_method            = O_DIRECT
#innodb_log_files_in_group      = 2
#innodb_log_file_size           = 64M
#innodb_flush_log_at_trx_commit = 1
#innodb_file_per_table          = 1
innodb_buffer_pool_size         = 416M

# This setting ensures that aio limits are not exceeded
# (default is 65536, each instance of mysql takes 2661 with this enabled)
innodb_use_native_aio           = 0

# LOGGING #
log-slow-queries
log-queries-not-using-indexes
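As a sanity check on the ~618 MB figure, the usual rule-of-thumb ceiling from these settings (it ignores InnoDB overhead and per-table caches, so treat it as approximate) is:

  global buffers:  key_buffer (32M) + innodb_buffer_pool (416M) + query_cache (32M) ~= 480 MB
  per connection:  sort (512K) + read (256K) + read_rnd (512K) + join (208K) ~= 1.5 MB
  worst case:      480 MB + 150 connections x 1.5 MB ~= 705 MB

which is in the same ballpark as the observed usage.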

Any help/suggestions would be much appreciated. The website address is 3abn.org.

Michael
    And don't use OpenVZ. You don't really have 3GB of RAM available to you all the time; far less than that most likely. Post the content of `/proc/user_beancounters`. – Michael Hampton Jan 15 '13 at 22:20
  • Can you state more precisely why you think the crashing issues have something to do with RAM usage? It seems like you jumped to that conclusion and none of the evidence you presented supports it. – David Schwartz Jan 16 '13 at 02:15
  • @MichaelHampton, the /proc/user_beancounters is 0 bytes. – Michael Jan 16 '13 at 22:41
  • @DavidSchwartz, sorry, I should have explained that more. The reason that RAM usage is "suspected" is because I would log in to WHM, or use `top`, and see that we were at 99-100% usage. This would be when the server would crash. Besides that, we were getting "unable to init Zlib: deflateInit2 returned -4" errors, which is an out of memory error. Also, all the httpd (Apache) processes would die (get killed), which is something that happens when memory runs out. – Michael Jan 16 '13 at 22:45
  • Also, I should note that 100% RAM usage represents all the dedicated memory PLUS all the burstable memory, which should never have been happening anyway. The VPS host has policies against overuse of the burstable memory. – Michael Jan 16 '13 at 22:49

3 Answers

3

You're already running PHP with FastCGI, so I'm not sure what else you can do to slim this down. You're kind of at a crossroads here.

A couple of options:

  • Tune everything down to the smallest possible footprint: replace Apache with nginx (if you can), tune MySQL so that it's not buffering more data than needed, and so on. (A minimal nginx sketch follows this list.)
  • Throw more RAM at the box
  • Separate your tiers into dedicated VMs: one database server, one application server, and one front-end. This will make it a lot easier to scale.
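Here's roughly what the nginx route looks like for a mostly-static WordPress site. This is a minimal sketch only; the document root and the PHP-FPM socket path are assumptions you'd adjust:

server {
    listen 80;
    server_name example.com;
    root /var/www/html;        # assumed path
    index index.php;

    # Serve static files directly; hand everything else to WordPress
    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # Pass PHP to a PHP-FPM pool over a local socket (no per-request fork)
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;   # assumed socket path
    }
}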

Edit: You say that you've installed lots of caching. Caching means spending more RAM so that the next request(s) go faster. If you're very low on RAM, caching might not be the best thing in the world.

pauska
  • Yeah, nginx seems to be where it's at, or lighttpd. Since I'm new at this, I guess I'm a little nervous changing the web server, not knowing what gotchas there might be. But this gives me extra incentive to research it. – Michael Jan 16 '13 at 23:04
  • You mentioned less buffering and less caching. I'd been led to believe (or read into other advice, I'm not sure) that these would help indirectly by speeding up the time it takes to serve requests, so less RAM would be used concurrently. But you're saying that's not so? – Michael Jan 16 '13 at 23:07
  • @Michael You are correct. The more you manage to cache the less load your servers get. Your only problem is that cache uses RAM, and you don't have a lot of it. – pauska Jan 17 '13 at 10:11
3

My number one recommendation: GET OFF THE VPS.

I have heard enough grousing about memory (and OOM-Killer) related problems on VPS systems that I'm of the opinion the typical VPS hosting provider is not providing a Production Grade solution -- it's not a "virtual private server", it's "paravirtualization on top of an existing OS with poorly designed resource limiting that consequently behaves differently than a real machine would".
(You appear to be being bitten by the most common difference: a "VPS" has no swap space, so when you chew up even one byte more RAM than you were allocated by your provider things fall apart.)

If you are unable or unwilling to host on your own hardware colocated at a quality datacenter, you should consider a cloud service that behaves like a "normal" server, with swap and the like (Amazon EC2 is one such option). These solutions are priced somewhere between "VPS" offerings and dedicated hardware, but they provide an operational experience much closer to real hardware, and let you avoid situations like the one you're in now.


Note that in any case you still need to size your system adequately -- your VPS/Cloud Solution/Dedicated Hardware should have enough RAM to handle peak load without swapping.
The advantage of (quality) Cloud or Dedicated Hardware solutions is you have more control over what happens when you reach the swap point (disabling the OOM-killer and letting malloc() fail, for example).
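As a concrete example of that control, on a real machine (or a full VM) with swap you can tell the kernel to stop overcommitting, so that allocations fail cleanly instead of summoning the OOM killer. A sketch only, and something to test carefully before relying on it:

# /etc/sysctl.conf - strict accounting: commit limit = swap + 100% of RAM
vm.overcommit_memory = 2
vm.overcommit_ratio = 100

(Apply with `sysctl -p`.)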

voretaq7
  • Only container-based VPSes like OpenVZ lack swap space. Xen, KVM, VMware, etc., all allow for it, and can actually be used to construct the sort of quality environment you're talking about. OpenVZ really cannot. – Michael Hampton Jan 15 '13 at 22:40
  • @MichaelHampton The preponderance of "VPS" questions here have been container-based VPS questions - I've no idea what the actual "VPS Landscape" looks like these days, but I'm soured on it by the endless litany of bad experiences that have been related on Server Fault. It's entirely possible I'm being unfair to the *good* VPS providers out there, but *POISONED FOREVER!* :-( – voretaq7 Jan 15 '13 at 23:05
  • Having used a wide variety of VPS virtual machines with a wide variety of technologies over the past several years, I can tell you with certainty that 90+% of the crappy ones are OpenVZ. – Michael Hampton Jan 15 '13 at 23:08
  • I'm not sure which VPS platform my host uses. I'll ask. Thanks for the cloud server advice; I will need to check that out. – Michael Jan 16 '13 at 23:39
2

From the info you posted:

  • Your VPS seems to be running under OpenVZ or Parallels Virtuozzo. If the hosting provider overallocates (a lot of them do), your server will never actually be able to use the third gigabyte of memory.

Worse, your VPS may be allowed to burst for a short period, but then the OOM killer starts killing processes. The OOM killer can be tweaked (on the hosting provider's side) to use priorities that try to keep the more important processes, say ssh, bind, apache, mysql, from being killed, but since the other clients on the same node probably run the same typical setup, that doesn't help much.

Get 3 GB guaranteed, with burst disabled.

  • If the web pages are really static and small, you could use a caching reverse proxy. Yes, caching uses memory, but caching also prevents spawning PHP processes, which tend to be very demanding.

(You need to do some math and/or experiment; this is not an absolute solution.)
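As an illustration of the idea, nginx in front of Apache can cache whole pages so most requests never reach PHP (the cache sizes, paths, and Apache port below are assumptions):

# nginx as a caching reverse proxy; Apache assumed to be on :8080
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=site:10m max_size=200m;

server {
    listen 80;
    location / {
        proxy_cache site;
        proxy_cache_valid 200 10m;        # keep good pages for 10 minutes
        proxy_pass http://127.0.0.1:8080; # back-end Apache
    }
}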

  • Disable APC, or run PHP-FPM.

With FastCGI (fcgid), each PHP process has its own APC opcode cache. Use PHP-FPM so that all PHP processes share a common cache, or disable APC entirely. You don't seem to need it anyway (an opcode cache does far more for CPU usage than for memory usage ;))
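A minimal FPM pool sketch for a box this size (the numbers are starting points to tune, not recommendations):

; /etc/php-fpm.d/www.conf - all children share one APC opcode cache
[www]
listen = /var/run/php-fpm.sock
pm = dynamic
pm.max_children = 10      ; caps total PHP RAM at ~10 x per-process size
pm.start_servers = 3
pm.min_spare_servers = 2
pm.max_spare_servers = 5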

zecrazytux