
An Apache webserver running a mod_perl application is exhibiting abnormal memory usage: after the "day load" ceases, the Apache processes gradually exhaust the system's memory until oom_killer is invoked. When the load returns the following morning, memory usage normalizes - probably because the Apache workers get recycled periodically once a sufficient number of hits has been served (see the MPM sketch below the graphs):

[graph: system memory statistics]

For correlation, this is the graph of Apache hits per second: [graph: Apache hits per second]
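To make the recycling hypothesis concrete: with the prefork MPM (which mod_perl setups typically use), a worker is retired after MaxRequestsPerChild requests, so recycling only happens while hits keep coming in. A minimal sketch with illustrative values - these are assumptions, not the actual configuration of the servers in question:

```apache
# Apache 2.2 prefork MPM - values below are illustrative only.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           30
    # Each worker is replaced after serving this many requests;
    # 0 would disable recycling entirely.
    MaxRequestsPerChild  1000
</IfModule>
```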

The remaining 2 hits per second throughout the night are induced by the HAProxy checks - HAProxy issues a HEAD http://mydomain.example.com/running HTTP/1.0 request against the server every half second, with "running" being a static file (i.e. it does not invoke any Perl code). Disabling these checks also seems to remedy the memory usage problem, but that obviously cannot be a solution.
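For illustration, the checks roughly correspond to an HAProxy configuration along these lines - backend name, server names and addresses are placeholders, not taken from the actual setup:

```haproxy
backend www_backend
    # HEAD request against the static "running" file, every 500 ms
    option httpchk HEAD /running HTTP/1.0\r\nHost:\ mydomain.example.com
    server web1 10.0.0.1:80 check inter 500
    server web2 10.0.0.2:80 check inter 500
    server web3 10.0.0.3:80 check inter 500
```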

All three similarly configured servers (behind HAProxy) exhibit this behavior. The OS is Ubuntu 10.10, the Apache version 2.2.16. This looks like a memory leak, but I have no idea how to start debugging it - any hints?
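One thing I could do to gather data is to log the resident set size of every Apache worker over time and see which processes grow during the idle hours. A minimal sketch, assuming Linux /proc and a process name of apache2:

```perl
#!/usr/bin/perl
# Log the RSS of every apache2 process once a minute so that overnight
# growth can be attributed to specific workers.
use strict;
use warnings;

while (1) {
    my $now = scalar localtime;
    for my $pid (map { /(\d+)/ ? $1 : () } `pgrep apache2`) {
        open my $fh, '<', "/proc/$pid/status" or next;
        my ($rss) = map { /^VmRSS:\s+(\d+)\s+kB/ ? $1 : () } <$fh>;
        close $fh;
        print "$now pid=$pid rss_kb=$rss\n" if defined $rss;
    }
    sleep 60;
}
```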

the-wabbit

1 Answer


This is because of a memory leak in the Perl code; the morning recycling is probably triggered by logrotate. The best solution is to use FastCGI instead of mod_perl. mod_perl typically runs something like 30 workers, each of them holding on to memory, so even if every worker were recycled very often - say after 100 requests instead of 1000 or 10000 - the setup would still eat memory. If you instead set up e.g. 8, 16 or 32 FastCGI workers (depending on your RAM) and recycle each of them after roughly 100 page renders, memory consumption stays bounded and performance is not noticeably degraded. You also gain security: you can split modules between different FastCGI workers running as different users, and ideally set up SELinux as well to isolate those accounts.
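As a sketch of the recycling idea - assuming the CPAN FCGI module and a process manager (mod_fcgid, spawn-fcgi, ...) that restarts exited workers; the 100-request limit is just the figure from above:

```perl
#!/usr/bin/perl
# Minimal FastCGI worker that exits after serving 100 requests, so the
# process manager replaces it with a fresh process and any slowly
# leaked memory is returned to the system.
use strict;
use warnings;
use FCGI;

my $request  = FCGI::Request();
my $handled  = 0;
my $max_reqs = 100;    # recycle after this many page renders

while ($request->Accept() >= 0) {
    print "Content-Type: text/plain\r\n\r\n";
    print "Hello from PID $$ (request ", $handled + 1, ")\n";
    last if ++$handled >= $max_reqs;
}
```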

Andrew Smith
  • You might also trace some dodgy recursion in the code behind "running", which seems to be the case here. If it's a simple website, you have some other bug in there; but if it's complex, dodgy stuff, the best option is to switch to FastCGI. – Andrew Smith Jun 21 '12 at 09:15
  • Thanks for your response, but "running" does not contain any code. It is an empty static file on the web server's filesystem whose absence we use to mark servers as nonoperational for maintenance purposes. Also, I am not sure the memory leak is really in the Perl code, as repeated querying of the static file is what triggers the behavior. I would also prefer not to "solve" the issue by recycling more often, if at all feasible. You hit the bull's eye with logrotate, though - it is run through cron.daily at 6:25, which is 7:25 in the graphs due to the time zone offset. – the-wabbit Jun 21 '12 at 09:37
  • Switching to FastCGI would require some major testing to make sure it does not break the application anywhere. Also, [benchmarks indicate](http://www.chamas.com/bench/) that FastCGI invocation would result in a performance hit of about 50%. I would need something definitively pointing to the Perl code (and preferably also to the structure that is leaking) to justify such a major change. On the other hand, if I *have* something pointing at the leaking structure, I can pass it on to the devs for a fix. – the-wabbit Jun 21 '12 at 09:48
  • Hi. You can test FastCGI with more workers, so it might scale better. FastCGI is not that much slower, and since you have HAProxy in front anyway, this should be fine. – Andrew Smith Jun 21 '12 at 11:55