
I have a Spree site running the following stack:

  • Nginx 1.0.8
  • Passenger 3.0.9
  • Ruby 1.9.2-p290
  • Rack 1.3.6
  • Rails 3.1.4
  • Spree 0.70.5

I recently upgraded from Spree 0.70.3, which also brought a Deface upgrade from 0.7.x to 0.8.0. Since then things have been very unstable.

Recently we've seen CPU-hogging processes which drive up the load on the server and grind the whole thing to a halt. They're Rack processes, and it looks like Passenger is starting them; they're owned by the site-runner user, an unprivileged user who owns the application code. (Passenger automatically runs the site code as the user who owns it.) If I restart Nginx and kill the runaway processes, it helps for a while, but eventually similar processes return and bog things down again.
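
Here's roughly how I've been spotting them; the `<pid>` below is just a placeholder for whichever runaway process I'm checking:

```
# List the worst CPU offenders with their owner and parent PID
ps -eo pid,ppid,user,%cpu,%mem,etime,command --sort=-%cpu | head -n 15

# For a suspect process, check its parent; a Passenger-spawned Rack
# process should trace back to Passenger's spawn server under nginx
ps -o pid,ppid,user,etime,command -p <pid>
```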

ETA: I'm looking now at passenger-status and passenger-memory-stats, which suggest these are Passenger's application processes. If one of those is running away or hanging, there must be an issue with my app.
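
For reference, these are the commands I'm using to line the PIDs up; exact output varies by Passenger version, so treat this as a sketch:

```
# Show Passenger's pool: one line per application (Rack) process,
# with its PID, current sessions, and uptime
sudo passenger-status

# Per-process memory for nginx, the Passenger helpers, and each Rack
# process; cross-reference these PIDs with the CPU hogs from ps/top
sudo passenger-memory-stats
```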

What's my best option for figuring out where this is hanging?

pjmorse
  • Using `strace -f -p <pid>` can be useful to give an indication of the problem. I would suspect that you would be seeing a lot of `ETIMEDOUT (Connection timed out)` in your case. – Steen Sep 12 '14 at 12:29

1 Answer


Rack processes are the application servers running your site code, not Passenger. I'd suspect problems from the recent upgrades and start with the usual troubleshooting around those. Here's what a request looks like on your system:

user -> nginx -> passenger -> Rack process -> generates page

Your system will have multiple Rack processes because each one is single-threaded and can only handle one request at a time. Passenger's job is to proxy requests to those Rack processes and to start/stop/recycle them as needed. A Rack process generally takes 5-45 seconds to start, depending on the complexity of your app, so you'll usually have a few running even when they're not serving requests.
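
If you want to recycle the app's Rack processes without bouncing Nginx, something like the sketch below should work; the path is just an example, so substitute your app's actual root:

```
# Passenger watches the mtime of tmp/restart.txt and restarts this
# application's Rack processes on the next request after it changes
touch /path/to/your/spree_app/tmp/restart.txt

# Watch the pool rebuild; booting a full Rails/Spree stack can take
# the 5-45 seconds mentioned above
watch -n 5 'sudo passenger-status'
```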

kashani
  • Thanks. We tracked the problem down to a database integrity issue; some requests were being made for a resource that no longer existed, and for complicated reasons it wasn't returning a 404 when it should have. – pjmorse Mar 21 '12 at 14:30