
I am running PHP under Apache 2 in prefork mode (mod_php) on Ubuntu Linux 10.04. The database is Firebird 2.5.2. The Apache servers run in a web cluster consisting of 8 webservers.

At one point we had serious performance issues due to large spikes in requests to the application (we have peak hours). The bottleneck turned out to be the number of database connections needed to serve the requests that came in over a very short time. Firebird didn't cope with it well and requests simply timed out.

This type of database does not have a connection pool, which is why I have been using pconnect in PHP to relieve some stress from the database: it persists the database connection in the Apache process. This was a major performance boost. The downside is that we have to let an Apache process serve a lot of requests before it gets rotated, and we keep a lot of Apache processes running even when there is no load for them. Each webserver runs 70 Apache processes, purely to keep the connections open and ready. Basically we tried to make Apache our connection pool, and it works: when a user requests the application the database handle is ready, and Firebird doesn't have to pay the cost of creating a new database connection.
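The "Apache as connection pool" setup described above boils down to pinning a fixed number of long-lived workers. A sketch of the relevant prefork directives (the values here are illustrative, not our actual production config):

```apache
<IfModule mpm_prefork_module>
    StartServers          70
    MinSpareServers       70
    MaxSpareServers       70
    MaxClients            70
    # 0 = never recycle a worker, so persistent connections survive;
    # a high finite value (e.g. 10000) would bound leak accumulation instead
    MaxRequestsPerChild   0
</IfModule>
```

Keeping Min/MaxSpareServers equal to StartServers prevents Apache from killing idle workers, which would otherwise silently drop their persistent Firebird handles.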

Now here is my question. We now need to have a lot of databases, all small ones, and they will all be served from the same cluster of Apache servers. This means that over the lifetime of an Apache process it is very likely to accumulate persistent connections to several databases (maybe 80-100).
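To put rough numbers on it (assuming the worst case where every process eventually touches every database, and taking 100 databases as the upper end of my estimate):

```shell
# Worst-case count of persistent connections Firebird would have to hold open
# across the whole cluster: servers x processes-per-server x databases-per-process
servers=8
procs_per_server=70
dbs_per_proc=100
echo $(( servers * procs_per_server * dbs_per_proc ))   # prints 56000
```

Even if only a fraction of processes reach that many handles, the total is far beyond what we ask of Firebird today.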

I'm concerned about how Apache will handle this scenario. Is there a limit on how many connections an Apache process can hold? Will it get slower, will it just grow in memory, or will it handle everything fine?

For now there is nothing to be done about the database sharding. We (the dev team) were not at all fond of the idea of splitting the databases, but there was no green light for rewriting the application and creating a new database structure to get more performance from the code. Hardware, for now, is the answer. There are also legal matters that force us to split into several databases to isolate data. This is where I get a bit worried about what Apache is able to handle.

Does anyone know?

Ronnie Jespersen

2 Answers


First of all, you should really consider that design carefully. Best practice in database design is to avoid splitting your data across multiple databases as far as possible. Of course there are exceptions (for example if you need to let customers/applications modify their schema), but in general, scaling in the direction of connecting to more databases is bad. In addition, cross-database queries are hard to write, expensive and poorly supported in many (most?) database systems, forcing you to do a lot of work in the application which could have been done by the database. Maintainability is a nightmare when you have an N:N relationship between databases and applications.

Secondly, your explanation is a bit unclear, and I wonder how you use your pool. Normally with prefork you keep a few servers around to improve latency; as long as the server process lives, so does the persistent connection. pconnect doesn't really pool, and it doesn't sound like you do either. Could you give a clearer example?

Thirdly, there are several resources you can run out of. File descriptors are likely to pose a problem quite early, and having many connections from each worker will eat memory both in Apache and in the database. Establishing the connections is also going to be expensive, even with a connection pool. If you can't reduce the number of databases in use, you will most likely have to eat the cost of using connect instead of pconnect, to reduce the amount of resources wasted by stale DB connections.
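You can check the file-descriptor headroom yourself. A quick sketch (the PID below is a placeholder for one of your Apache workers):

```shell
# Soft limit on open file descriptors for this shell; Apache children inherit
# it unless it is raised in the init script or /etc/security/limits.conf
ulimit -n

# To see how many descriptors one worker actually holds, inspect /proc
# (1234 is an example PID, substitute a real apache2 worker PID):
#   ls /proc/1234/fd | wc -l
```

Each persistent Firebird connection costs at least one descriptor per worker on top of sockets, log files and loaded modules, so with the default soft limit of 1024 on Ubuntu 10.04 you would be uncomfortably close with 80-100 databases.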

If somebody came to me with a design requiring connections to hundreds of databases from an Apache worker, I would send them straight back to the drawing board. There are just too many things that can and will go wrong with such a design. Merge the databases or, if that is not possible, put in a middle layer.

edit: Okay, now things are a little bit more clear.

This is partially a capacity planning question, and those are typically impossible to provide good answers for. But, to clarify a few points:

  1. As mentioned before, there are many hard limits you may hit when trying to establish connections to many databases. File descriptors and memory (shared memory especially) are likely to be the first limits you hit.

  2. What you are doing is not really connection pooling, as the persistent connections will never be returned to a pool. It's just persistent connections, which saves you some overhead.

  3. My approach, if I really absolutely had to shard the databases this way (which I think is a VERY bad design choice), would be to dedicate Apache processes to each database. That way I reduce the risk of accumulating too many persistent connections in one process. As far as I know there is no way to do this without running multiple Apache instances and splitting the requests based on FQDN. Alternative solutions I would consider: eat the cost of using connect and add caching to minimize the number of DB hits, or examine whether the web application can be decoupled from the database mess.
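The dedicated-instance idea could look roughly like this: a separate Apache instance per database group, each started with its own config file, with a front instance routing by hostname. All names, ports and paths below are made up for illustration:

```apache
# Each backend instance would be started with its own config, e.g.:
#   /usr/sbin/apache2 -f /etc/apache2-db1/apache2.conf   (Listen 8081)
#   /usr/sbin/apache2 -f /etc/apache2-db2/apache2.conf   (Listen 8082)
#
# The front Apache then splits requests by FQDN:
<VirtualHost *:80>
    ServerName db1.example.com
    ProxyPass        / http://127.0.0.1:8081/
    ProxyPassReverse / http://127.0.0.1:8081/
</VirtualHost>

<VirtualHost *:80>
    ServerName db2.example.com
    ProxyPass        / http://127.0.0.1:8082/
    ProxyPassReverse / http://127.0.0.1:8082/
</VirtualHost>
```

Each backend pool then only ever holds persistent connections to its own database, at the cost of running and monitoring many Apache instances.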

pehrs
  • Hi @pehrs, thanks for your answer. I have updated my question in the hope of casting more light on the matter. I for one would also have run screaming away, but in an emergency with customer downtime the best decisions aren't always made. – Ronnie Jespersen Mar 31 '14 at 16:22
  • Please let me know if it didn't answer your questions. – Ronnie Jespersen Mar 31 '14 at 16:31
  • Good explanation. Though connects isn't an option. Firebird can't handle spikes with connections... it's like it stalls... Dedicated Apaches might be the solution... we talked about this as a possible solution as well. – Ronnie Jespersen Apr 03 '14 at 20:03

Is there a limit in apache2 in how many connections it can handle?

Yes, it's limited by its configuration parameters (MaxClients, StartServers, MaxSpareServers, KeepAlive, MaxRequestsPerChild, etc.) and by CPU/memory (with KeepAlive you can trade some of one for the other).

will it just grow in memory...?

Yes, the more connections, the more threads and the more memory used. Look at how much memory each Apache thread is using (20-30 MB is typical) to get an idea of the maximum you can have given the available RAM. If you disable unused modules you'll consume less memory per thread.
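A quick back-of-envelope sketch (the RAM and per-process figures are example values; measure your own per-worker RSS, e.g. with `ps -C apache2 -o rss=`):

```shell
# Rough upper bound on how many prefork workers fit in a given memory budget
ram_mb=4096       # example: RAM you are willing to give to Apache
per_proc_mb=25    # example: typical prefork + mod_php worker RSS
echo $(( ram_mb / per_proc_mb ))   # prints 163
```

That result is the ballpark ceiling for MaxClients on one box before you start swapping; persistent connections to many databases will push the per-process figure up.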

Optionally, if memory is an issue, you can look at using nginx instead of Apache, since nginx consumes a (small) fixed amount of memory.

In any case, the bottleneck typically won't be the web server but the database and disk I/O, which are addressed with caching and database settings/code/schema optimization.

Ultimately this looks like a capacity planning question, and at the end the only way of knowing for sure if your system can handle a number of connections is to try and replicate the setup (or a representative part of it) in a test environment and benchmark it.

LinuxDevOps
  • When you mention threads, what do you mean? I have 70 Apache processes (× 8 servers) running at all times. I can keep adding Apache webservers... but subsequently databases will end up as more persistent connections per Apache process. So I'm not worried about the limitations of the Apache configuration. Let's say I only had one Apache process, which then had the responsibility of handling 1000 persistent connections and never rotated. How would it handle this? You are right about the last statement, but maybe you could already mention some pitfalls there might be. – Ronnie Jespersen Mar 31 '14 at 19:19
  • `s/thread/process/g` (you prob use Pre-fork) . So you are not worried by apache # of connections (since you can add more servers) but apache's processes handling long-lived connections to the db. I think I understand but I can't think of specifics only of the generic advice mentioned, sorry. – LinuxDevOps Mar 31 '14 at 19:34
  • It's not that your answer wasn't any good... @pehrs was just more comprehensive and explanatory. – Ronnie Jespersen Apr 03 '14 at 20:11