6

MySQL queries in my production environment are taking much longer than I would expect them to. The site in question is a fairly large Drupal site with many modules installed. The web server (Nginx) and database server (MySQL) are hosted on separate machines, connected by a 100 Mbps LAN connection (hosted by Rackspace).

I have the exact same site running on my laptop for development. Obviously, on my laptop, the web server and database server are on the same box.

Here are the results of my database query times:


Production:

Executed 291 queries in 320.33 milliseconds. (homepage)

Executed 517 queries in 999.81 milliseconds. (content page)

Development:

Executed 316 queries in 46.28 milliseconds. (homepage)

Executed 586 queries in 79.09 milliseconds. (content page)


As can clearly be seen from these results, the time spent querying the MySQL database is much shorter on my laptop, where the MySQL server is running on the same machine as the web server.

Why is this?!

One factor must be network latency. On average, a round trip from the web server to the database server takes 0.16 ms (as shown by ping), and that is added to every single MySQL query. Taking the content page example above, where 517 queries are executed, network latency alone adds roughly 82 ms to the total query time. However, that doesn't account for the difference I am seeing (79 ms on my laptop vs. 999 ms on the production boxes).

What other factors should I be looking at? I had thought about upgrading the NIC to a gigabit connection, but clearly there is something else involved.

I have run the MySQL performance tuning script from http://www.day32.com/MySQL/ and it tells me that my database server is configured well (better than my laptop, apparently). The only problem reported is "Of 4394 temp tables, 48% were created on disk". This is true in both environments, and in the production environment I have even tried increasing max_heap_table_size and tmp_table_size to 1 GB, with no change (I think this is because I have some BLOB and TEXT columns).
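For reference, here is how I have been checking the temp table figures directly in MySQL; these are standard status counters and variables, nothing specific to my setup:

-- Implicit temp tables created, and how many of them spilled to disk
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- In-memory temp table limits (the smaller of the two applies)
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

(As I understand it, any temporary table containing BLOB or TEXT columns is written to disk regardless of these limits, which would explain why raising them changed nothing.)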

user15241
  • 93
  • 2
  • 2
  • 4

6 Answers

7

It's because if the DB is not on the same server as the frontend, you have to do a network round trip for each query (potentially more, depending on the size of the result). It can easily take up to 1 ms to ping another server in the same datacenter; much less if the two servers are in the same rack, but 1 ms is a good rule of thumb.

That means that for ~300 small queries you should expect ~300ms, pretty close to what you are seeing.

If the two servers are on the same machine, you only have to pay for some context switches between processes to move the data from the DB process to the frontend. A context switch (with all the typical flushes) takes roughly 40 µs, very broadly speaking, and you need at least a couple of them per query (the frontend asks for the data, the DB reads the request, prepares and serves the data, and the frontend reads the answer back). So I would expect ~80 µs per query (let's round it to 0.1 ms to make the maths easier). You can thus expect ~30 ms for 300 small queries, which is also pretty close to what you are seeing.
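Just to make the arithmetic explicit (the 1 ms and 0.1 ms figures are the rough assumptions above, not measurements), you can run the estimate anywhere, even in MySQL itself:

-- Back-of-the-envelope totals for ~300 small queries per page
SELECT 300 * 1.0 AS est_remote_ms,  -- ~1 ms network round trip per query
       300 * 0.1 AS est_local_ms;   -- ~0.1 ms of context switching per query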

2

It's really hard to give you a definite answer. I don't think the network is the bottleneck unless you have some really heavy traffic on the server.

Do you have the same exact configuration on both your development machine and the production server?

Are you sure that all indexes were created successfully on the production server?

It could be the amount of data on your production server. To compare results between the production and development machines you should have the same number of rows in the tables; is this the case?

Did you use the same installation files to set up MySQL on your development machine and production server? Do you have the same version of MySQL on both your development and production machines? Is it the same OS? 32-bit or 64-bit? It could even be a bug in the version of MySQL installed on the server.

The possibilities are endless; without more details it will be really hard to give you an informative answer.

Mee
  • 847
  • 5
  • 15
  • 24
  • Network traffic between web and database servers isn't high at all. MySQL communication is happening on a dedicated interface, and the load on it is well within its bounds. MySQL configuration on production and development machines is different (I'll update the main thread with the details in a minute). The two databases are identical, with the same amount of data in each (I regularly re-sync them). The production machine is using MySQL 5.0.45 x86_64. Development is using 5.0.75 x86_64. – user15241 Sep 19 '09 at 11:06
0

Do you have query caching enabled? Look at the output of:

SHOW VARIABLES LIKE 'query_cache_size';

If it's greater than zero, it's on. This will make quite a difference to the performance of a website backed by MySQL, though I'm not sure if it's anything to do with your specific issue here.
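You can also check whether the cache is actually being used effectively; these are standard status counters:

-- Cache hits vs. queries that had to be cached after a miss
SHOW GLOBAL STATUS LIKE 'Qcache_hits';
SHOW GLOBAL STATUS LIKE 'Qcache_inserts';
-- If this keeps growing, the cache is too small and entries are being evicted
SHOW GLOBAL STATUS LIKE 'Qcache_lowmem_prunes';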

I don't suppose there's any way to capture the set of queries and run them from a login on the MySQL database server, to see whether things are different there?

Another possible issue is that if you're not doing session caching, you're opening a new session every time you make a database call. Do you know if you're session caching? Opening a new session for every call can get expensive across the network.
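On the MySQL side you can at least see how many connections are arriving and whether the thread cache is absorbing the cost of new ones; interpret the numbers relative to your own traffic:

-- Total connection attempts since the server started
SHOW GLOBAL STATUS LIKE 'Connections';
-- Threads created because no cached thread was available; if this grows almost
-- as fast as Connections, consider raising thread_cache_size
SHOW GLOBAL STATUS LIKE 'Threads_created';
SHOW VARIABLES LIKE 'thread_cache_size';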

Morven
  • 1,031
  • 1
  • 7
  • 15
  • Yes, I have the query cache enabled in both production and development environments. Production: 134217728, Development: 16777216. I'm pretty sure that session caching must be in use, though I don't know how to test. I cannot imagine that Drupal opens a new database connection for every query! – user15241 Sep 18 '09 at 10:38
0

The only problem reported is "Of 4394 temp tables, 48% were created on disk". This is true in both environments, and in the production environment I have even tried increasing max_heap_table_size and tmp_table_size to 1 GB, with no change (I think this is because I have some BLOB and TEXT columns).

You can put MySQL's tmpdir on a RAM-based filesystem (e.g. tmpfs) to help improve performance.
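You can check where MySQL currently writes its on-disk temporary tables before deciding whether this is worth doing; the my.cnf path below is only an example:

-- Where on-disk temp tables go; to use tmpfs, point this at a RAM-backed mount
-- in my.cnf, e.g. tmpdir = /dev/shm/mysqltmp (example path, create it first)
SHOW VARIABLES LIKE 'tmpdir';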

Cheers

P.S.

Maybe the performance hit is RAID writes in Production.

HTTP500
  • 4,827
  • 4
  • 22
  • 31
  • sure, but that still doesn't explain the difference between the two setups, which both have that problem. – user15241 Sep 18 '09 at 10:08
0

To identify where the bottleneck is, I would profile the queries: http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html
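In practice that means something like the following from a mysql prompt; the SELECT is only a placeholder, substitute one of the slow queries from your Drupal query log:

SET profiling = 1;
SELECT COUNT(*) FROM node;   -- placeholder query, use one of your real ones
SHOW PROFILES;               -- wall time for each profiled query, measured on the server
SHOW PROFILE FOR QUERY 1;    -- per-stage breakdown of the first profiled query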

If the numbers reported by the profiler are much lower than what you see when running the same query from your web server, then it is a "transport" issue (Rackspace network latency, switch configuration, large result sets (SELECT *) with TEXT/BLOB fields returned, etc.).

If the numbers are roughly the same, you have to optimize your queries and improve your config file. Here is a link that may help: https://stackoverflow.com/questions/1191067/how-to-test-mysql-query-speed-with-less-inconsistencies/1191506#1191506

BTW, when my server went from 100 Mbps to 1 Gbps, the performance increase was amazing!

Toto
  • 283
  • 1
  • 4
  • 12
  • I tried profiling some queries in this way. It turns out that queries run from the production web server or database server take about the same amount of time to run. The same queries run on my laptop actually run slower than in production! – user15241 Sep 18 '09 at 10:33
  • Although, what I was comparing there was possibly what you were getting at - I logged into MySQL on the web server and on the database server, ran the queries on both, and compared the profile data. Is that right? Or, do you mean to compare the query time reported by MySQL profiles (which wouldn't include any network latency) with the time reported by Drupal (which would include network latency)? – user15241 Sep 18 '09 at 10:57
  • Good point: you could run some queries from the web server and then directly from the database server. The difference would be the "transport" time. Re-reading your question, I noticed that each page can generate 500+ queries?! This is way too much. You definitely have to cache data (Memcache, HTML cache, etc.) to reduce the number of queries. Do not forget that PHP executes queries one after the other (in the same script), which means that you have too much communication between your two servers and it takes time. :( – Toto Sep 21 '09 at 13:30
0

Setting up and tearing down a TCP connection is much more costly than using a local socket on the same host. Check how many connections to the database are initiated in a page load.
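A quick way to check is to watch the server's connection counter around a single page load; it is a standard status variable:

-- Note the value, load the page once, then run this again;
-- the difference is the number of new connections that page load opened
SHOW GLOBAL STATUS LIKE 'Connections';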

Craig
  • 591
  • 2
  • 5
  • Max connections is set to 100. Max used connections shows consistently as 28. I think this number must relate to the 38 fcgi processes I have somehow. Only one connection is created per page load - that is how Drupal works. – user15241 Sep 19 '09 at 11:36