First, this is not so much an answer as a diagnostic approach.
It is by no means comprehensive, or even close to it; it is just a starting point.
Time to First Byte
Time to first byte (TTFB) has a number of components:
- DNS Lookup: Find the IP address of the domain (possible improvement: more numerous/distributed/responsive DNS servers)
- Connection time: Open a socket to the server, negotiate the connection (typical value should be around 'ping' time - a round trip is usually necessary - keepalive should help for subsequent requests)
- Waiting: initial processing required before the first byte can be sent (this is where your improvement should be - it will be most significant for dynamic content)
When you look at an ApacheBench output, you also see:
- Processing: This is the sum of waiting + complete transfer of content. If the transfer time is significantly longer than you would expect for the quantity of data received, further processing is occurring after the first byte is sent (e.g. the page is flushing content as it becomes available).
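To see these components separately for a single request, something like the following sketch may help. It is only a rough illustration: it assumes plain HTTP (no TLS), sends one HTTP/1.0 request, and the function name measure_ttfb and the returned keys are made up for this example.

```python
import socket
import time


def measure_ttfb(host, port=80, path="/"):
    """Time one plain-HTTP GET and split it into the components listed above."""
    t0 = time.perf_counter()

    # DNS lookup: resolve the host name to an address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t_dns = time.perf_counter()

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(10)

        # Connection time: the TCP handshake, roughly one round trip.
        sock.connect(addr)
        t_connect = time.perf_counter()

        request = (f"GET {path} HTTP/1.0\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))

        # Waiting: time until the first byte of the response arrives.
        first = sock.recv(1)
        t_first = time.perf_counter()

        # Transfer: keep reading until the server closes the connection.
        size = len(first)
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            size += len(chunk)
        t_done = time.perf_counter()

    return {
        "dns": t_dns - t0,
        "connect": t_connect - t_dns,
        "waiting": t_first - t_connect,
        "transfer": t_done - t_first,
        "ttfb": t_first - t0,
        "bytes": size,
    }
```

Calling measure_ttfb("your-domain", 80, "/") a few times and comparing "waiting" with "connect" should give a first idea of whether the time is going to the network or to the backend.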
Comparisons to Eliminate Components
With few exceptions, your problem is going to lie in the backend processing, which usually comes down to overly complex/inefficient code, or poorly configured MySQL.
A good way to approach this problem is through a series of comparisons that will eliminate various aspects of your setup. A good comparison should keep as much constant as possible to help narrow down the problem. Currently, you have provided the following comparisons:
- Identical (cloned) site running on old server and new server:
- Difference: Server
- Result: old server is fast; new server is slow
- Notes: What you need here is to quantify the differences between these servers - both in terms of the stack used (Nginx, etc) and the hardware (is the old server faster because it is a more powerful machine?)
- Conclusion: the code may be able to run fast on the right setup
- Test site vs full site on the new server
- Difference: content, themes, plugins, etc
- Result: test site is fast, full site is slow
- Notes: in theory, this test should help you to eliminate a lot of aspects of your setup - DNS, network, even your nginx/php/mysql setup - however, it is not quite 'fair'.
- Conclusion: the extra content is having a significant impact on performance
The ideal test would have you duplicate your full site, but then delete all the content except for one article and the associated comments. The point of this test would be to conclusively determine if the large amount of content is the problem or if other aspects of your setup (wordpress plugins, theme, etc) are the cause. You would essentially compare the performance of identical sites, on the same (new) server - loading the same page (same length, etc) - with the only difference being the total site content (e.g. there is a good chance that some plugin does not scale well with increased content).
Without changing anything, there are some other comparisons you can do:
- Test from a remote location vs local - this will help identify if network, latency, dns, etc is the cause
- You have already (somewhat) done this and mostly concluded that you don't have a network problem.
- Test via Varnish (i.e. port 80) vs nginx directly (port 8080) - try not to change your configuration between tests - just use the correct port. This will show you the impact of Varnish. Since Varnish is a caching layer, it should serve all requests after the first one very quickly - essentially, it should bypass the backend and the processing that is needed to generate a dynamic page, and serve the cached copy very quickly.
- You have done this (although, not locally) and demonstrated that Varnish has a significant positive impact on your performance.
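As a rough sketch of the Varnish vs nginx comparison, the following times repeated requests against two URLs and compares the medians. The function names, the warm-up request and the example URLs in the comments are assumptions for illustration; ApacheBench will give you the same information with more detail.

```python
import statistics
import time
import urllib.request


def sample(url, n=10, warmup=1):
    """Fetch url n times (after untimed warm-up requests) and return seconds per request."""
    for _ in range(warmup):
        # Untimed: lets a cache in front of the backend store the page first.
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
        timings.append(time.perf_counter() - start)
    return timings


def compare(label_a, times_a, label_b, times_b):
    """Return the median of each set of timings and how many times faster A is than B."""
    med_a = statistics.median(times_a)
    med_b = statistics.median(times_b)
    return {
        label_a: med_a,
        label_b: med_b,
        "speedup": med_b / med_a if med_a else float("inf"),
    }


# Hypothetical usage (same host, only the port differs):
#   varnish = sample("http://example.com/")       # port 80, through Varnish
#   nginx = sample("http://example.com:8080/")    # port 8080, straight to nginx
#   print(compare("varnish", varnish, "nginx", nginx))
```

Running it once from the server itself and once from a remote machine covers the local vs remote comparison with the same code.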
Tuning your Backend
By this point you should have either found the problem or concluded that it lies in your backend. That leaves you Nginx, PHP, or MySQL.
(I should mention here that it is always handy to know whether your bottleneck is CPU, RAM, or I/O. Between sar, top, iostat, vmstat, free, etc., you should be able to come to some conclusion on this.)
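If you would rather capture this during a benchmark run than watch those tools by hand, a small sketch along these lines can show where CPU time went. It assumes Linux, where the first line of /proc/stat holds the aggregate counters (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for this example.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as fh:
        return fh.readline()


def cpu_shares(before, after):
    """Percentage of CPU time spent in each state between two 'cpu' lines."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    # Skip the leading "cpu" label and keep one counter per name.
    start = [int(x) for x in before.split()[1:1 + len(names)]]
    end = [int(x) for x in after.split()[1:1 + len(names)]]
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas) or 1  # avoid dividing by zero if nothing changed
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}


# Hypothetical usage around a benchmark run:
#   before = read_cpu_line()
#   ... run ab against the site ...
#   after = read_cpu_line()
#   print(cpu_shares(before, after))
```

As a loose guide, a high user + system share points at CPU (i.e. your code), while a high iowait share points at disk; RAM pressure is easier to judge from free and vmstat.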
Nginx
Nginx is just taking requests and either serving static content or shifting the requests to PHP-FPM - there usually isn't much to optimize with Nginx.
- Set workers = # CPU cores
- Enable keepalive (a value of 10-15 is good)
- Disable unneeded logging
- Increase buffer sizes if needed
- Avoid if statements (use static names instead of regexes where possible, eliminate unneeded extensions)
Ideally, your test blog and cloned blog have identical configs, in which case, you have effectively eliminated Nginx as the problem.
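For the first few Nginx settings listed above, this sketch prints a starting fragment sized for the machine it runs on. The directive names (worker_processes, keepalive_timeout, access_log) are standard nginx directives; the function name and the defaults chosen here are assumptions, and the output is meant to be merged into your existing config by hand, not used as-is.

```python
import os


def nginx_fragment(cores=None, keepalive_seconds=15, access_log=False):
    """Render a few nginx directives matching the checklist above."""
    # One worker per CPU core, falling back to 1 if the count is unknown.
    cores = cores or os.cpu_count() or 1
    lines = [
        f"worker_processes {cores};",
        "",
        "http {",
        f"    keepalive_timeout {keepalive_seconds};",
    ]
    if not access_log:
        # Only disable logging you really do not need.
        lines.append("    access_log off;")
    lines.append("}")
    return "\n".join(lines)


print(nginx_fragment())
```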
Application
In the case where you are trying to identify a problem in your code (for instance a slow plugin, etc) the slow logs are the place to start.
- Enable the MySQL slow log and the PHP-FPM slow log, run your benchmark, and see what shows up as slow.
MySQL
- Increase your caches and run mysqltuner.pl to get a good starting point.
PHP
- disable unneeded extensions,
- disable register_globals, magic_quotes_*, expose_php, register_argc_argv, always_populate_raw_post_data
- increase the memory_limit
- open_basedir and safe_mode have significant performance implications, but also can provide an additional layer of defense. Test with and without them, to determine if their impact on performance is tolerable.
PHP-FPM
- Adjust the pm.* values - increase them to deal with high load
It is worth noting that your htop results show php-fpm as consuming the bulk of the CPU - and your problem does appear to be directly related to this.
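When raising the pm.* values, a common rule of thumb (not something specific to your setup, so treat it as an assumption) is to cap pm.max_children at the RAM you can spare for PHP divided by the average size of one php-fpm process. A minimal sketch of that arithmetic, with a made-up function name:

```python
def max_children(ram_for_php_mb, avg_process_mb):
    """Rough upper bound for pm.max_children from memory figures in MB."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    # Whole processes only, and never suggest fewer than one worker.
    return max(1, int(ram_for_php_mb // avg_process_mb))


# Hypothetical figures: 2048 MB left for PHP, about 64 MB per php-fpm process.
print("pm.max_children =", max_children(2048, 64))
```

The per-process figure is something you would measure yourself (e.g. the resident size of the php-fpm workers in top or htop under load). If CPU is the bottleneck, as your htop output suggests, adding more children than you have cores to run them will not help much.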
Caching
Once you have optimized each likely bottleneck, start caching.
- You have an opcode cache (APC) already - ensure that it is working (it comes with a test file), check your cache hit rates, and if possible have APC cache to memory instead of to disk.
- Set up your code to cache (e.g. using a plugin for Wordpress such as W3TC)
- With nginx you can set up FastCGI caching - but since you have Varnish, this is best avoided.
- Set up a caching layer, such as Varnish (which you have already done) - and ensure that it is working (e.g. use varnishstat, read Achieving a high Hitrate)
- Add more caching for components of your site - e.g. Memcached if applicable
Sometimes, given the limitations of your application and hardware, you may not be able to improve backend performance that much - however, that is the point of caching - to minimize the use of the backend.
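To check that a cache is actually doing its job, the number to watch is the hit rate: hits / (hits + misses). The sketch below computes it from the cache_hit and cache_miss counters in the text output of varnishstat -1. The exact column layout and the "MAIN." prefix vary between Varnish versions, so the parsing here is an assumption; the same hit_rate function works for the hit and miss counts APC reports.

```python
def parse_counters(varnishstat_output):
    """Map counter names to values from 'varnishstat -1' style text."""
    counters = {}
    for line in varnishstat_output.splitlines():
        parts = line.split()
        # Expect at least: name, integer value.
        if len(parts) >= 2 and parts[1].isdigit():
            name = parts[0]
            if name.startswith("MAIN."):
                name = name[len("MAIN."):]  # newer versions prefix the counters
            counters[name] = int(parts[1])
    return counters


def hit_rate(hits, misses):
    """Fraction of lookups served from cache; 0.0 if there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical usage:
#   import subprocess
#   out = subprocess.run(["varnishstat", "-1"], capture_output=True, text=True).stdout
#   c = parse_counters(out)
#   print(hit_rate(c["cache_hit"], c["cache_miss"]))
```

A low hit rate usually means the cache is being bypassed (cookies, no-cache headers, short TTLs), in which case the backend is still doing most of the work.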
Further reading