
Problem: I recently revamped one of my servers; it was tested prior to use and functions well. However, a few days ago, I noticed approximately 4 times the usual volume of writes to the root volume. This is not a performance issue - the server runs fine.

My revamping was fairly extensive (a full rebuild) so I don't have a lot to go on in terms of cause. Briefly, my changes included:

  • Upgrading Amazon's Linux (from 2011.02 to 2011.09) - which also resulted in a change from ext3 to ext4 for the root volume
  • Moving from php-fcgi to php-fpm (currently using tcp)
  • Moving from a reverse-proxy (nginx -> apache) setup, to nginx only
  • Replacing vsftpd with pure-ftpd
  • Replacing dkim-proxy with opendkim
  • Replacing webmin with ispconfig
  • Adding varnish as a caching layer for dynamic files (overkill for the volume of hits these sites get, but it's an experiment)
  • Adding a swap partition

Basic Setup:

  • My swap space is mounted on its own EBS volume - the writes to the swap volume are negligible - I have essentially discounted this as the cause (there is ample free memory - and both free and iostat show minimal swap usage).
  • My data - the MySQL database, user files (websites), all logs (from /var/log), mail, and varnish files - is on its own EBS volume (using mount --bind; see the sketch after this list). The underlying EBS volume is mounted at /mnt/data
  • My remaining files - operating system and core server applications (e.g. nginx, postfix, dovecot, etc.) - are the only thing on the root volume - a total of 1.2GB.
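
For reference, the bind mounts look something like the following /etc/fstab entries (a sketch - the exact directory names under /mnt/data are illustrative rather than a verbatim copy of my configuration):

/mnt/data/var/lib/mysql   /var/lib/mysql   none   bind   0 0
/mnt/data/var/log         /var/log         none   bind   0 0

i.e. each data directory lives on the /mnt/data volume and is bind-mounted over the corresponding path on the root filesystem (mount --bind /mnt/data/var/log /var/log does the same thing at the command line).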

The new setup runs 'smoother' (faster, less memory, etc.) than the old system, and has been stable for 20 days (since mid-October) - as far as I can tell, the elevated writes have existed for all this time.

Contrary to what I would expect, I have a low read volume (my reads are about 1.5% of my writes, both in terms of blocks and bytes, on my root volume). I have not changed anything on the root volume (e.g. no new installations) in the past few days, yet the write volume continues to be much higher than expected.

Objective: to determine the cause of the increased writes to the root volume (essentially, figure out if it is a process (and what process), the different (ext4) file system, or another issue (e.g. memory)).

System Information:

  • Platform: Amazon's EC2 (t1.micro)
  • O/S: Amazon's Linux 2011.09 (CentOS/RHEL derived)
  • Linux kernel: 2.6.35.14-97.44.amzn1.i686
  • Architecture: 32-bit/i686
  • Disks: 3 EBS volumes:
    • xvdap1, root, ext4 filesystem (mounted with noatime)
    • xvdf, data, xfs filesystem (mounted with noatime,usrquota,grpquota)
    • xvdg, swap

The root and data volumes are snapshotted once a day - however, this should be a 'read' operation, not a write one. (Additionally, the same practice was used on the previous server - and the previous server was also a t1.micro.)

What first prompted me to look into the I/O was my last AWS bill (which showed above-normal I/O - not unexpected, since I was setting up this server and installing a lot of things at the start of the month), and subsequently the CloudWatch metrics for the attached EBS volumes. I arrive at the '4 times normal' figure by extrapolating the I/O activity from November (when I haven't altered the server) to estimate a monthly value and comparing that with the I/O from past months when I was not working on my previous server. (I do not have exact iostat data from the previous server.) The same rate of writes, 170-330MB/hr, has persisted through November.

Diagnostic information (uptime for the following outputs is 20.6 days):

Cloudwatch metrics:

  • root volume (write): 5.5GB/day
  • root volume (read): 60MB/day
  • data volume (write): 400MB/day
  • data volume (read): 85MB/day
  • swap volume (write): 3MB/day
  • swap volume (read): 2MB/day

Output of: df -h (for root volume only)

Filesystem            Size  Used Avail Use% Mounted on
/dev/xvda1            4.0G  1.2G  2.8G  31% /

The used space has not increased noticeably since this system was launched (which to me suggests that files are being updated, not created/appended).
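
As a rough cross-check (a sketch - the -xdev flag keeps find on the root filesystem, so the bind-mounted data directories are skipped), recently modified files on the root volume can be listed with their sizes:

find / -xdev -type f -mmin -60 -printf '%s %p\n' 2>/dev/null | sort -rn | head -20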

Output of: iostat -x (with Blk_read, Blk_wrtn added in):

Linux 2.6.35.14-95.38.amzn1.i686  11/05/2011      _i686_

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s   Blk_read   Blk_wrtn avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     3.42    0.03    2.85     0.72    50.19  2534636  177222312   17.68     0.18   60.93   0.77   0.22
xvdf              0.00     0.03    0.04    0.35     1.09     8.48  3853710   29942167   24.55     0.01   24.28   2.95   0.12
xvdg              0.00     0.00    0.00    0.00     0.02     0.04    70808     138160   31.09     0.00   48.98   4.45   0.00

Output of: iotop -d 600 -a -o -b

Total DISK READ: 6.55 K/s | Total DISK WRITE: 117.07 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
  852 be/4 root          0.00 B     26.04 M  0.00 %  0.42 % [flush-202:1]
  539 be/3 root          0.00 B    528.00 K  0.00 %  0.08 % [jbd2/xvda1-8]
24881 be/4 nginx        56.00 K    120.00 K  0.00 %  0.01 % nginx: worker process
19754 be/4 mysql       180.00 K     24.00 K  0.00 %  0.01 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 3106 be/4 mysql         0.00 B    176.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19751 be/4 mysql         4.00 K      0.00 B  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 3194 be/4 mysql         8.00 K     40.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 3156 be/4 mysql         4.00 K     12.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 3099 be/4 mysql         0.00 B      4.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
24216 be/4 web14         8.00 K     10.43 M  0.00 %  0.00 % php-fpm: pool web14
24465 be/4 web19         0.00 B      7.08 M  0.00 %  0.00 % php-fpm: pool web19
 3110 be/4 mysql         0.00 B    100.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  579 be/4 varnish       0.00 B     76.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
  582 be/4 varnish       0.00 B    144.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
  586 be/4 varnish       0.00 B      4.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
  587 be/4 varnish       0.00 B     40.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
 1648 be/4 nobody        0.00 B      8.00 K  0.00 %  0.00 % in.imapproxyd
18072 be/4 varnish     128.00 K    128.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish
 3101 be/4 mysql         0.00 B    176.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19749 be/4 mysql         0.00 B     32.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19750 be/4 mysql         0.00 B      0.00 B  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19752 be/4 mysql         0.00 B    108.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
19788 be/4 mysql         0.00 B     12.00 K  0.00 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  853 be/4 root          4.00 K      0.00 B  0.00 %  0.00 % [flush-202:80]
22011 be/4 varnish       0.00 B    188.00 K  0.00 %  0.00 % varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 1,1000,120 -u varnish -g varnish

To summarize the above (and extrapolate to daily values by multiplying the 10-minute figures by 144), over the 10-minute period it looks like:

  • [flush-202] wrote 26MB = 3.6GB/day
  • php-fpm wrote 17.5MB = 2.4GB/day
  • MySQL wrote 684KB = 96MB/day
  • Varnish wrote 580KB = 82MB/day
  • [jbd2] wrote 528KB = 74MB/day
  • Nginx wrote 120KB = 17MB/day
  • IMAP Proxy wrote 8KB = 1.1MB/day

Interestingly enough, it would appear that between [flush-202] and php-fpm it is possible to account for the daily volume of writes.

Using ftop, I am unable to track down either the flush or php-fpm writes (e.g. ftop -p php-fpm).
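
As an alternative (a sketch, assuming the sysstat package on this system is recent enough to include pidstat - I have not confirmed the version shipped with Amazon Linux 2011.09), per-process write rates could be sampled with:

pidstat -d 60 10    # kB_wr/s per process, 10 samples at 60-second intervals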

At least part of my problem stems from identifying which processes are writing to the root volume. Of those listed above, I would expect all of them to be writing to the data volume, since the relevant directories are symlinked there (e.g. the nginx, mysql, php-fpm, and varnish directories all point to a different EBS volume).

I believe JBD2 is the journaling block device for ext4, and flush-202 is the background flush of dirty pages. The dirty_ratio is 20 and the dirty_background_ratio is 10. Dirty memory (from /proc/meminfo) was typically between 50-150kB. Page size (getconf PAGESIZE) is the system default (4096).
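
(For reference, these values can be checked with something along these lines:)

sysctl vm.dirty_ratio vm.dirty_background_ratio
grep -i dirty /proc/meminfo
getconf PAGESIZE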

Output of: vmstat -s | grep paged

    3248858 pages paged in
  104625313 pages paged out

Output of: sar -B | grep Average

                pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
Average:         1.38     39.57    113.79      0.03     36.03      0.38      0.02      0.29     73.66

The above appears to suggest a high number of pages paged out - however, I would expect that pages would be written to my swap partition if necessary, not to my root volume. Of the total memory, the system typically has 35% in use, 10% in buffers, 40% cached, and 15% unused (i.e. 65% free).

Output of: vmstat -d

disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
xvda1 105376  14592 2548092  824418 10193989 12264020 179666824 626582671      0   7872
xvdf  126457    579 3882950  871785 1260822  91395 30081792 32634413      0   4101
xvdg    4827   4048   71000   21358   1897  15373  138160  307865      0     29

vmstat consistently displays si and so values of 0

Output of: swapon -s

Filename                                Type            Size    Used    Priority
/dev/xvdg                               partition       1048572 9252    -1

On the hunch that the I/O writes may be memory related, I disabled varnish and restarted the server. This changed my memory profile to 10% in use, 2% in buffers, 20% cached, and 68% unused (i.e. 90% free). However, over a 10 minute run, iotop gave similar results as previously:

  • [flush-202] wrote 19MB
  • php-fpm wrote 10MB

In the hour since the restart, there have already been 330MB written to the root volume with 370K pages swapped out.

Output of: inotifywatch -v -e modify -t 600 -r /[^mnt]*

Establishing watches...
Setting up watch(es) on /bin /boot /cgroup /dev /etc /home /lib /local /lost+found /opt /proc /root /sbin /selinux /src /sys /usr /var
OK, /bin /boot /cgroup /dev /etc /home /lib /local /lost+found /opt /proc /root /sbin /selinux /src /sys /usr /var is now being watched.
Total of 6753 watches.
Finished establishing watches, now collecting statistics.
Will listen for events for 600 seconds.
total  modify  filename
23     23      /var/log/
20     20      /usr/local/ispconfig/server/temp/
18     18      /dev/
15     15      /var/log/sa/
11     11      /var/spool/postfix/public/
5      5       /var/log/nginx/
2      2       /var/run/pure-ftpd/
1      1       /dev/pts/

Looking a bit further into the above, almost all of the writes can be attributed to an (unknown) process that is running every 5 minutes and checking the status of a variety of services (like chkservd on cPanel, but I don't use cPanel, and didn't install this). It amounts to 4 log files (cron, maillog, ftp, imapproxy) updated during the 10 minutes - and a few related items (postfix sockets, pure-ftpd connections). The other items are primarily modified ispconfig sessions, system accounting updates, and invalid (non-existent server_name) web access attempts (logged to /var/log/nginx).
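
To try to pin down the 5-minute job (a sketch - I have not yet confirmed which package installed it), the system and per-user cron entries can be dumped in one go:

grep -r . /etc/crontab /etc/cron.d/ 2>/dev/null
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null | sed "s|^|$u: |"; done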

Conclusions and Questions:

Let me start by saying I am a bit perplexed - I am usually fairly thorough, but I feel that I am missing something obvious on this one. Clearly, flush and php-fpm account for the bulk of the writes, however, I don't know why this might be the case. Firstly, take php-fpm - it shouldn't even be writing to the root volume: its directories (both files and logs) are symlinked to another EBS volume. Moreover, the primary things that php-fpm should be writing are sessions and page-caches - which are both few and small - certainly not on the order of 1MB/min (more like 1KB/min, if that). Most of the sites are read-only, with only occasional updates. The total size of all web files modified in the last day is 2.6MB.

Secondly, considering flush - the significant writes from it suggest to me that dirty pages are frequently being flushed to disk - but given that I typically have 65% free memory and a separate EBS volume for swap space, I can't explain why this would affect the writes on my root volume, especially to the extent that is occurring. I realize that some processes will write dirty pages to their own swap space (instead of using system swap space), but surely, immediately following a restart with the vast majority of my memory being free, I should not be running into any substantial amount of dirty pages. If you do believe this to be the cause, please let me know how I might identify which processes are writing to their own swap space.
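
One avenue I can think of for attributing the writes per process (a sketch - it reads the same per-task I/O accounting that iotop uses; the write_bytes counters are cumulative since each process started and do not indicate which volume was written to):

awk '/write_bytes/ {print FILENAME, $2}' /proc/[0-9]*/io 2>/dev/null | sort -k2 -rn | head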

It is entirely possible that the whole dirty pages idea is simply a red herring and is completely unrelated to my problem (I hope it is, actually). If that is the case, my only other idea is that there is something relating to ext4 journaling that wasn't present in ext3. Beyond that, I am currently out of ideas.

Update(s):

Nov 6, 2011:

Set dirty_ratio = 10 and dirty_background_ratio = 5; updated with sysctl -p (confirmed via /proc); reran 10 minute iotop test with similar results (flush wrote 17MB, php-fpm wrote 16MB, MySQL wrote 1MB, and JBD2 wrote 0.7MB).
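
(For reference, the change was along these lines - two lines added to /etc/sysctl.conf, then applied with sysctl -p:)

vm.dirty_ratio = 10
vm.dirty_background_ratio = 5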

I have changed all the symlinks I setup to use mount --bind instead. Re-enabled varnish, restarted server; reran 10 minute iotop test with similar results (flush wrote 12.5MB, php-fpm wrote 11.5MB, Varnish wrote 0.5MB, JBD2 wrote 0.5MB, and MySQL wrote 0.3MB).

As at the above run, my memory profile was 20% in use, 2% in buffers, 58% cached, and 20% unused (i.e. 80% free). Just in case my interpretation of free memory, in this context, is flawed, here is the output of free -m (this is a t1.micro):

             total       used       free     shared    buffers     cached
Mem:           602        478        124          0         14        347
-/+ buffers/cache:        116        486
Swap:         1023          0       1023

Some additional information: Output of: dmesg | grep EXT4

[    0.517070] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
[    0.531043] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
[    2.469810] EXT4-fs (xvda1): re-mounted. Opts: (null)

I also ran ftop and iotop simultaneously, and was surprised to notice that entries showing up in iotop did not appear in ftop. The ftop list was filtered to php-fpm, since I could trigger writes of that process fairly reliably. I noted about 2MB of writes per page view for php-fpm - and I have yet to figure out what it could possibly be writing - any ideas on ascertaining what is being written would be appreciated.
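
One thing I may try next (a sketch - <pid> would be one of the php-fpm pool workers shown by iotop, and the trace is written to the data volume so it does not add to the root-volume writes being measured):

strace -f -tt -e trace=open,write,mmap -p <pid> -o /mnt/data/php-fpm.trace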

I will try to turn off journaling in the next few days, and see if that improves things. For the moment though, I find myself wondering if I have an I/O problem or a memory problem (or both) - but I am having a hard time seeing the memory problem, if there is one.
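
(The plan for removing the journal, roughly - a sketch, since the filesystem has to be offline, which on EC2 means attaching a snapshot-derived volume to another instance; /dev/xvdh below is just a placeholder for whatever device the volume appears as there:)

tune2fs -O ^has_journal /dev/xvdh    # remove the has_journal feature
e2fsck -f /dev/xvdh                  # force a check afterwards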

Nov 13, 2011:

As the file system uses extents, it was not possible to mount it as ext3. Additionally, attempts to mount it as read-only simply resulted in it being remounted read-write.

The file-system does indeed have journaling enabled (128MB journal), as is evident from the following:

Output of: tune2fs -l /dev/sda1 | grep features

has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

As per the following, about 140GB has been written to this volume in a bit under a month - just about 5GB/day.

Output of: dumpe2fs -h /dev/sda1

Filesystem volume name:   /
Last mounted on:          /
Filesystem UUID:          af5a3469-6c36-4491-87b1-xxxxxxxxxxxx
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              262144
Block count:              1048576
Reserved block count:     10478
Free blocks:              734563
Free inodes:              210677
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      511
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              32582
Flex block group size:    16
Filesystem created:       Wed Sep 21 21:28:43 2011
Last mount time:          Sun Nov 13 16:10:11 2011
Last write time:          Sun Oct 16 16:12:35 2011
Mount count:              13
Maximum mount count:      28
Last checked:             Mon Oct 10 03:04:13 2011
Check interval:           0 (<none>)
Lifetime writes:          139 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       18610
Default directory hash:   half_md4
Directory Hash Seed:      6c36b2cc-b230-45e2-847e-xxxxxxxxxxx
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0002d91c
Journal start:            1

Continuing to look for open files, I tried using fuser on the root volume:

Output of: fuser -vm / 2>&1 | awk '$3 ~ /f|F/'

root       1111 Frce. dhclient
root       1322 frce. mysqld_safe
mysql      1486 Fr.e. mysqld
root       1508 Frce. dovecot
root       1589 Frce. master
postfix    1600 Frce. qmgr
root       1616 Frce. crond
root       1626 Frce. atd
nobody     1648 Frce. in.imapproxyd
postfix    1935 Frce. tlsmgr
root       2808 Frce. varnishncsa
root      25818 frce. sudo
root      26346 Fr.e. varnishd
postfix   26925 Frce. pickup
postfix   28057 Frce. smtpd
postfix   28070 Frce. showq

Nothing unexpected, unfortunately. On the off-chance that it was due to underlying hardware, I restored yesterday's snapshot of the root volume (nothing had changed in the last day), and replaced the instance's root volume with the new one. As expected, this had no effect on the problem.

My next step would have been to remove journalling; however, I stumbled across the solution before getting to that.

The problem lay in APC using file-backed mmap. Fixing this dropped disk I/O by about 35x - to (an estimated) 150MB/day (instead of 5GB/day). I might still consider removing journalling, as this appears to be the major remaining contributor to this value; however, this number is quite acceptable for the time being. The steps taken to arrive at the APC conclusion are posted in an answer, below.

cyberx86
  • My gut feeling is it's filesystem journaling. – David Schwartz Nov 06 '11 at 13:01
  • You might want to start a bounty on this just to get people to read it. – Andrew Case Nov 08 '11 at 21:55
  • I only skimmed through your question but have you tried monitoring output of "lsof" . You could write a script that will constantly monitor output of lsof and report no of files open and their sizes. Etc.. – Andrey Nov 10 '11 at 08:50
  • @Andrey - Thanks for the suggestion - the use of lsof is definitely interesting. Since my problem is with writes (not reads), the limitation I see with lsof, is that it doesn't list how much is written to a file - the file size itself doesn't seem to be related. I threw together a command to see the regular files open for writing on the root volume (not other mounts), and ran it through `watch`. There were only a few files (17) - mostly PID files or lock files, with a few (non-existent) temp files. `watch -d -n 0.5 'lsof / | grep REG | awk '"'"'$4 ~ /.*[wu]/ { print $9}'"'"' | sort -u'` – cyberx86 Nov 10 '11 at 22:04
  • Not strictly true. I just run a quick test : started "dd if=/dev/sda of=/root/test_file" and on another terminal "watch -n 1 'lsof | grep test_file'" I could see that size value on the file grow. – Andrey Nov 11 '11 at 08:57
  • But it could be that it's not just one/few particular files creating writes but many. I would suggest you write a script to monitor number of files open over time and in addition monitor output of "lsof | awk ' { s = int($7/(2^20)); if(s>0) print (s)" MB -- "$9 | "sort -rn" }'" I would do it over a couple of days and then graph the results. Above lsof command is taken from here http://www.commandlinefu.com/commands/tagged/361/lsof – Andrey Nov 11 '11 at 09:06
  • @Andrey - A very interesting idea - I have (somewhat) tried the monitoring of open files idea (manually though, not scripted) - every time I have checked, the list has been the same - I'll improve it a bit and edit my post over the weekend. I think one of the strange things with my case though, is that the used space is not changing - between 5 and 7GB has been written to the root volume every day for the past month - but the used space hasn't changed from 1.2GB. That seems to suggest to me that any files that are growing are also shrinking - or any new files are being deleted. – cyberx86 Nov 11 '11 at 21:34
  • That is a promising command - I added a `uniq -c` to it, since about a dozen files were open 25-35 times and will try to filter it by files open for writing (since read levels are not an issue). The perplexing thing is that there really shouldn't be anything significant writing to my root volume - practically everything that should need writing has been moved to another volume (logs, user-files, databases, etc.) (Actually, that raises an interesting idea - I might try to make my root volume read only - though it will take some effort - and I am not entirely sure of the full implications). – cyberx86 Nov 11 '11 at 21:36

1 Answer


Since the leading cause seemed to be journaling, that would have been my next step. In order to remove journaling, however, I would need to attach the EBS volume to another instance. I decided to test the procedure out using a (day old) snapshot, however, before removing journaling, I re-ran the 10 minute iotop test (on the test instance). To my surprise, I saw normal (i.e. non-elevated) values, and this was the first time that flush-202 didn't even show up on the list. This was a fully functional instance (I restored snapshots of my data as well) - there had been no changes to the root volume in the 12 hours or so since it was taken. All tests showed that the same processes were running on both servers. This led me to believe that the cause must come down to some requests that the 'live' server is processing.

Looking at the differences between the iotop outputs of the server displaying the problem and the seemingly identical server that had no problem, the only differences were flush-202 and php-fpm. This got me thinking that, while a long shot, perhaps it was a problem related to the PHP configuration.

Now, this part wasn't ideal - but since none of the services running on the live server would suffer from a few minutes of downtime it didn't really matter. To narrow down the problem, all the major services (postfix, dovecot, imapproxy, nginx, php-fpm, varnish, mysqld, varnishncsa) on the live server were stopped, and the iotop test rerun - there was no elevated disk i/o. The services were restarted in 3 batches, leaving php-fpm until the end. After each batch of restarts, the iotop test confirmed that there was no issue. Once php-fpm was started the issue returned. (It would have been easy enough to simulate a few PHP requests on the test server, but at this point, I wasn't sure it was actually PHP).

Unfortunately, the server would be rather pointless without PHP, so this wasn't an ideal conclusion. However, since flush-202 seemed to suggest something relating to memory (despite having ample free memory), I decided to disable APC. Rerunning the iotop test showed that disk i/o levels were normal. A closer look into the matter showed that mmap was enabled, and that apc.mmap_file_mask was set to /tmp/apc.XXXXXX (the default for this install). That path sets APC to use file-backed mmap. Simply commenting this line out (therefore using the default - anonymous memory backed) and rerunning the iotop test showed the problem was resolved.
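
For clarity, the change itself was a one-line edit to the APC configuration (on this kind of install it typically lives somewhere like /etc/php.d/apc.ini - the exact path is an assumption), followed by a php-fpm restart:

;apc.mmap_file_mask=/tmp/apc.XXXXXX    ; commented out - APC falls back to anonymous (memory-backed) mmap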

I still do not know why none of the diagnostics run identified the writes as coming from PHP and going to the APC files in the /tmp directory; the only test that even mentioned the /tmp directory was lsof, and the files it listed were non-existent.
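
In hindsight, one check that might have surfaced it (a sketch, untested at the time - as far as I can tell, APC creates the temp file from the mask and unlinks it immediately after mapping it, so it only shows up as a '(deleted)' entry in a worker's memory map rather than as an ordinary open file):

grep deleted /proc/$(pgrep -f 'php-fpm: pool' | head -n1)/maps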

cyberx86