We have an Ubuntu 12.04 server running on a VMware ESXi host, with multiple partitions, running Zimbra 8 (MTA/LDAP only). One of the partitions is a 30 GB partition mounted at /opt/zimbra/data. I had a rude wake-up call this morning from my manager, who said the mail server was down.
I logged in and took a look and sure enough, all commands were reporting that there wasn't enough free space on the /opt/zimbra/data partition. I was trying to figure out which file it was that was using up all the space, but both df and du failed me. Here are the outputs from the various commands after I had tracked down the file by doing an ls one directory at a time:
zimbra@mail:/opt/zimbra/data/ldap/mdb/db$ du -sh .
20M .
zimbra@mail:/opt/zimbra/data/ldap/mdb/db$ du --apparent-size data.mdb
31409532 data.mdb
zimbra@mail:/opt/zimbra/data/ldap/mdb/db$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sde1 30G 1.3G 28G 5% /opt/zimbra/data
zimbra@mail:/opt/zimbra/data/ldap/mdb/db$ ls -allh
total 20M
drwxr-xr-x 2 zimbra zimbra 4.0K Nov 26 13:50 .
drwxr-xr-x 3 zimbra zimbra 4.0K Jul 16 05:47 ..
-rw------- 1 zimbra zimbra 30G Nov 26 14:04 data.mdb
-rw------- 1 zimbra zimbra 8.0K Nov 26 14:07 lock.mdb
Note that ls shows data.mdb as taking up the full 30 GB, but that size is not reflected in the used space reported by du or df.
We have since created a new partition, copied the files over, and have things up and running, but I am still curious what would have caused the incorrect reporting of used space on the partition. We still have the old partition lying around, so if there are other commands I don't know of that would yield more accurate results, I would like to try them out.
Update:
Output from ls -alsh
root@mail:/opt/zimbra/data/ldap/mdb/db# ls -alsh
total 20M
4.0K drwxr-xr-x 2 zimbra zimbra 4.0K Nov 26 13:50 .
4.0K drwxr-xr-x 3 zimbra zimbra 4.0K Jul 16 05:47 ..
20M -rw------- 1 zimbra zimbra 30G Nov 26 14:04 data.mdb
4.0K -rw------- 1 zimbra zimbra 8.0K Nov 26 14:07 lock.mdb
So it looks like the file is indeed a sparse file, and was very close to the partition size. I still have no idea why commands like touch or cp started returning "no space left on disk", as this file only seems to have actually been using around 20 MB.
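To make the sparse-file distinction concrete, here is a minimal sketch that compares a file's apparent size (the length ls -l shows) with the space actually allocated to it (what du and df count). It assumes GNU coreutils stat; the function names are my own and purely illustrative, and the check is only a heuristic, since some filesystems store very small files without allocating any blocks.

```shell
# Apparent size in bytes: the length stored in the inode (what ls -l shows).
apparent_bytes() {
    stat -c '%s' -- "$1"
}

# Allocated size in bytes: the blocks actually backed by storage.
# %b is the number of allocated blocks, %B the size in bytes of each block.
allocated_bytes() {
    echo $(( $(stat -c '%b' -- "$1") * $(stat -c '%B' -- "$1") ))
}

# Heuristic: treat a file as sparse when it occupies less space than its length.
is_sparse() {
    [ "$(allocated_bytes "$1")" -lt "$(apparent_bytes "$1")" ]
}
```

For example, after sourcing these functions, running is_sparse data.mdb && echo "data.mdb is sparse" in the db directory would flag a file like the one above, where 30G of apparent size is backed by only about 20M of allocated blocks.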
Wrapping up, here is what I have come up with to list all files, including sparse files, recursively and sort the results by file size, descending:
ls -aldSh $(find .) | grep -v '^d'
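The command above relies on word splitting of the find output, so it can misbehave on file names containing spaces and may fail with "argument list too long" on large trees. A possible alternative, sketched here under the assumption that GNU find is available (the function name is hypothetical), prints the apparent size next to the allocated size so sparse files stand out:

```shell
# List regular files under a directory, largest apparent size first.
# Columns (tab separated): apparent size in bytes, allocated size in KiB, path.
# Assumes GNU find (-printf) and file names without embedded newlines.
list_by_apparent_size() {
    find "${1:-.}" -type f -printf '%s\t%k\t%p\n' | sort -rn
}
```

Calling list_by_apparent_size /opt/zimbra/data | head would show the largest files first; a file whose first column is far larger than its second column (after converting KiB to bytes) is sparse.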
For any Zimbra users who might end up here with a similar issue, the following link may help:
ldap database went from 97meg to 86gig
Update 2:
Another thing to check is whether the partition has run out of inodes. Of particular interest are the logger and zmstat subdirectories. They contain a large number of small files, so if they are mounted on their own partitions, those partitions can run out of inodes well before they run out of space. Most commands will still return a "no space left on device" error, which can be misleading.
df -i can be used to show the number of free inodes. For example, I have a partition that has about 80% free space according to df -h, but still returns a "no space left on device" error because it is out of inodes:
root@mail:/opt/zimbra/logger# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdh1 20G 3.9G 16G 21% /opt/zimbra/logger
root@mail:/opt/zimbra/logger# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sdh1 5120 5120 0 100% /opt/zimbra/logger
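Once df -i shows a partition at 100% inode use, the next step is usually finding which directory holds all the entries. The following is a rough sketch of one way to do that, not something taken from Zimbra's documentation; the function name is hypothetical, and the counts are approximate because hard links are counted once per name and file names containing newlines are counted more than once.

```shell
# Count entries (files, directories, links) under each immediate subdirectory
# of a path and print the busiest subdirectory first.
# -xdev keeps find from descending into other mounted filesystems.
inode_usage_by_subdir() {
    for dir in "${1:-.}"/*/; do
        [ -d "$dir" ] || continue
        printf '%s\t%s\n' "$(find "$dir" -xdev | wc -l)" "$dir"
    done | sort -rn
}
```

For example, inode_usage_by_subdir /opt/zimbra/logger lists each subdirectory with its entry count, which should point at the directory that is consuming the inodes.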