
I'm a novice Linux admin and am now responsible for the OS of a 3-node Tomcat cluster. (Tomcat itself is handled by the devs, luckily.)

Our monitoring solution alerted me that /var on server01 has only 172 MB of free space left, most likely because /var/log filled up.

So I investigated with:

server01:/var# for i in $(ls); do du -sh $i; done
3.5M backups
100M cache
51M lib
0   local
0   lock
598M log
0   mail
0   opt
40K run
32K spool
144K tmp
4.0K www

If I sum that up I end up with around 760 MB used. The numbers don't change if I dig deeper into the directory tree, so this seems correct.
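
For the record, roughly the same breakdown can be produced in one go. This is just a convenience sketch; the loop above gives the same totals:

# per-directory sizes in KB plus a grand total, biggest consumers last;
# -x keeps du from crossing into other filesystems
du -skxc /var/* | sort -n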

But if I run df -h I get completely different numbers for /var: df shows that 2.8G out of 3.0G is used.

server01:/var# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             950M  205M  697M  23% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                  2.0G  4.0K  2.0G   1% /dev
/dev/sda3             961M   33M  928M   4% /tmp
/dev/dm-0             2.0G  506M  1.5G  26% /usr
/dev/dm-1             3.0G  2.8G  172M  95% /var
/dev/dm-2              20G   17G  3.3G  84% /home

The funny thing is that the other two nodes actually hold even more data on /var, because /var/log/ on nodes 2 and 3 consumes 200-300 MB more space, while the partitions and the underlying LVM volumes have the same size on all three nodes.

Yet on server02 and server03, df -h reports that everything is fine and only 1.0 to 1.2 GB of the 3.0 GB are used.

So where is my space being used?

I've heard of those little bastards called inodes and checked for that. df -i reports:

server01:/var# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1             123648    6099  117549    5% /
tmpfs                 506908       3  506905    1% /lib/init/rw
udev                  506487     675  505812    1% /dev
/dev/sda3             987968       7  987961    1% /tmp
/dev/dm-0            2048000   19786 2028214    1% /usr
/dev/dm-1             705808    1807  704001    1% /var
/dev/dm-2            13619632    5906 13613726    1% /home

And on server02 and server03:

server03:/var# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1             123648    6100  117548    5% /
tmpfs                 506908       3  506905    1% /lib/init/rw
/dev                  506487     675  505812    1% /dev
/dev/sda3             987968       7  987961    1% /tmp
/dev/dm-0            2048000   19784 2028216    1% /usr
/dev/dm-1            3096576    1758 3094818    1% /var
/dev/dm-2            13113840    5642 13108198    1% /home

So /var on server01 has 705,808 inodes, while server02 and server03 have 3,096,576 inodes on /var. But is this really the cause? Only 1% of them is used on each node.

If yes, how do I increase the number of inodes? (All filesystems are XFS, except for /, which is ext2.)

/etc/fstab is the same on all 3 nodes. The OS is Debian Lenny 64-bit with kernel 2.6.35.4.

Regards

3 Answers


You can run lsof | grep deleted to check which programs allocated this space (and which deleted files they still hold open).

Example:

[root@mab-01 ~]# lsof | grep deleted
hald-addo  2651 haldaemon  txt       REG              253,0      15720    3769183 /usr/libexec/hald-addon-keyboard.#prelink#.IhBW5L (deleted)
yum-updat  2899      root  txt       REG              253,0       4736    3276902 /usr/bin/python.#prelink# (deleted)
mongod     5535    mongod  txt       REG              253,0    8640360    3484794 /usr/bin/mongod (deleted)
mongod     5535    mongod    1w      REG              253,0     278032     262244 /var/log/mongo/mongod.log.rpmsave (deleted)
mongod     5535    mongod    2w      REG              253,0     278032     262244 /var/log/mongo/mongod.log.rpmsave (deleted)
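
To get a feel for how much space those deleted-but-open files still pin down, something along these lines should work (a rough sketch: +L1 restricts lsof to files with a link count below 1, i.e. deleted ones; double-check the column numbering on your lsof version, and note that files opened through several descriptors are counted more than once, so treat the total as an upper bound):

# open-but-deleted files on the filesystem mounted at /var
lsof -nP +L1 /var

# rough total of the space they still occupy (SIZE/OFF column, in bytes)
lsof -nP +L1 /var | awk 'NR > 1 { sum += $7 } END { print sum / 1024 / 1024, "MB" }'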

You wrote that "/var/log did fill up".

If you delete log files that are open for writing by a process, the filenames disappear (so du no longer sees them), but the space stays allocated, and as the process continues to write, the allocated space can even keep growing.
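
If you want to see the effect for yourself, here is a quick throwaway demonstration (file name and size are arbitrary; /tmp is just a convenient scratch area on your boxes):

dd if=/dev/zero of=/tmp/demo.log bs=1M count=100   # create a 100 MB file
tail -f /tmp/demo.log &                            # a process keeps it open
rm /tmp/demo.log                                   # the name is gone ...
du -sh /tmp                                        # ... so du no longer counts it
df -h /tmp                                         # ... but df still shows the blocks as used
kill %1                                            # stop tail; only now is the space freed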

If they were Tomcat logs, you need to tell Tomcat to reopen its log files.

Note "copytruncate" in this example. I don't know if this applies to your situation though.

RedGrittyBrick
  • So the safest way would be to restart Tomcat, and if that doesn't work, restart the server? – Nov 09 '11 at 18:50
  • @Yaarrg: I'm no Tomcat expert, so there may be less disruptive solutions, but yes, that should work. Also see Marcello Bittencourt's answer, which would help confirm the cause of the problem. – RedGrittyBrick Nov 09 '11 at 19:19
  • @yaaarrg: At this point, restarting Tomcat is probably the simplest solution. In the future, note that it's possible to truncate a file without disrupting the Tomcat processes: http://www.cyberciti.biz/faq/truncate-large-text-file-in-unix-linux/ – Stefan Lasiewski Nov 09 '11 at 19:39

Thanks for the tip with lsof | grep deleted. In fact, I'm getting dozens of deleted files for Apache2 and Tomcat6.

server01:~# lsof | grep deleted | wc -l
124

After restarting Apache2, the number of deleted files dropped to 40, and I had 2.4 GB free on /var again. I also searched for deleted files on the other two hosts and found that server02 had deleted files still open as well. Luckily, this time I had run a "ps auxf" beforehand, which showed an Apache2 thread that had been open since November 8th. After "kill -9 $oldapache2threadpid" those deleted files vanished as well. Maybe this was also the problem on server01.

I then restarted the Tomcat service on server01. Its deleted files vanished too, but free space didn't increase any further. Still, free space on /var now matches (within a few MB) what du -sch tells me.

So, thanks for the help, everyone :-)

I still need to investigate why Apache isn't closing all of its threads.
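
To spot such stale workers earlier next time, something along these lines should do (a quick sketch using standard procps format specifiers):

# Apache worker processes with start time and elapsed runtime,
# so long-running stragglers stand out
ps -C apache2 -o pid,user,lstart,etime,cmd

# and which of them still hold deleted files open on /var
lsof +L1 /var | grep apache2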

Regards