
It all began a couple of weeks ago:

  • When I try to use vi, I get "E297: Write error in swap file"
  • $ echo "test" > test fails with "-bash: echo: write error: No space left on device"
  • My bash history is always empty

It is not my quota, as it happens with all users.

Other than that, the server seems to be just fine.

I think it may be the root or swap partition, but I don't know how to fix it.

Here is some information I think might be useful:

$ sudo df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/fileserver--00-root
                      224G  212G     0 100% /
none                  995M  192K  995M   1% /dev
none                 1000M     0 1000M   0% /dev/shm
none                 1000M   14M  986M   2% /var/run
none                 1000M     0 1000M   0% /var/lock
none                 1000M     0 1000M   0% /lib/init/rw
/dev/sdb1             1.4T 1006G  356G  74% /cubo/d2p1
/dev/sdc1             459G  416G   39G  92% /cubo/d3p1
/dev/sda1             228M   17M  199M   8% /boot
192.168.1.7:/nfs/Backups
                      1.8T  1.2T  645G  65% /cubo/nfsMounts/ixBackup

Also:

$ ll /dev/mapper/
total 0
crw-rw---- 1 root root  10, 59 2013-04-11 12:54 control
brw-rw---- 1 root disk 251,  0 2013-04-11 12:54 fileserver--00-root
brw-rw---- 1 root disk 251,  1 2013-04-11 12:54 fileserver--00-swap_1

Additional information:

$ sudo dmsetup status
fileserver--00-swap_1: 0 11993088 linear 
fileserver--00-root: 0 475701248 linear 

And:

$ sudo dmsetup info
Name:              fileserver--00-swap_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      251, 1
Number of targets: 1
UUID: LVM-z7USJS3uIlf3VVUPeDeE0TzljgezS31fcvrwZihBYEENf5Tkgsyb9xJHo3RNVXsT

Name:              fileserver--00-root
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      251, 0
Number of targets: 1
UUID: LVM-z7USJS3uIlf3VVUPeDeE0TzljgezS31fdr57i4JAzZxlK6KeTOWDTm6bzUKK87J1

Sizes of my root folders:

$ cd /
$ sudo du -sh {bin,boot,cdrom,dev,etc,home,lib,lost+found,media,mnt,opt,proc,root,sbin,selinux,srv,sys,tmp,usr,var}
7.4M    bin
17M     boot
4.0K    cdrom
192K    dev
39M     etc
1.1M    home
154M    lib
16K     lost+found
4.0K    media
4.0K    mnt
4.0K    opt
du: cannot access `proc/21251/task/21251/fd/4': No such file or directory
du: cannot access `proc/21251/task/21251/fdinfo/4': No such file or directory
du: cannot access `proc/21251/fd/4': No such file or directory
du: cannot access `proc/21251/fdinfo/4': No such file or directory
0       proc
48K     root
7.5M    sbin
4.0K    selinux
4.0K    srv
0       sys
16K     tmp
542M    usr
282M    var

My fstab:

$ cat /etc/fstab 
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
/dev/mapper/fileserver--00-root /               ext4    errors=remount-ro 0       1
# /boot was on /dev/sda1 during installation
UUID=1724d880-01a4-481c-87e5-08328c3c8137 /boot           ext2    defaults        0       2
/dev/mapper/fileserver--00-swap_1 none            swap    sw              0       0
/dev/sdb1  /cubo/d2p1  ext3  defaults  0  0
/dev/sdc1  /cubo/d3p1  ext4  defaults  0  0
/dev/sdd1  /cubo/d4p1  ext4  defaults,noauto  0  0
192.168.1.7:/nfs/Backups /cubo/nfsMounts/ixBackup    nfs   defaults    0    0

How do I fix this?

Luciano

2 Answers

As I said elsewhere, your root partition is full.

The line "224G 212G 0 100% /" is the clue: you are using 100% of the filesystem mounted on /.

NickW
  • Thanks. I added additional information showing the sizes of my root folders. They barely reach 1 GB. – Luciano May 08 '13 at 12:20
  • Then most likely some running process has a huge temporary file open and unlinked. Run `lsof +L1` to find such unlinked files, then decide what to do with them (e.g., killing processes which hold unlinked files open will free the space occupied by such files). – Sergey Vlasov May 08 '13 at 12:29
  • Tried. `lsof +L1` listed 4 smbd processes. I restarted Samba; the processes were gone, but nothing changed. I also deleted 30 MB of log files; still nothing changed. – Luciano May 08 '13 at 12:36
  • So `sudo lsof +L1` does not show you anything big (see the `SIZE` column)? – Sergey Vlasov May 08 '13 at 12:41
  • I think I found something. Running `sudo lsof |grep -E '[0-9]{10}'` lists **1 process** `rsyslogd 992 syslog 3r REG 0,3 0 4026531996 /proc/kmsg` **and 6 processes like** `rpc.idmap 1023 root 8u REG 0,3 0 4026532150 /proc/1023/net/rpc/nfs4.nametoid/channel`. These sizes seem huge! **What do I do now?** – Luciano May 08 '13 at 12:43
  • These are `/proc` files, not real files on your root filesystem, so they are not the problem. Note also `0,3` in the `DEVICE` column — this also confirms that these files are not on the root FS (look for `251,0` there, which matches your root device `/dev/mapper/fileserver--00-root`). – Sergey Vlasov May 08 '13 at 12:47
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/8676/discussion-between-sergey-vlasov-and-luciano) – Sergey Vlasov May 08 '13 at 12:50
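A variant of the `lsof +L1` check that restricts the listing to the root filesystem and sorts it by size makes a large deleted-but-open file easy to spot. This is only a sketch: `unlinked_on_fs` is a hypothetical helper name, and the sort assumes the classic lsof column layout shown in the comment (newer lsof releases can add columns such as TID, so check the header and adjust the column number if needed).

```shell
#!/usr/bin/env bash
# List files that are still held open but already deleted (link count < 1)
# on the filesystem mounted at FS (default: /), largest first.
#   +aL1 FS : "link count below 1" AND "on filesystem FS" (see lsof(8), +L)
#   -n -P   : skip host and port name lookups to keep lsof fast
unlinked_on_fs() {
    local fs=${1:-/}
    sudo lsof -n -P +aL1 "$fs" | {
        # Print lsof's header line unchanged, then sort the remaining rows.
        if IFS= read -r header; then
            printf '%s\n' "$header"
            # Column 7 is SIZE/OFF in the classic +L layout:
            # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
            sort -k7,7nr
        fi
    }
}
```

Running `unlinked_on_fs /` lists only unlinked files whose DEVICE is the root filesystem (251,0 here), so /proc entries like the ones above (device 0,3) no longer appear.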

After investigating the problem in chat, we found the cause: the space on the root filesystem was occupied by files hidden underneath the mount points (and therefore invisible to du).

In Linux there are two ways to access files and directories hidden under mount points:

  1. The obvious way: unmount the filesystem mounted over the directory, then look at what is inside that directory. Obviously, this is impossible while the filesystem is in use.

  2. Using a bind mount, the outer filesystem can be made accessible at another directory in the tree. Normal bind mounts are not recursive: they do not copy nested mounts, so directories that were overmounted in the old location become accessible in the new location. This can be done on a running machine without disrupting operations that use the filesystem, so this method is used here.

Commands to perform such a bind mount for the root filesystem:

sudo mkdir /mnt/tmp_root
sudo mount --bind / /mnt/tmp_root

(Creating /mnt/tmp_root was possible in this case because the space reserved for root was not yet fully consumed.)
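The numbers in the question are consistent with this. Assuming the root filesystem was created with the ext4 default of reserving 5% of its blocks for root (the actual figure is the "Reserved block count" line of `sudo tune2fs -l /dev/mapper/fileserver--00-root`), a rough check using the `dmsetup status` sector count and the `df -h` line:

```latex
\begin{aligned}
\text{device size} &= 475701248 \times 512\ \text{B} \approx 226.8\ \text{GiB} \\
\text{reserve (5\%, assumed)} &\approx 0.05 \times 226.8\ \text{GiB} \approx 11.3\ \text{GiB} \\
\text{Size} - \text{Used} &\approx 224\ \text{G} - 212\ \text{G} = 12\ \text{G}
\end{aligned}
```

Allowing for the rounding in `df -h`, the remaining free space is about the size of the reserve, so ordinary users see 0 available while root can still allocate a few blocks.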

Big files hidden underneath mount points can then be found:

sudo du -x --max-depth=1 /mnt/tmp_root
sudo du -x --max-depth=1 /mnt/tmp_root/cubo
...
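The repeated du runs can be wrapped in a small helper that sorts the results, so the largest entry at each level stands out. A sketch assuming GNU du, sort and head; `largest_subdirs` is a hypothetical name. The first line of its output is DIR itself (the total).

```shell
#!/usr/bin/env bash
# Show the COUNT largest entries directly below DIR (default COUNT: 10),
# staying on DIR's filesystem.
#   -x : do not cross into other filesystems
#   -k : sizes in KiB, so a plain numeric sort orders them correctly
largest_subdirs() {
    local dir=$1 count=${2:-10}
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Run it as root (e.g. after `sudo -i`): `largest_subdirs /mnt/tmp_root`, then `largest_subdirs /mnt/tmp_root/cubo`, and so on, descending into whichever entry is largest.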

After finding the offending files, remove them to free the space. Note that directories used as mount points in another mount of the same filesystem cannot be removed: e.g., if an NFS filesystem is mounted at /cubo/nfsMounts/ixBackup, removing /mnt/tmp_root/cubo/nfsMounts/ixBackup is not possible (but the files and directories below it can be removed).
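When the hidden data is a few large files rather than many small ones, find can list them directly. A sketch assuming GNU find (for -printf); `big_files` and its default threshold of 100 MiB are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# List regular files larger than MIN_MIB MiB under DIR, biggest first.
#   -xdev        : stay on DIR's filesystem, like du -x
#   -size +NM    : find rounds each size up to whole MiB before comparing
#   -printf ...  : size in bytes, a tab, then the path
big_files() {
    local dir=$1 min_mib=${2:-100}
    find "$dir" -xdev -type f -size +"${min_mib}"M -printf '%s\t%p\n' 2>/dev/null |
        sort -rn
}
```

For example, as root: `big_files /mnt/tmp_root 500`. Because of -xdev, the same helper is also safe to point at / directly.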

Finally, to defend against such problems in the future, tighten the permissions on directories that are intended to be used as mount points. Then, if something prevents mounting (e.g., the NFS server is not responding), the underlying directory is not accessible, and attempts to access it fail in an obvious way:

sudo chown root:root /mnt/tmp_root/cubo/nfsMounts/ixBackup
sudo chmod 0600 /mnt/tmp_root/cubo/nfsMounts/ixBackup

(This changes permissions of the directory on the root filesystem, and does not do anything to the filesystem which could be mounted at /cubo/nfsMounts/ixBackup.)
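Permissions alone do not stop processes running as root, which bypass these checks. If whatever writes to /cubo/nfsMounts/ixBackup runs as root (an assumption; the thread does not say what created the hidden files), a check that the target really is a mount point covers that case. A minimal sketch for such a hypothetical script; `require_mounted` is an illustrative name, and `mountpoint` is the util-linux command.

```shell
#!/usr/bin/env bash
# Return non-zero (with a message on stderr) unless DIR is an active mount point.
require_mounted() {
    local dir=$1
    if ! mountpoint -q "$dir"; then
        echo "error: $dir is not mounted; refusing to write into the root filesystem" >&2
        return 1
    fi
}
```

Near the top of such a script, `require_mounted /cubo/nfsMounts/ixBackup || exit 1` would abort the run instead of silently filling /.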

The last step is to remove the bind mount once it is no longer needed, and then remove the temporary directory:

sudo umount /mnt/tmp_root
sudo rmdir /mnt/tmp_root
Sergey Vlasov
  • Thanks so much to Sergey Vlasov for solving the issue and delivering so much knowledge along the way! I wish I had someone like this in my company. – Luciano May 08 '13 at 14:11