The reason for the problem
The issue turns out to be in how XFS allocates inodes. Unlike many file systems, which create a fixed number of inodes when the file system is made, XFS allocates inodes dynamically as new files are created. However, unless you specify otherwise (inode32 was the default allocation mode before Linux 3.7), inode numbers are limited to 32 bits. Because an XFS inode number encodes the inode's position on disk, every inode must fit within the first terabyte of the file system. So if you completely fill that first terabyte and then enlarge the disk, you still can't create new files, since new inodes can't be placed in the new space.
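As a back-of-the-envelope check on the "first terabyte" figure, the sketch below multiplies the number of possible 32-bit inode numbers by the inode size. It assumes each inode number maps to one inode-sized slot counted from the start of the device, which simplifies the real encoding (AG number, block and offset packed together), and it assumes the older 256-byte default inode size. The helper name inode32_limit_bytes is hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope: how far into the device can inodes live when
# inode numbers are limited to 32 bits?
# Simplifying assumption: each inode number indexes one inode-sized slot
# counted from the start of the device, so the reachable range is
# 2^32 slots * inode size in bytes.
inode32_limit_bytes() {
    inode_size=$1   # bytes per inode, e.g. 256 (older default) or 512
    echo $(( (1 << 32) * inode_size ))
}

for size in 256 512; do
    limit=$(inode32_limit_bytes "$size")
    echo "inode size $size bytes: inodes must sit below $limit bytes ($(( limit >> 30 )) GiB)"
done
```

With 256-byte inodes this comes to exactly 1 TiB. Larger inodes raise the ceiling proportionally, although gaps in the real inode-number encoding may make the usable range somewhat smaller than this estimate.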
Solution 1 - change mount options
One solution is to remount the file system with the mount option inode64. However, some applications behave oddly with it (MySQL, for example), and NFS can run into trouble too, since some clients (particularly 32-bit ones) cannot handle 64-bit inode numbers. So if you're not sure your system will work with this option, move on to the next solution.
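If you do go with inode64, you will probably want the option to survive a reboot. The helper below is a hypothetical sketch (add_inode64 is not an existing tool) that prints a copy of an fstab file with inode64 appended to the options of one mount point. It assumes the usual whitespace-separated six-field fstab layout; /extra and /dev/CACHE/CACHE are just the names used elsewhere in this article.

```shell
#!/bin/sh
# Print an fstab with "inode64" appended to the options of one mount point.
# Hypothetical helper: it writes to stdout and never edits the file in place,
# so the result can be reviewed before it is installed.
add_inode64() {
    fstab=$1        # e.g. /etc/fstab
    mountpoint=$2   # e.g. /extra
    awk -v mp="$mountpoint" '
        # Only touch non-comment entries for the requested mount point that
        # do not already carry inode64. Assigning to $4 makes awk rebuild
        # that line with single spaces between fields.
        $1 !~ /^#/ && $2 == mp && $4 !~ /(^|,)inode64(,|$)/ { $4 = $4 ",inode64" }
        { print }
    ' "$fstab"
}
```

For example, run add_inode64 /etc/fstab /extra > /tmp/fstab.new and diff the two files before installing the new one. To apply the option without a reboot you would remount (mount -o remount,inode64 /extra), but on some older kernels the option may only take effect on a fresh mount, so check the XFS documentation for your kernel.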
Solution 2 - move files
The second solution is to find some of the files that are currently stored in the first terabyte, and move them to another area of the file system.
Moving by age
In our case, this was easy - the file system had been in use for years, so we could simply find the oldest files and move them away from the file system, and then move them back. This was easily done using find:
find /extra -mindepth 3 -maxdepth 3 -type d -mtime +730 -exec du -sh {} \; > /tmp/olddirs.txt
This gave us a list of the size and name of every directory exactly three levels below the mount point that was last modified more than two years (730 days) ago. We could then sort the list to find the largest directories, and use mv to move them to another file system and back again.
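The sort-and-move step could look roughly like this. largest_dirs and move_away_and_back are hypothetical helper names; the sketch assumes GNU sort (for -h, which understands sizes such as 900M and 1.5G) and a scratch directory on a different file system with enough free space.

```shell
#!/bin/sh
# Print the largest entries of a "du -sh" listing, biggest first.
# Needs GNU sort for -h (human-numeric sort of sizes like 900M and 1.5G).
largest_dirs() {
    list=$1          # e.g. /tmp/olddirs.txt
    count=${2:-10}   # how many entries to show
    sort -rh "$list" | head -n "$count"
}

# Move a directory to scratch space and then back to where it was.
# scratch_dir MUST be on a different file system: within the same file
# system mv is just a rename and no data is rewritten.
move_away_and_back() {
    dir=$1
    scratch_dir=$2
    name=$(basename "$dir")
    parent=$(dirname "$dir")
    # Refuse to run if something with that name is already in scratch
    # space, so nothing there gets overwritten.
    if [ -e "$scratch_dir/$name" ]; then
        echo "move_away_and_back: $scratch_dir/$name already exists" >&2
        return 1
    fi
    mv "$dir" "$scratch_dir/" && mv "$scratch_dir/$name" "$parent/"
}
```

For instance, largest_dirs /tmp/olddirs.txt 20 picks candidates, then move_away_and_back /extra/a/b/c /mnt/scratch handles one of them. Until a directory is back in place, anything using it won't find it, so a quiet moment is best.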
Moving by allocation group
If you can't simply go by age, e.g. when a lot of files were created at the same time, you can still find the right files to move, but it takes a bit more time.
XFS divides a file system into allocation groups (AGs), numbered from 0. You could run xfs_info /path/to/mountpoint to get the block size (bsize) and the number of blocks in each AG (agsize), and work out from those which groups lie within the first terabyte. Or you can just check the first few AGs to see which ones are full, and then free up space in those.
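Working out which AGs lie within the first terabyte is simple arithmetic: AG n starts at n * agsize * bsize bytes. The sketch below does that for values copied by hand from xfs_info (agcount, agsize in blocks, bsize in bytes). ags_below_limit is a hypothetical name, and its 1 TiB default limit assumes 256-byte inodes.

```shell
#!/bin/sh
# Print the numbers of the AGs that start below a byte limit.
# Arguments are copied by hand from xfs_info: agsize (blocks per AG),
# bsize (bytes per block) and agcount. The optional 4th argument is the
# limit in bytes; the 1 TiB default assumes 256-byte inodes.
ags_below_limit() {
    agsize=$1
    bsize=$2
    agcount=$3
    limit=${4:-1099511627776}   # 2^40 bytes = 1 TiB
    ag=0
    while [ "$ag" -lt "$agcount" ]; do
        start=$(( ag * agsize * bsize ))   # byte offset where this AG begins
        # AGs are laid out in order, so once one starts past the limit,
        # all later ones do too.
        if [ "$start" -ge "$limit" ]; then
            break
        fi
        echo "$ag"
        ag=$(( ag + 1 ))
    done
}
```

For example, with agsize=67108864 blocks and bsize=4096 each AG is 256 GiB, so ags_below_limit 67108864 4096 8 prints AGs 0 to 3. An AG that starts below the limit but ends beyond it is still listed, since its lower part can hold inodes.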
- Checking the free space in the first six AGs (0 to 5):
for ag in $(seq 0 5); do echo "freespace in AG $ag"; xfs_db -r -c "freesp -s -a $ag" /dev/CACHE/CACHE | grep "total free"; done
If the total free space in any group (the "total free blocks" line) is less than 40 blocks, you won't be able to create new files in it.
- Finding the files in that AG
This requires checking the block map of every file on the file system, so it will take a long time. Here's a suggestion:
find /extra -mindepth 3 -type f -exec xfs_bmap -v {} \; > /tmp/agfilelist.txt
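You can then grep for " 0 " (that's a space, a zero and another space) to find the extents stored in AG 0, grep for " 1 " to find the ones in AG 1, and so on; the file each extent belongs to is named on the header line above it. Start with AG 0: move the largest files away to another file system (using mv, not cp, since cp leaves the originals and their blocks in place) and then back again. Repeat until you have a fair amount of space free.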
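Both checks produce a lot of output to read by eye. The two hypothetical helpers below post-process it: low_space_ags reads the saved output of the freesp loop and lists AGs whose "total free blocks" figure is under a threshold (40 by default, following the rule of thumb above), and files_in_ag reads the saved xfs_bmap -v listing and sums each file's extents in one AG, largest first. They assume the column layout xfs_bmap -v prints on extent lines (EXT, FILE-OFFSET, BLOCK-RANGE, AG, AG-OFFSET, TOTAL, with TOTAL in 512-byte blocks) and that file names don't end in a colon, so check the layout on your own system first.

```shell
#!/bin/sh
# Hypothetical helpers for post-processing saved output.

# List the AGs whose "total free blocks" is below a threshold (default 40),
# reading the saved output of the freesp loop.
low_space_ags() {
    awk -v t="${2:-40}" '
        /AG [0-9]+$/        { ag = $NF }                  # line echoed by the loop
        /total free blocks/ { if ($NF + 0 < t + 0) print ag }
    ' "$1"
}

# Sum each file's extents in one AG from saved "xfs_bmap -v" output and
# print "<512-byte blocks><TAB><file>", largest first. Assumed columns on
# extent lines: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL [FLAGS].
files_in_ag() {
    awk -v ag="$2" '
        BEGIN { OFS = "\t" }
        /^[^ \t].*:$/ { file = substr($0, 1, length($0) - 1); next }  # "path:" header
        $1 ~ /^[0-9]+:$/ && $3 != "hole" && $4 == ag { blocks[file] += $6 }
        END { for (f in blocks) print blocks[f], f }
    ' "$1" | sort -rn
}
```

For example, redirect the freesp loop's output to /tmp/freesp.txt and run low_space_ags /tmp/freesp.txt, then files_in_ag /tmp/agfilelist.txt 0 | head -n 20 lists the twenty files with the most data in AG 0.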
Outcome
Once we'd moved enough files away from /extra and then back again, there was lots of space in AG 0 and it was once again possible to create new files.