
I have a specific directory where I am unable to delete the files or the directory. The directory is on an ext4 file system, using RAID5 over 3 disks on a QNAP NAS.

Using rm -f gives me:

rm: unable to stat `file.jpg': Input/output error

It also results in the following in dmesg:

EXT4-fs error (device dm-0): dx_probe:933: inode #55050534: block 3591: comm rm: Directory hole found

Since "Input/output error" usually means a hardware problem, I have run several tests: what I assume is QNAP's version of fsck, badblocks, and the SMART rapid and long tests on the individual disks; RAID scrubbing also runs periodically anyway. All have come back reporting no errors.
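
For reference, the checks were roughly along these lines (device names here are examples for one disk and one md array; the QNAP wraps some of these in its own tools):

smartctl -t long /dev/sda                    # SMART long self-test per disk, results read back with smartctl -a /dev/sda
badblocks -sv /dev/sda                       # read-only surface scan
echo check > /sys/block/md1/md/sync_action   # manual RAID scrub (the NAS also schedules this periodically)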

The folder has a lot of files, as it's a black hole that thousands of files get added to daily/hourly (I'm unable to count them, as ls | wc suffers from the same "Input/output error" and "Directory hole found" errors). So, based on a similar black-hole post, I'm assuming I've trashed the directory and not the hardware.

Unfortunately the version of find on the QNAP does not support the -exec argument, and so I cannot try what was suggested in that post.

The questions are:

  1. What exactly is a 'directory hole'?
  2. How can I delete the directory and the files within a directory hole?
drgrog
  • This is a job for `fsck.ext4`, but I don't know how to run that on a NAS. – kasperd Oct 09 '18 at 10:31
  • On a full server, yes, `fsck.ext4` would be good. However, a QNAP FS check runs `/bin/e2fsck -C 0 -fp -N /dev/mapper/cachedev1` - only the command-line params don't make sense - and QNAP tells me they have their own modified and recompiled version of e2fsck with different arguments. They also keep giving me silly answers like 'it's due to something accessing the file', so I don't know what to believe from them, and I don't seem to be getting anywhere with the open ticket, hence coming here. – drgrog Oct 23 '18 at 09:52

2 Answers


Directory holes have actually been a feature of ext4 since 2013, but the Linux kernel was not fixed to support them until late 2019. However, directory holes can only be formed as a result of running fsck, so the problem could indirectly have been caused by data errors that forced you to run fsck.

The following fsck option should remove all directory holes:

fsck.ext4 -Df /dev/dm-0

Be sure to run it on an unmounted file system (it will also work on a read-only mounted file system, but there is a slight risk that anyone reading files while fsck is running will get garbage).

Be warned that this will take at least as long as a full fsck (e.g. fsck.ext4 -f /dev/dm-0), and during this period the filesystem will not be available for use.
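
A minimal sequence, assuming the logical volume really is /dev/dm-0 (on a QNAP it may be /dev/mapper/cachedev1 instead, as the comments above mention) and that you can stop whatever NAS services keep it mounted:

umount /dev/dm-0                          # or stop the QNAP services holding the volume, then unmount
fsck.ext4 -fD /dev/dm-0                   # -f forces a full check, -D rebuilds/optimises directories
mount /dev/dm-0 /share/CACHEDEV1_DATA     # remount; this mount point is an assumption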

bjoster

Run the command below inside this directory. It's basically a replacement for -exec rm {}: it builds a list of the files inside the dir and constructs multiple rm commands, with the number of arguments per run limited to 10.

find ./ -type f|xargs -n 10 rm

The above will delete all files under this directory, if that was your intent. I'd probably try to salvage the files first by copying them to a different dir, as sketched below.
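
A rough sketch of that salvage step, assuming /share/salvage is a directory on a healthy volume with enough free space (the path is only an example, and files with the same basename will overwrite each other):

mkdir -p /share/salvage
find ./ -type f | while read -r f; do cp -p "$f" /share/salvage/; done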

Dmitry Zayats
  • Thanks @dmitry-zayats, but that has the same problem. Even the `find` command gives the error `find: ./filename.jpg: Input/output error` and results in the `find: Directory hole found` errors on the dmesg log. I'd really like to know what a directory hole is, and will update the question to ask that too. – drgrog Oct 09 '18 at 10:19
  • Trying to delete files from a corrupted file system is a terrible suggestion. Any write to a corrupted file system comes at a risk of making the corruption even worse. – kasperd Oct 09 '18 at 10:35
  • So is a 'directory hole' some kind of file system corruption? If it was IO errors due to physical corruption or bad hardware, then I totally agree that deleting or writing to it is a bad idea. I just don't know what a 'directory hole' is and am not yet convinced it is like typical file system corruption, or typical file name corruption where things can be deleted via their inode number. – drgrog Oct 09 '18 at 10:49
  • @drgrog Yes, it is some kind of file system corruption. A file can have a hole in it, which means one of the pointers to data blocks making up the file (or one of the indirection pointers) is zero. Reading that section of a file will return NUL characters all the way through. And writing to that section of a file will allocate a data block to fill in the hole. The same data structures are used to point to the blocks containing a directory listing, but in that case there is not supposed to ever be a hole. A hole where there isn't supposed to be one is one kind of corruption. – kasperd Oct 09 '18 at 20:01
  • @kasperd, thanks, that sounds like a reasonable start to a partial answer, at least to the first part of the question ('what is a directory hole'). I also intend to further understand the ext4_bread method and how it results in the hole error (https://github.com/torvalds/linux/blob/master/fs/ext4/namei.c#L112); the debugfs sketch below is another way to poke at it. – drgrog Oct 23 '18 at 10:06
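
Following up on kasperd's comment above, one read-only way to look at this directly is to dump the affected directory inode with debugfs (the inode number comes from the dmesg line in the question; the device path is an assumption):

debugfs -R "stat <55050534>" /dev/dm-0

A gap in the logical block numbers listed under EXTENTS (or BLOCKS) is the hole - a directory block that was never mapped.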