
Encountered a situation today on a server that has me wondering. Here's the scenario:

Syslog shows:

kernel: EXT3-fs warning (device sdb2): ext3_dx_add_entry: Directory index full!

Found the culprit to be a directory with 9.1 million files in it. I know it was 9 million files because I used this to delete them:

perl -e 'my $i=0;for(<*>){$i++;((stat)[9]<(unlink))} print "Files deleted: $i\n"'

Right after completion, I ran ls - that took about 3 minutes, and returned 1 file.

A few minutes later, a fresh batch - again 9.1 million files have appeared in the same directory, and syslog showed again:

kernel: EXT3-fs warning (device sdb2): ext3_dx_add_entry: Directory index full!

I ran the delete again, and the exact same scenario repeated itself. A few minutes later, there was a new batch of over 9 million files.

The files that just appeared are old (about 3 months old).
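
A claim like this can be checked without waiting minutes for a sorted ls, by streaming the directory entries and bucketing them by age. What follows is only a minimal sketch in Python: the function name summarize_directory and the choice of whole-day buckets are made up for illustration, and none of this was run on the server in question.

```python
import os
import time
from collections import Counter


def summarize_directory(path, now=None):
    """Return (entry_count, Counter mapping age in whole days -> entries).

    Hypothetical helper for illustration. os.scandir yields entries in
    on-disk order without sorting, so it does not need to hold millions
    of names in memory the way a sorted listing does.
    """
    now = time.time() if now is None else now
    total = 0
    ages = Counter()
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except FileNotFoundError:
                # The entry vanished between readdir and stat; skip it.
                continue
            total += 1
            ages[int((now - mtime) // 86400)] += 1
    return total, ages
```

Calling summarize_directory(".") in the affected directory would return the entry count plus a per-day age histogram, which would show whether the reappearing files really cluster around three months old.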

Can someone confirm if this is the expected behavior of ext3?

  • Directory index full is raised, well, when it's full
  • New files are allowed to be created, but can't be added to the index
  • New files are cached "somewhere"
  • Once a slot is freed, the new file is added to the index (and hence will show up with e.g. ls)

I suspect that this is what's happening, but I currently don't have any proof.

Any feedback appreciated!

Please note the question isn't about how to fix it, it's about understanding what's happening here.

HBruijn
Stefan
  • I don't think it works that way, with new files cached "somewhere". Delete the files and remount the filesystem in read-only mode. If some process is writing there, it will fail. Obviously, if old files still exist there, they will still be visible. – Nehal Dattani Aug 13 '15 at 04:12
  • unfortunately we can't just easily do that. This is on a live production system and the directory in question holds audit files, plus the software itself. We can't just un-mount it. – Stefan Aug 13 '15 at 04:47
  • Also, there is no process that writes 9.1 million files in just a few minutes - particularly files with old timestamps. – Stefan Aug 13 '15 at 04:48
  • Well, in that case delete the files and use audit rules to watch writes to that particular directory. – Nehal Dattani Aug 13 '15 at 04:49
  • I guess the real question is this: what does ext3 / the kernel do, if "Directory index full" is hit, and new files are still being created? We know for a fact that writes do NOT fail. – Stefan Aug 13 '15 at 04:50

1 Answer

"...the question isn't about how to fix it, it's about understanding what's happening here...."

My guess is that you're suffering from a (severe?) file-system corruption and... the more files get (virtually?) created and removed, the more severe the corruption is going to become.

I'm saying this 'cause:

  1. you wrote "...A few minutes later [...] again 9.1 million files have appeared in the same directory...". Let's assume that by "a few minutes" you meant 15 minutes. That means roughly 10K files created per second (9.1 million files / 900 seconds). Even though your server/storage might be able to handle such a load, it's hard to believe that such a creation process could run without any sign of the activity! So, at least, the two facts that:

    • 10K files per second are created (over a 15-minute timeframe);
    • you're not noticing any strange load/behaviour on your system

    lead me to think that... the creation process is "fake" and, as such, you have file-system integrity issues;

  2. the message "ext3_dx_add_entry: Directory index full!" leads me to think that in such a scenario it should not be possible to create additional files within the related filesystem, yet (as mentioned by @Stefan in the comments on the OP) you seem to be able to create additional files anyway. That's again -- IMHO -- another fact worth attention. As a partial mitigation, the word "warning" (...and not "error") in the log message (EXT3-fs warning) makes me think that even the kernel developers did not consider such a situation "critical" (...they call it a warning... and not an error. So it should not be something so... terrible!);
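
The rate estimate in point 1 is plain arithmetic and easy to redo for other guesses at what "a few minutes" means. A small sketch in Python follows; the 15-minute window is the assumption made above, the 5- and 30-minute windows are added here purely for comparison, and files_per_second is an illustrative name.

```python
def files_per_second(file_count, minutes):
    """Average creation rate needed to produce file_count files in `minutes`."""
    return file_count / (minutes * 60)


# 9.1 million is the count reported in the question; the windows are guesses.
for minutes in (5, 15, 30):
    rate = files_per_second(9_100_000, minutes)
    print(f"{minutes:>2} min -> {rate:,.0f} files per second")
```

Even with the most generous window, the rate stays in the thousands of files per second, which is the load that should have been noticeable.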

In addition to the above, I also found this other SF post confirming that, at least in that specific situation, the problem was a file-system-integrity issue.

As for the final part of your question (the one related to the "caching" hypothesis), even though I'm not at all a kernel hacker, I strongly believe that this is not kernel behaviour: it would be out of scope for the ext3 module alone and would have to be implemented across modules. But, please, don't blame me if this last sentence is absolutely wrong! It's only my "feeling" :-)


Update

As for point 2) above, I was wrong: the "Directory Index" seems not to be strictly tied to the file-system structures that actually store file data and file metadata. Instead it's "only" a means to speed up lookups in directories containing lots of files [see here or this other SF post here]. This seems to explain why the log reports a "warning" (and not an "error") as, in my opinion, when the "Directory Index" is full... everything can proceed as usual, just without the benefits of the indexing.
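
To see whether directory indexing is enabled at all on a given filesystem, the feature list printed by tune2fs -l can be inspected for the dir_index flag. The following is only a sketch in Python: the helper names are made up for illustration, reading the superblock usually needs root, and the device path is the one from the log line in the question.

```python
import subprocess


def filesystem_features(tune2fs_output):
    """Extract the feature flags from the text printed by `tune2fs -l`."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return value.split()
    return []


def has_dir_index(tune2fs_output):
    """True if the dir_index (hashed directory index) feature is listed."""
    return "dir_index" in filesystem_features(tune2fs_output)


def read_tune2fs(device):
    """Run `tune2fs -l <device>` and return its output (typically needs root)."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like has_dir_index(read_tune2fs("/dev/sdb2")) would then report whether the index is in use. It only reads the superblock and changes nothing on disk.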
Damiano Verzulli
  • does this mean that if an index exists, it is used in place of the actual directory contents when executing i.e. "ls" ? This would explain some of the seemingly irritating behavior we saw: 1) delete 9 mil files. 2) run "ls" and see 1 file returned. 3) 10 minutes later run "ls" again and see millions of files again, and at the same time another "Directory index full" message. Is that the case? – Stefan Aug 14 '15 at 07:23
  • I'm not confident enough with kernel-hacking to make such a statement, sorry. Anyway, I don't think that "_...if an index exists, it is used in place of the actual directory contents..._", as "index" and "directory contents" are two different things. In my opinion the "index" (...when activated) is involved only in the "searching" phase, while the "content" is always gathered from the real directory content. With a broken index, you get a broken "search" and, as such, a misleading output. I expect that should you access one of those files, an error will be raised. – Damiano Verzulli Aug 14 '15 at 08:11
  • Another comment: as Directory Indexing can be disabled, you'll probably solve your issues by.... disabling it (http://linux.die.net/man/8/tune2fs - it has a proper option) but don't forget that, in my opinion, you're experiencing a potentially serious filesystem corruption (hopefully, restricted only to the index). That's why I would surely plan an "fsck" as soon as possible (with a backup/disaster recovery plan, ready to be applied, should something really weird happen). – Damiano Verzulli Aug 14 '15 at 08:15
  • "With a broken index, you got a broken "search" and, as such, a misleading output" - so the same thing would apply to a full index, right ? – Stefan Aug 14 '15 at 08:46
  • I guess what I'm getting at is this: - Index runs full. New files are still created and stored in the directory, but not in the index - Whenever you search the directory using the index you'll miss the files that didn't fit into the index - This could potentially apply to files being deleted, listed, anything - If you'd then delete files, and immediately after search the contents, you may see nothing - Once the index has been rebuilt/re-populated, old entries mysteriously appear again. Isn't that exactly what we're seeing here? – Stefan Aug 14 '15 at 08:49
  • Wish comments wouldn't remove newlines. Sorry it's a bit hard to read – Stefan Aug 14 '15 at 08:49
  • As for your second-to-last comment, I don't think so. As far as I understand, to rebuild the index a proper action needs to be explicitly taken: something like an e2fsck with the "-D" option. Simply deleting all the files in your directory should not be enough. Anyway, again, I'm venturing into not-so-comfortable kernel-hacking territory so... chances are high that I'm writing **lots** of wrong things (...and this is not a wise thing to do. Sorry! Hopefully someone else will help us understand the gory details). – Damiano Verzulli Aug 14 '15 at 08:59