Is there a way to force all file transactions with a filesystem to be UTF8 or UTF16 compliant?


What I want is to specify that for a directory, every file creation/modification within said directory will be checked by the kernel and if the filename has unsupported characters the offending process will be given a "permission denied" error.

I was thinking about writing a FUSE driver that rejects non-compliant filenames, but that does not seem practical.

I am not looking for solutions such as a cronjob or inotify that clean up unwanted characters after the fact. I'm looking for something that prevents such filenames from being created in the first place.

life of pi

Posted 2013-03-04T15:06:49.207

Reputation: 71

What filesystem are you using? mount has some file system specific options that could (I said could, never tried it) help. Search man mount for unicode or utf8. – terdon – 2013-03-04T15:10:26.873

I'm using ext4. I read the manpage: nothing interesting. The classic unix way regarding filenames is not to care about them at all, as long as they don't contain / or the 0-byte. I have never heard of the feature I'm requesting, thus my question. – life of pi – 2013-03-04T15:23:46.177

What does "unicode filename" even mean in this context? Do you want to ensure all filenames are valid UTF-8? – Cairnarvon – 2013-03-04T15:54:45.363

Yes. sorry I didn't say that. – life of pi – 2013-03-04T16:16:13.467

On the other hand, it would be nice if the encoding could be specified on a per-directory basis, like UTF-16 for a collection of files named in Chinese and UTF-8 in another directory for files mostly named in European languages. – life of pi – 2013-03-07T16:09:47.867

You're out of luck - the filesystem doesn't have a concept of "character set"; filenames are simply byte sequences, with only / and \0 disallowed, as you say. You'd need to intercept all filename creation calls (linking, renaming, creating), either at the filesystem layer or by changing glibc (directly or by monkey-patching via LD_PRELOAD - that one's easier to do but hard to enforce). There's no "intended" way to do this. – Gabe – 2013-04-26T12:50:10.630
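Whichever layer does the interception, the check itself is small. The following is only a sketch of that check in Python, not a working shim: `is_valid_utf8` and `check_filename` are hypothetical names, and a real implementation would have to call something like them from every wrapper around the creating, linking, and renaming calls before forwarding to the real call.

```python
import errno
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_filename(path: bytes) -> None:
    """Raise PermissionError (EACCES) if the last path component is not UTF-8.

    Hypothetical helper: it only validates the name, it does not create
    or modify anything on disk.
    """
    name = os.path.basename(path)
    if not is_valid_utf8(name):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), path)
```

For example, `check_filename(b"/tmp/caf\xe9.txt")` raises `PermissionError`, because the byte `0xe9` (Latin-1 "é") does not form a valid UTF-8 sequence there, while the UTF-8 encoding of the same name passes.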

Answers


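ZFS has mechanisms for making datasets (and maybe pools) UTF-8 only, optionally with different Unicode normalization forms. The relevant dataset properties are utf8only and normalization, which are set when the dataset is created.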
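To see why normalization matters alongside a UTF-8-only rule, here is a small Python illustration. It says nothing about how ZFS implements this internally; `same_name` is a hypothetical helper that only shows two valid UTF-8 names rendering identically while differing byte for byte.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

Both strings are valid UTF-8 once encoded, yet `composed.encode("utf-8")` and `decomposed.encode("utf-8")` are different byte sequences, so a byte-oriented filesystem treats them as two distinct files. After normalization, `same_name(composed, decomposed)` reports them as equal.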

Further reading:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg28314.html

http://www.freebsd.org/cgi/man.cgi?query=zfs&manpath=FreeBSD+9.1-RELEASE

killermist

Posted 2013-03-04T15:06:49.207

Reputation: 1 886