I've got a Linux server with a directory and file structure on it. Apparently someone uploaded a bunch of files whose names got corrupted. Consider the following example:

└── parent
    ├── foo1.jpg
    ├── f+�o2.jpg
    └── foo+�.html

There are about 1000 files and directories, so manual fixing is not really a good option. Is there a way to find all corrupted names with a single terminal command? Maybe a command that filters out the names containing non-ASCII symbols, or something like that? What would be the best practice? Thank you!

1 Answer

Assuming you are simply looking to identify filenames containing non-ASCII characters:

LC_ALL=C find /path/to/files | grep -P "[\x80-\xFF]"
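If you then want to actually rename the matches, here is a minimal sketch that just deletes the non-ASCII bytes from each offending name. It assumes none of the names contain newlines and that stripping the bad bytes produces an acceptable (and unique) result; collisions are skipped rather than overwritten:

```shell
# Strip non-ASCII bytes from every corrupted file or directory name under $1.
# Assumes no filenames contain newlines; existing targets are not clobbered (mv -n).
fix_names() {
    # -depth visits a directory's contents before the directory itself,
    # so files are renamed before their (possibly corrupted) parent dirs
    LC_ALL=C find "$1" -depth -name '*[! -~]*' | while IFS= read -r path; do
        dir=$(dirname -- "$path")
        base=$(basename -- "$path")
        # keep only printable ASCII (space through tilde)
        clean=$(printf '%s' "$base" | LC_ALL=C tr -cd ' -~')
        if [ "$clean" != "$base" ]; then
            mv -n -- "$path" "$dir/$clean"
        fi
    done
}

# fix_names /path/to/files
```

Try it on a copy of the tree first, or change `mv -n --` to `echo mv -n --` to preview the renames before committing to them.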

If this does not work for you, I recommend downloading and installing detox. From its manpage:

The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or cleanup Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI escaped characters.
