6

I'm trying to migrate a bunch of files (300GB+) from a FAT32 drive to my FreeNAS ZFS filesystem, but every command I throw at it (tar, pax, mv, cp) fails with 'invalid argument' when it encounters a non-ASCII filename. These are usually files created under Windows, with names along the lines of "foo?s bar.mp3...", where the ? may have been an apostrophe or similar.

Can anyone help with a few lines of code that recursively walk the directory tree and rename files to remove the offending characters?

Much appreciated.

Dennis Williamson
Dan

5 Answers

7

rename can do this.

Try something like:

find dir -depth -execdir rename -n 's/[^[:ascii:]]/_/g' {} \; | cat -v

You may need the cat -v to display any odd characters without garbling your terminal.

If that prints acceptable substitutions, change the -n (dry run) to -v to actually perform the renames.

That said, it sounds like the charset on your filesystem mount is wrong (mount -o utf8?), since this sort of thing should really work.
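
If the rename utility is not installed (it is a Perl script that many BSD-based boxes lack), the same idea can be sketched in plain POSIX sh. This is only a sketch under stated assumptions: sanitize_name and sanitize_tree are hypothetical helper names, "non-ASCII" is taken to mean any byte with the high bit set, and file names are assumed not to contain newlines. Try it on a copy of the data first.

```shell
#!/bin/sh
# Hypothetical helper: replace every non-ASCII byte in a name with "_".
sanitize_name() {
    # LC_ALL=C makes tr work on single bytes; \200-\377 covers every byte
    # with the high bit set, and [_*] repeats "_" to match that range.
    printf '%s' "$1" | LC_ALL=C tr '\200-\377' '[_*]'
}

# Hypothetical helper: walk a tree depth-first and rename offending entries.
# -depth lists a directory's contents before the directory itself, so the
# path of each child is still valid when it is renamed.
sanitize_tree() {
    find "$1" -depth -print | while IFS= read -r path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        new=$(sanitize_name "$base")
        if [ "$base" != "$new" ]; then
            if [ -e "$dir/$new" ]; then
                # Never overwrite an existing file; report and move on.
                echo "skip (target exists): $path" >&2
            else
                mv -- "$path" "$dir/$new"
            fi
        fi
    done
}
```

For the drive in the question this would be called as sanitize_tree /mnt/Elements. Only the last path component is rewritten on each step, which is the same reason to prefer -execdir over -exec with rename.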

Justin
  • Thank you for the reply-- I have read that I should be able to mount my filesystem as something different but the web seemed to indicate that it doesn't apply to FAT32 partitions? I'd like to be corrected though if this isn't the case? FreeNAS auto-mounts the drive when it starts but I do believe there's the option to override/re-mount etc. – Dan Jan 04 '10 at 13:49
  • Hmm... I don't seem to have the rename command on this box? I've tried man rename with no luck – Dan Jan 04 '10 at 18:52
2

This is one correct way to apply the rename recursively:

find . -depth -execdir rename 'y/[\:\;\>\<\@\$\#\&\(\)\?\\\%\ ]/_/' {} \;

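This changes all of these symbols to underscores. Be careful: it also replaces every space. Note too that y/// takes a plain character list rather than a regex character class, so the literal [ and ] are replaced as well.

Why does it work? Try this test: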

mkdir test

cd test

mkdir -p a$/b$/c$/d$ f%/g%/h%/i% j%/k%/l%/m%

find . -depth -execdir rename 'y/[\:\;\>\<\@\$\#\&\(\)\?\\\%\ ]/_/' {} \;

ls -R

(As you can see, every directory was renamed.)

1

Use convmv to convert the file names if they are really incorrectly encoded. You should prefer mounting the filesystem with the correct encoding in the first place.
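
To see what such a conversion does, here is a small sketch that uses iconv on a string rather than on real files. It assumes, purely for illustration, that the odd character in the question is the single byte 0x92 and that the names were written in CP1252 (a common Western Windows code page). Neither assumption is confirmed by the question.

```shell
#!/bin/sh
# Assumption: the stray byte is 0x92 (octal 222), which CP1252 defines as a
# right single quotation mark, i.e. a "curly" apostrophe.
name=$(printf 'foo\222s bar.mp3')

# Re-encode the bytes from CP1252 to UTF-8. convmv performs the same kind of
# conversion, but on the names of files on disk.
fixed=$(printf '%s' "$name" | iconv -f CP1252 -t UTF-8)

printf '%s\n' "$fixed"
```

If the result looks right for a few sample names, the source encoding guess is probably correct. If it produces nonsense, try another code page before converting anything on disk.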

joschi
0

Try mounting the filesystem with the iocharset option set to the encoding it uses.

From man mount under the "Mount options for fat" section:

   iocharset=value
          Character set to use for converting between 8 bit characters and
          16 bit Unicode characters. The default is iso8859-1.  Long file‐
          names are stored on disk in Unicode format.

See also under the "Mount options for vfat" section:

   uni_xlate
          Translate  unhandled  Unicode  characters  to  special   escaped
          sequences.   This lets you backup and restore filenames that are
          created with any Unicode characters. Without this option, a  '?'
          is used when no translation is possible. The escape character is
          ':' because it is otherwise illegal on the vfat filesystem.  The
          escape  sequence  that gets used, where u is the unicode charac‐
          ter, is: ':', (u & 0x3f), ((u>>6) & 0x3f), (u>>12).

and

   utf8   UTF8  is  the  filesystem safe 8-bit encoding of Unicode that is
          used by the console. It can be be  enabled  for  the  filesystem
          with this option or disabled with utf8=0, utf8=no or utf8=false.
          If `uni_xlate' gets set, UTF8 gets disabled.

Edit:

I'm sorry, that was for Linux. This is for BSD (from man mount_msdosfs):

 -L locale
     Specify locale name used for file name conversions for DOS and
     Win'95 names.  By default ISO 8859-1 assumed as local character
     set.

 -D DOS_codepage
     Specify the MS-DOS code page (aka IBM/OEM code page) name used
     for file name conversions for DOS names.
Dennis Williamson
  • Thanks for the reply, here's what I tried:
    oracle:/mnt# mount -t msdos -o iocharset=utf8 /dev/ad6s1 /mnt/Elements
    
    But this failed with:
    mount: Using "-t msdosfs", since "-t msdos" is deprecated.
    mount_msdosfs: /dev/ad6s1: mount option  is unknown: Invalid argument
    
    – Dan Jan 06 '10 at 16:01
  • doesn't work in comments, use backticks instead. – Dennis Williamson Jan 06 '10 at 17:45
  • Okay, I tried with `mount_msdosfs` but I'm unclear what to specify for the L or D switches, as the docs don't go into any specific detail. I figured the size of the drive meant I needed the `large` option, so here's what I tried: `mount_msdosfs -o large /dev/ad6s1 /mnt/Elements`. Unfortunately this didn't work either; I'm still getting pax (and others) choking on certain filenames containing 'special' characters. – Dan Jan 07 '10 at 21:35
  • What is the origin of the disk? Was it from a Windows system? What language version? Likely values for -L or -D would include CP437 or IBM437, CP1252, ASCII, ISO-646, other ISO-8859-* or variations of those names or others. On my Ubuntu system there's a directory at `/usr/share/i18n/charmaps/` with files of character maps (the filenames and the header text in them) can be informative. See also: http://en.wikipedia.org/wiki/Character_encoding – Dennis Williamson Jan 08 '10 at 01:03
  • It was a Western Digital "Elements" External USB HDD, of which the external case failed, which I removed it from and popped it into a spare SATA port in my NAS box. OS X showed it as a FAT drive so made the assumption of it being FAT32. I've used it with UK versions of Windows XP and OS X. – Dan Jan 08 '10 at 09:06
  • If you look at WD's tech support pages, they have warnings about sharing their drives between OS X and Windows. My opinion is that it's just CYA, but you might look into it. However, are the problematic files exclusively of OS X origin? Does the problem have anything to do with resource forks? – Dennis Williamson Jan 08 '10 at 12:27
  • CYA? The files are a mix but mainly of Windows origin – Dan Jan 09 '10 at 21:06
  • CYA=Cover Their Posterior (approximately) – Dennis Williamson Jan 09 '10 at 21:57
  • Haha, I'll note that for future use. So, back home, I tried your advice: `oracle:/mnt/Elements# mount_msdosfs -L ISO8859-1 -D CP1252 /dev/ad6s1 /mnt/Elements/` which gave `mount_msdosfs: ISO8859-1: No such file or directory`. So I tried without the 'L' switch: `oracle:/mnt/Elements# mount_msdosfs -D CP1252 /dev/ad6s1 /mnt/Elements/` which gave `mount_msdosfs: cannot find or load "msdosfs_iconv" kernel module` and `mount_msdosfs: msdosfs_iconv: No such file or directory`. Bugger! I seem to have stalled on this one again. It does appear that there are some issues with this switch though: `http://bit.ly/6KsTGW` – Dan Jan 10 '10 at 20:27
0

Replace the non-ASCII characters with underscores (-n is a dry run; drop it to actually rename):

find . | perl -ane '{ if(m/[[:^ascii:]]/) { print } }' | rename -n 's/[^[:ascii:]]/_/g'
Fedir RYKHTIK