I found a similar topic here on Stack Overflow.
With the following code most of the characters will be translated to their "closest character". Although i couldn't get the ’
translated. (Maybe it does, i can't make a filename in the prompt with it ;) The ß
also does not get translated.
function Remove-Diacritics {
param ([String]$src = [String]::Empty)
$normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
$sb = new-object Text.StringBuilder
$normalized.ToCharArray() | % {
if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
[void]$sb.Append($_)
}
}
$sb.ToString()
}
$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007F]"}
$files | ForEach-Object {
$newname = Remove-Diacritics $_.Name
if ($_.Name -ne $newname) {
$num=1
$nextname = $_.Fullname.replace($_.Name,$newname)
while(Test-Path -Path $nextname)
{
$next = ([io.fileinfo]$newname).basename + " ($num)" + ([io.fileinfo]$newname).Extension
$nextname = $_.Fullname.replace($_.Name,$next)
$num+=1
}
echo $nextname
ren $_.Fullname $nextname
}
}
Edit:
I added some code to check if a filename already exists and add (1)
, (2)
etc... if it does. (It's not smart enough to detect an already existing (1)
in the filename to be renamed so in that case you would get (1) (1)
. But as always... everything is programmable ;)
Edit 2:
Here is the last one for tonight...
This one has a different function for replacing the characters. Also added a line to change unknown characters like ß
and ┤
for example to _
.
function Convert-ToLatinCharacters {
param([string]$inputString)
[Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($inputString))
}
$files = gci -recurse | where {$_.Name -match "[^\u0020-\u007F]"}
$files | ForEach-Object {
$newname = Convert-ToLatinCharacters $_.Name
$newname = $newname.replace('?','_')
if ($_.Name -ne $newname) {
$num=1
$nextname = $_.Fullname.replace($_.Name,$newname)
while(Test-Path -Path $nextname)
{
$next = ([io.fileinfo]$newname).basename + " ($num)" + ([io.fileinfo]$newname).Extension
$nextname = $_.Fullname.replace($_.Name,$next)
$num+=1
}
echo $nextname
ren $_.Fullname $nextname
}
}
Doesn't require 3rd party tools... by hand sounds about right. Define 3rd party. And Python has a cute function called casefold() that with a little work could give you what you wanted.
– Doktoro Reichard – 2013-08-24T19:11:24.300I have the same PS script and love it; it is indispensable since I frequently still use FAT32 and Windows XP (and even DOS). However mine lets you choose where to look:
gci -recurse $args[0] | where {$_.Name -match "[^\u0000-\u007F]"}
– Synetech – 2013-08-24T19:34:46.2201There’s no such thing as “extended ASCII”. – kinokijuf – 2013-08-24T20:11:58.380
1
@kinokijuf, tell that to everybody else.
– Synetech – 2013-11-29T20:04:48.543@Synetech ASCII is characters 0x00 to 0x7F. – kinokijuf – 2013-11-29T20:20:33.250
@kinokijuf, nobody questioned that; the ASCII set is indeed 0-127. And the “extended ASCII set” is 128-255. You may not like it, but it is the de facto term and has been for decades. Google alone returns 170,000 hits for it. – Synetech – 2013-11-29T21:57:25.400
@Synetech There are no “characters 128–255”; Windows NT has used UTF-16 since the very beginning. – kinokijuf – 2013-11-29T22:45:29.283
1@kinokijuf, and of course, nothing existed before Windows NT. – Synetech – 2013-11-29T22:50:47.830
1@kinokijuf, Synetech is correct. The extended ASCII code set had been in existence for over a decade when Windows NT shipped. Every DOS program known to man used the extended ASCII set. – Roger – 2013-11-30T02:00:34.287
@Roger There is no single “extended ASCII” set. There are several language-specific OEM code pages.
– kinokijuf – 2013-11-30T21:27:33.8202
@Kinokijuf, there are code pages now. That's not in dispute. DOS added code page support only in DOS 3.3. However, the extended ASCII character set was built into the ROM of the original IBM PC display adapters. See this site for more info.
– Roger – 2013-12-19T19:23:14.670@Roger There were code pages back in 16-bit times. In NT they are only for legacy support. NTFS has stored names in UTF-16 since the beggining. Talking about “extended ASCII character
0xAF
” is meaningless without specifying which code page. – kinokijuf – 2013-12-20T05:30:07.793@Synetech @Roger tell me what is the “extended ASCII” code of
ᡅ
. – kinokijuf – 2013-12-20T05:38:24.953Just ignore kinokijuf, he’s trying to be pedantic and purposely being obtuse. Characters 0-127 are the standard ASCII set and characters 128-255 in any code-page are and have always been called the extended set whether he likes it or not. – Synetech – 2013-12-20T06:17:47.507
@Synetech The character i posted (a random Mongolian letter) is in no code page, so clearly it is not “extended ASCII” according to your definition, yet i’m sure the OP wants it replaced. – kinokijuf – 2013-12-20T18:15:45.333
And? Instead of obstinately arguing over de facto terminology that has been used by countless people for decades and decades, you could have simply edited the question to use
non-ascii
from the start. ◔_◔ – Synetech – 2013-12-20T18:23:32.010