Convert a text file from ANSI to UTF-8 in Windows batch scripting

0

We have a text file in the default ANSI encoding that needs to be converted to UTF-8. Is there any way to convert the file using the standard Windows (DOS) commands? We can use PowerShell, but the command line has to be run from a separate batch process.

Raj

Posted 2017-04-20T07:07:53.257

Reputation: 1

Answers

2

The PowerShell syntax is rather straightforward. In Windows PowerShell, this command opens a file in the default (ANSI) encoding and saves it as UTF-8 with BOM (SrcFile.txt and DestFile.txt are placeholders for your own file names):

Get-Content SrcFile.txt -Encoding Default | Out-File DestFile.txt -Encoding utf8

The Encoding parameter accepts the following: Ascii, BigEndianUnicode, BigEndianUTF32, Byte, Default, Oem, String, Unicode, Unknown, UTF32, UTF7, UTF8
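
Since the question asks for something that can be started from a batch process, here is a minimal sketch of the same pipeline wrapped in a script file. The file name convert.ps1 and the parameter names Source and Destination are made up for illustration, and it assumes Windows PowerShell, where Default means the system's ANSI code page:

```powershell
# convert.ps1 (hypothetical file name)
param(
    [Parameter(Mandatory = $true)][string]$Source,       # ANSI input file (assumed)
    [Parameter(Mandatory = $true)][string]$Destination   # UTF-8 output file
)

# Decode the input as ANSI and re-encode it as UTF-8 with BOM.
# Get-Content streams line by line here, so line endings are rewritten as CRLF.
Get-Content -Path $Source -Encoding Default |
    Out-File -FilePath $Destination -Encoding utf8
```

From a batch file it could then be called as powershell -NoProfile -ExecutionPolicy Bypass -File convert.ps1 -Source in.txt -Destination out.txt. Source and Destination should be different files, because the output file is opened for writing while the input is still being read.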

user477799

Posted 2017-04-20T07:07:53.257

Reputation:

1

Get-Content may not be optimal, because by default (unless you use the Raw switch, as described later) it processes the input file line by line and may change the line endings (for example, if you move text files between Unix and Windows systems). I had serious problems in a script just because of that, and it took about an hour to find the exact reason. See more about that in this post. Because of this line-by-line behavior, Get-Content is not the best choice either if performance matters.

Instead, you can use PowerShell in combination with the .NET classes (as long as you have a version of the .NET Framework installed on your system):

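# Read with the system ANSI code page (Encoding.Default on the .NET Framework)
$sr = New-Object System.IO.StreamReader($infile, [System.Text.Encoding]::Default)
# Overwrite the target (append = $false) and write UTF-8 with BOM
$sw = New-Object System.IO.StreamWriter($outfile, $false, [System.Text.Encoding]::UTF8)

$sw.Write($sr.ReadToEnd())

# Close() also disposes the streams, so separate Dispose() calls are not needed
$sw.Close()
$sr.Close()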

Or even more simply, use the Raw switch as described here to avoid that overhead and read the text in a single block:

Get-Content $inFile -Raw
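
To turn that into a full conversion, the string read with Raw still has to be written back out as UTF-8. A minimal sketch, assuming Windows PowerShell 3.0 or later (where the Raw switch is available), an ANSI-encoded input, and hypothetical paths:

```powershell
# Hypothetical paths; use absolute paths, because .NET methods resolve relative
# paths against the process working directory, not the PowerShell location.
$inFile  = 'C:\temp\input.txt'
$outFile = 'C:\temp\output.txt'

# Read the whole file as a single string, decoding it as ANSI
# ("Default" in Windows PowerShell), so line endings stay untouched.
$text = Get-Content -Path $inFile -Raw -Encoding Default

# Write the string as UTF-8 with BOM; WriteAllText does not append a newline.
[System.IO.File]::WriteAllText($outFile, $text, [System.Text.Encoding]::UTF8)
```

Writing through [System.IO.File]::WriteAllText instead of Out-File is a design choice of this sketch: the text is written exactly as it was read, so line endings in the output match the input.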

pholpar

Posted 2017-04-20T07:07:53.257

Reputation: 119

You've initialized your StreamReader and StreamWriter with the wrong encoding. – None – 2018-03-07T15:46:23.693