Convert text files recursively to UTF-8 in PowerShell

I have a folder of text files that also contains subfolders, which in turn contain more text files. I need to convert all of these files to UTF-8 recursively in PowerShell while preserving the folder structure. I have tried this:

foreach( $i in get-childitem -recurse -name ) {
    get-content $i | out-file -encoding utf8 -filepath some_folder/$i
}

But it doesn't work: it fails to reproduce the folder hierarchy. How can I solve this?

Roman

Posted 2012-03-07T12:33:14.793

Reputation: 183

Which version of PowerShell is this? In the one that came with (my copy of) Win7 (v2?), there's Get-ChildItem but not Get-Children... – Bob – 2012-03-07T13:12:55.777

It was a misspelling, I've corrected it. – Roman – 2012-03-07T13:35:36.697

Answers

13

Try this one.

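foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {    # skip directories; only files are converted
        continue
    }

    $dest = $i.FullName.Replace($PWD.Path, "some_folder")    # swap the current-directory prefix for the target folder
    $destDir = Split-Path $dest -Parent
    if (!(Test-Path $destDir)) {
        New-Item $destDir -ItemType Directory | Out-Null    # create missing folders quietly
    }

    # Pass the full path: a bare $i can resolve to just the file name in Windows PowerShell
    Get-Content $i.FullName | Out-File -Encoding utf8 -FilePath $dest
}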

It takes the full path of each file and replaces the current-directory prefix with the destination folder you want. For example, if you run this command in the directory C:\1 ($PWD = C:\1) and it finds the file C:\1\2\file.txt, it'll give you a $dest of some_folder\2\file.txt.
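As a rough illustration of that mapping, the snippet below computes the destination for a single path without touching the disk. The function name Get-DestinationPath and the sample paths are made up for this example. It assumes the file really lies under the given root, and it strips the prefix with Substring instead of Replace so that only the leading occurrence is removed.

```powershell
# Hypothetical helper (not part of the script above): maps a source file path
# under $Root onto the same relative location under $DestRoot.
function Get-DestinationPath {
    param(
        [string]$FullName,   # full path of the source file
        [string]$Root,       # directory the search started in (what $PWD holds)
        [string]$DestRoot    # folder that receives the converted files
    )
    # Drop the root prefix, then any leading path separator left behind.
    $relative = $FullName.Substring($Root.Length).TrimStart([char[]]'\/')
    return (Join-Path $DestRoot $relative)
}

Get-DestinationPath -FullName 'C:\1\2\file.txt' -Root 'C:\1' -DestRoot 'some_folder'
```

On Windows this should give the same some_folder\2\file.txt as in the example above.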

The first if block is there so you don't try to convert a directory.

The directories have to be created if they don't already exist - I originally forgot that.


If you want UTF8 without BOM, replace the get-content $i | out-file -encoding utf8 -filepath $dest line with the following (source):

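# Read via the full path; @() keeps an empty file from producing $null
$filecontents = @(Get-Content $i.FullName)
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
# Write to $dest, not the source; .NET resolves relative paths against the process directory, so an absolute $dest is safest
[System.IO.File]::WriteAllLines($dest, $filecontents, $Utf8NoBomEncoding)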

Note that this may not be very performant for larger files, since it reads each file entirely into memory before writing it out again. If efficiency is needed, it is possible to read line by line, or even a specific number of bytes at a time. However, I'd rather just write a quick program in C# at that point (since you'd be using .NET functions in PS anyway).
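The line-by-line approach mentioned above might look roughly like this. It is only a sketch: the function name Convert-ToUtf8NoBom and its parameters are invented for illustration. It assumes the input either carries a BOM or is in the encoding passed as -SourceEncoding (the system default if omitted), that both paths are absolute, and that source and destination are different files. Line endings come out as the platform default.

```powershell
# Hypothetical helper: streams one file to UTF-8 without a BOM, line by line,
# so the whole file is never held in memory.
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory=$true)][string]$Source,        # absolute path of the input file
        [Parameter(Mandatory=$true)][string]$Destination,   # absolute path of the output file
        # Assumed fallback encoding when the input has no BOM.
        [System.Text.Encoding]$SourceEncoding = [System.Text.Encoding]::Default
    )
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    # $true = honour a BOM if the file has one, otherwise use $SourceEncoding.
    $reader = New-Object System.IO.StreamReader($Source, $SourceEncoding, $true)
    try {
        # $false = overwrite the destination instead of appending to it.
        $writer = New-Object System.IO.StreamWriter($Destination, $false, $utf8NoBom)
        try {
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine($line)
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }
}
```

Inside the loop it would stand in for the Get-Content / WriteAllLines lines, e.g. Convert-ToUtf8NoBom -Source $i.FullName -Destination $dest, with $dest given as an absolute path.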

Bob

Posted 2012-03-07T12:33:14.793

Reputation: 51 526

It does not work for me. It says there is an error in line 6 ($dest etc...). It tries to call a method with a NULL argument and it's wrong. – Roman – 2012-03-07T13:58:16.247

I haven't used PowerShell enough, apparently.. the quotes were not necessary. And the directories must be created if they don't already exist. I've edited it to fix that, and actually tested it this time (more than just echoing the paths). Not sure how you got that error, though, what did you replace some_folder with? – Bob – 2012-03-07T14:30:27.370

@Roman, You need to define $PWD, else $PWD is NULL. If your files are stored in "X:\txt_Files" and you want to store the converted files in "X:\UTF_Files", then add: $PWD = "X:\txt_Files" and replace "some_folder" with "X:\UTF_Files" – Martin – 2013-07-11T17:12:32.540

@Martin $PWD is supposed to automatically refer to the current working directory. "some_folder" does have to be replaced with the appropriate folder. – Bob – 2013-07-20T04:49:15.867

@Bob It worked as expected, except that it throws errors about some files that don't exist, because it tries to find the file in an invalid location. What if I want to convert files to UTF-8 without BOM? What do I have to add to this script? – darksoulsong – 2013-09-08T13:26:38.423

@darksoulsong You'll have to clarify what you mean by invalid location, though it should be possible to add a check to prevent visible errors. As for no BOM, replace the out-file stuff (basically, what's after get-content $i |) with this – Bob – 2013-09-08T15:12:30.260

@Bob Thx Bob. Sry, but I have no xp with PSHELL scripting, so I think I'm missing something. I tried those solutions and they didn't work. I put in [System.IO.File]::WriteAllLines($MyPath, $dest, $Utf8NoBomEncoding) and an error complaining about the pipe showed up. If I remove the pipe, the script runs but another error occurs: it says "$MyPath" is null. And if I use "$i" instead of "$MyPath" it only creates the folders inside "some_folder". If you have the time, could you update your answer with a "WITHOUT BOM" solution? What do you think? =) – darksoulsong – 2013-09-08T17:01:40.667

@darksoulsong whoops, my mistake - that function doesn't take a pipe. updated (but untested - I don't have access to a Windows machine right now) – Bob – 2013-09-08T17:15:54.340

@Bob Thanks! But this time it shows: "Exception on calling 'WriteAllLines' with 3 parameters: 'Value can't be null'". About the invalid location I mentioned, PowerShell complains about Get-Content. It says that the current file doesn't exist. In fact the file path is incorrect: it is trying to find the file in the root directory, that is, where I saved the .ps1 file. Maybe I should include the full path but I'm not sure how to do that. – darksoulsong – 2013-09-08T17:34:45.853

1

  • Allows for files and folders
  • File extension agnostic
  • Overwrites original file if destination equals the path
  • Encoding as a parameter

Usage: & ".\TextEncoding.ps1" -path "c:\windows\temps\folder1" -encoding "UTF8"

Here is the script I created:

[CmdletBinding()]
param(  
    [Parameter(Mandatory=$true)]
    [string]$path,
    [Parameter(Mandatory=$false)]
    [string]$dest = $path,
    [Parameter(Mandatory=$true)]
    [string]$encoding
)

function Set-Encoding(){

    #ensure it is a valid path
    if(-not(Test-Path -Path $path)){

        throw "File or directory not found at {0}" -f $path
    }

    #if the path is a file, else a directory
    if(Test-Path $path -PathType Leaf){

        #if the provided path equals the destination
        if($path -eq $dest){

            #get file extension
            $ext = [System.IO.Path]::GetExtension($path)

            #create destination
            $dest = [System.IO.Path]::Combine([System.IO.Path]::GetDirectoryName($path), ("temp_encoded{0}" -f $ext))

            #output to file with encoding
            Get-Content $path | Out-File -FilePath $dest -Encoding $encoding -Force

            #copy item to original path to overwrite (note move-item loses encoding)
            Copy-Item -Path $dest -Destination $path -Force -PassThru | ForEach-Object { Write-Output -inputobject ("{0} encoded {1}" -f $encoding, $_) }

            #remove the extra file
            Remove-Item $dest   

        }else{

            #output to file with encoding
            Get-Content $path | Out-File -FilePath $dest -Encoding $encoding -Force     

        }

    }else{

        #get all the files recursively
        foreach($i in Get-ChildItem -Path $path -Recurse) {


            if ($i.PSIsContainer) {
                continue
            }

            #get file extension
            $ext = $i.Extension

            #create destination
            $dest = "$path\temp_encoded{0}" -f $ext

            #output to file with encoding
            Get-Content $i.FullName | Out-File -FilePath $dest -Encoding $encoding -Force

            #copy item to original path to overwrite (note move-item loses encoding)
            Copy-Item -Path $dest -Destination $i.FullName -Force -PassThru | ForEach-Object { Write-Output -inputobject ("{0} encoded {1}" -f $encoding, $_) }

            #remove the extra file
            Remove-Item $dest

        }

    }

}

Set-Encoding

JDennis

Posted 2012-03-07T12:33:14.793

Reputation: 111

./TextEncoding.ps1 : File .\enc.ps1 cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at http://go.microsoft.com/fwlink/?LinkID=135170. At line:1 char:1 ./TextEncoding..ps1 -path "...\app\models" ...

   + CategoryInfo          : SecurityError: (:) [], PSSecurityException
   + FullyQualifiedErrorId : UnauthorizedAccess
 – Junaid Atari  – 2019-11-15T17:41:09.053
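
The error quoted in this last comment comes from the script execution policy rather than from the script itself. One possible workaround, assuming you are allowed to relax the policy for your own session (a policy enforced by Group Policy may still take precedence), is sketched below. The folder passed to -path is a placeholder.

```powershell
# Relax the policy for the current PowerShell process only;
# the setting is gone again once this session is closed.
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass -Force

# Then run the script as before (placeholder folder).
& ".\TextEncoding.ps1" -path "C:\some\folder" -encoding "UTF8"
```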