PowerShell - Batch change files encoding To UTF-8

2020-02-12 03:57发布

问题:

I'm trying to do a dead simple thing: to change files encoding from anything to UTF-8 without BOM. I found several scripts that do this and the only one that really worked for me is this one: https://superuser.com/questions/397890/convert-text-files-recursively-to-utf-8-in-powershell#answer-397915.

It worked as expected, but I need the generated files without BOM. So I tried to modify the script a little, adding the solution given to this question: Using PowerShell to write a file in UTF-8 without the BOM

This is my final script:

foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    get-content $i | out-file -encoding $Utf8NoBomEncoding -filepath $dest
}

The problem is that powershell is returning me an error, regarding the System.Text.UTF8Encoding($False) line, complaining about an incorrect parameter:

It is not possible to validate the argument on the 'Encoding' parameter. The argument "System.Text.UTF8Encoding" dont belongs to the the group "unicode, utf7, utf8, utf32, ascii" specified by the ValidateSet attribute.

I wonder if I'm missing something, like powershell version or something like that. I never coded a Powershell script before, so I'm totally lost with this. And I need to change these files encoding, there are hundreds of them, I wouldn't like to do it myself one by one.

Actually I'm using the 2.0 version that comes with Windows 7.

Thanks in advance!

EDIT 1

I tried the following code, suggested by @LarsTruijens and other posts:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}

This gives me an Exception, complaining about one of the parameters for WriteAllLines: "Exception on calling 'WriteAllLines' with 3 arguments. The value can't be null". Parameter name: contents. The script creates all folders, though. But they are all empty.

EDIT 2

An interesting thing about this error is that the "content" parameter is not null. If I output the value of the $content variable (using Write-host) the lines are there. So why it becomes null when passed to WriteAllLines method?

EDIT 3

I've added a content check to the variable, so the script now looks like this:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
    }
    else {
        Write-Host "No content from: $i"
    }
}

Now every iteration returns "No content from: $i" message, but the file isn't empty. There is one more error: Get-content: can't find the path 'C:\root\FILENAME.php' because it doesn't exists. It seems that it is trying to find the files at the root directory and not in the subfolders. It appears to be able to get the filename from child folders, but tries to read it from root.

EDIT 4 - Final Working Version

After some struggling and following the advices I got here, specially from @LarsTruijens and @AnsgarWiechers, I finally made it. I had to change the way I was getting the directory from $PWD and set some fixed names for the folders. After that, it worked perfectly.

Here it goes, for anyone who might be interested:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"

foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}

回答1:

You didn't follow the whole answer in here. You forgot the WriteAllLines part.

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i 
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}


回答2:

Half of the answer is in the error message. It tells you the possible values the Encoding parameter accepts, one of them is utf8.

... out-file -encoding utf8


回答3:

  1. Goto the Dir you want cd c:\MyDirectoryWithCrazyCharacterEncodingAndUnicode
  2. Fire this script away!

Copy and past the script in your Powershell windows

 foreach($FileNameInUnicodeOrWhatever in get-childitem)
 {
    $FileName = $FileNameInUnicodeOrWhatever.Name

    $TempFile = "$($FileNameInUnicodeOrWhatever.Name).ASCII"

    get-content $FileNameInUnicodeOrWhatever | out-file $FileNameInUnicodeOrWhatever -Encoding ASCII 

    remove-item $FileNameInUnicodeOrWhatever

    rename-item $TempFile $FileNameInUnicodeOrWhatever

    write-output $FileNameInUnicodeOrWhatever "converted to ASCII ->" $TempFile
}


回答4:

I've made some fixes

  • Get-Childitem acts on $source
  • replace does not try to interpret $source as regex
  • some resolve-path
  • auto-help

and packaged everything into a cmdlet:

<#
    .SYNOPSIS
        Encode-Utf8

    .DESCRIPTION
        Re-Write all files in a folder in UTF-8

    .PARAMETER Source
        directory path to recursively scan for files

    .PARAMETER Destination
        directory path to write files to 
#>
[CmdletBinding(DefaultParameterSetName="Help")]
Param(
   [Parameter(Mandatory=$true, Position=0, ParameterSetName="Default")]
   [string]
   $Source,

   [Parameter(Mandatory=$true, Position=1, ParameterSetName="Default")]
   [string]
   $Destination,

  [Parameter(Mandatory=$false, Position=0, ParameterSetName="Help")]
   [switch]
   $Help   
)

if($PSCmdlet.ParameterSetName -eq 'Help'){
    Get-Help $MyInvocation.MyCommand.Definition -Detailed
    Exit
}

if($PSBoundParameters['Debug']){
    $DebugPreference = 'Continue'
}

$Source = Resolve-Path $Source

if (-not (Test-Path $Destination)) {
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null
}
$Destination = Resolve-Path $Destination

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

foreach ($i in Get-ChildItem $Source -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName.Replace($Source, $Destination)
    $name = $i.Fullname.Replace($Source, $Destination)

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {
        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}


回答5:

This approach creates the whole folder structure before copying the files into UTF-8 from the current directory . At the end we exchange parent directory names .

$destination = "..\DestinationFolder"
Remove-item $destination -Recurse -Force
robocopy $PWD $destination /e /xf *.*

foreach($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }
    $originalContent = $i.Fullname
    $dest = $i.Fullname.Replace($PWD, $destination)
    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }
    get-content $originalContent | out-file -encoding utf8 -filepath $dest
}


回答6:

I adapted few snipplets when I needed to UTF8 encode a massive amount of log-files.

Note! Should not be used with -recurse

write-host " "
$sourcePath = (get-location).path   # Use current folder as source.
# $sourcePath = "C:\Source-files"   # Use custom folder as source.
$destinationPath = (get-location).path + '\Out'   # Use "current folder\Out" as target.
# $destinationPath = "C:\UTF8-Encoded"   # Set custom target path

$cnt = 0

write-host "UTF8 convertsation from " $sourcePath " to " $destinationPath

if (!(Test-Path $destinationPath))

{
  write-host "(Note: target folder created!) "
  new-item -type directory -path $destinationPath -Force | Out-Null
}

Get-ChildItem -Path $sourcePath -Filter *.txt | ForEach-Object {
  $content = Get-Content $_.FullName
  Set-content (Join-Path -Path $destinationPath -ChildPath $_) -Encoding UTF8 -Value $content
  $cnt++
 }
write-host " "
write-host "Totally " $cnt " files converted!"
write-host " "
pause


回答7:

With:

 foreach ($i in Get-ChildItem -Path $source -Recurse -Force) {

Only Files in the Subfolder $source will be used.