Character-encoding problem with string literal in PowerShell

Posted 2019-05-29 14:00

Question:

$logstring = Invoke-Command -ComputerName $filesServer -ScriptBlock {
    param(
        $logstring,
        $grp
    )

    $Klassenbuchordner = "KB " + $grp.Gruppe
    $Gruppenordner = $grp.Gruppe
    $share = $grp.Gruppe
    $path = "D:\Gruppen\$Gruppenordner"

    if ((Test-Path D:\Dozenten\01_Klassenbücher\$Klassenbuchordner) -eq $true) {
        $logstring += "Verzeichnis für Klassenbücher existiert bereits"
    }
    else {
        mkdir D:\Dozenten\01_Klassenbücher\$Klassenbuchordner
        $logstring += "Klassenbuchordner wurde erstellt!"
    }
} -ArgumentList $logstring, $grp

My goal is to test the existence of a directory and create it on demand.

The problem is that the path contains German letters (umlauts), which aren't seen correctly by the target server.

For instance, the server receives the path "D:\Dozent\01_Klassenbücher" instead of the expected "D:\Dozent\01_Klassenbücher".

How can I force proper UTF-8 encoding?

Answer 1:

Note: Remoting and use of Invoke-Command are incidental to your problem.

Since the problem occurs with a string literal in your source code (...\01_Klassenbücher\...), the likeliest explanation is that your script file is misinterpreted by PowerShell.

In Windows PowerShell (as opposed to PowerShell Core), if your script file is de facto UTF-8-encoded but lacks a BOM, PowerShell will misinterpret any non-ASCII-range characters (such as ü) in the script.[1]

Therefore: Re-save your script as UTF-8 with BOM.


Why you should save your scripts as UTF-8 with BOM:

Visual Studio Code and other modern editors create UTF-8 files without BOM by default, which is what causes the problem in Windows PowerShell.

By contrast, the PowerShell ISE creates "ANSI"-encoded[1] files, which Windows PowerShell - but not PowerShell Core - reads correctly.

You can only get away with "ANSI"-encoded files:

  • if your scripts will never be run in PowerShell Core - where all future development effort will go.

  • if your scripts will never run on a machine where a different "ANSI" code page is in effect.

  • if your script doesn't contain characters - e.g., emoji - that cannot be represented with your "ANSI" code page.

Given these limitations, it's safest - and future-proof - to always create PowerShell scripts as UTF-8 with BOM.
(Alternatively, you can use UTF-16 (which is always saved with a BOM), but that bloats the file size if you're primarily using ASCII/"ANSI"-range characters, which is likely in PS scripts).
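To make the trade-off concrete, here is an illustrative Python sketch showing the UTF-8 BOM's three-byte signature and the size penalty of UTF-16 for ASCII-range text (the sample string is just a typical ASCII-only piece of script content):

```python
import codecs

# The UTF-8 BOM is the three-byte signature EF BB BF at the start of a file.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# UTF-16 always carries a BOM, but ASCII-range text doubles in size:
text = "Get-ChildItem"            # 13 ASCII characters
utf8 = codecs.BOM_UTF8 + text.encode("utf-8")
utf16 = text.encode("utf-16")     # Python prepends the UTF-16 BOM automatically
print(len(utf8), len(utf16))  # → 16 28
```

The fixed 3-byte BOM overhead of UTF-8 is negligible, whereas UTF-16 roughly doubles the size of a mostly-ASCII script.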


How to make Visual Studio Code create UTF-8 files with BOM for PowerShell scripts by default:

Note: The following is still required as of v1.11.0 of the PowerShell extension for VSCode, but note that there's a suggestion on GitHub to make the extension default PowerShell files to UTF-8 with BOM.

Add the following to your settings.json file (open it from the Command Palette: press Ctrl+Shift+P, type settings, and select Preferences: Open Settings (JSON)):

"[powershell]": {
  "files.encoding": "utf8bom"
}

Note that the setting is intentionally scoped to PowerShell files only, because you wouldn't want all files to default to UTF-8 with BOM, given that many utilities on Unix platforms neither expect nor know how to handle such a BOM.
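If you need to re-save existing scripts in bulk rather than through an editor, the conversion can also be scripted. An illustrative Python sketch using the standard utf-8-sig codec, which writes the BOM on output and strips it on input (the file name is hypothetical):

```python
from pathlib import Path

# Hypothetical file name, for illustration only.
path = Path("example.ps1")

# Writing with the utf-8-sig codec prepends the UTF-8 BOM automatically.
path.write_text('mkdir "D:\\Dozenten\\01_Klassenbücher"', encoding="utf-8-sig")

# The file now starts with EF BB BF, so Windows PowerShell decodes it as UTF-8.
assert path.read_bytes().startswith(b"\xef\xbb\xbf")

# Reading back with utf-8-sig strips the BOM again.
print(path.read_text(encoding="utf-8-sig"))
```

To convert a file that is already BOM-less UTF-8, read it with encoding="utf-8" and write it back with encoding="utf-8-sig".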


[1] In the absence of a BOM, Windows PowerShell defaults to the encoding of the system's current "ANSI" code page, as determined by the legacy system locale; e.g., in Western European cultures, Windows-1252.