UTF-8 output from PowerShell

2019-01-06 11:10发布

问题:

I'm trying to use Process.Start with redirected I/O to call PowerShell.exe with a string, and to get the output back, all in UTF-8. But I don't seem to be able to make this work.

What I've tried:

  • Passing the command to run via the -Command parameter
  • Writing the PowerShell script as a file to disk with UTF-8 encoding
  • Writing the PowerShell script as a file to disk with UTF-8 with BOM encoding
  • Writing the PowerShell script as a file to disk with UTF-16
  • Setting Console.OutputEncoding in both my console application and in the PowerShell script
  • Setting $OutputEncoding in PowerShell
  • Setting Process.StartInfo.StandardOutputEncoding
  • Doing it all with Encoding.Unicode instead of Encoding.UTF8

In every case, when I inspect the bytes I'm given, I get different values to my original string. I'd really love an explanation as to why this doesn't work.

Here is my code:

static void Main(string[] args)
{
    DumpBytes("Héllo");

    ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"",
        Environment.CurrentDirectory, DumpBytes, DumpBytes);

    Console.ReadLine();
}

static void DumpBytes(string text)
{
    Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
    Console.WriteLine();
}

static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
{
    try
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = executable;
            process.StartInfo.Arguments = arguments;
            process.StartInfo.WorkingDirectory = workingDirectory;
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.CreateNoWindow = true;
            process.StartInfo.RedirectStandardOutput = true;
            process.StartInfo.RedirectStandardError = true;
            process.StartInfo.StandardOutputEncoding = Encoding.UTF8;
            process.StartInfo.StandardErrorEncoding = Encoding.UTF8;

            using (var outputWaitHandle = new AutoResetEvent(false))
            using (var errorWaitHandle = new AutoResetEvent(false))
            {
                process.OutputDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        outputWaitHandle.Set();
                    }
                    else
                    {
                        output(e.Data);
                    }
                };

                process.ErrorDataReceived += (sender, e) =>
                {
                    if (e.Data == null)
                    {
                        errorWaitHandle.Set();
                    }
                    else
                    {
                        error(e.Data);
                    }
                };

                process.Start();

                process.BeginOutputReadLine();
                process.BeginErrorReadLine();

                process.WaitForExit();
                outputWaitHandle.WaitOne();
                errorWaitHandle.WaitOne();

                return process.ExitCode;
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("Error when attempting to execute {0}: {1}", executable, ex.Message),
            ex);
    }
}

Update 1

I found that if I make this script:

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Write-Host "Héllo!"
[Console]::WriteLine("Héllo")

Then invoke it via:

ExecuteCommand("PowerShell.exe", "-File C:\\Users\\Paul\\Desktop\\Foo.ps1",
  Environment.CurrentDirectory, DumpBytes, DumpBytes);

The first line is corrupted, but the second isn't:

H?llo! 48,EF,BF,BD,6C,6C,6F,21
Héllo 48,C3,A9,6C,6C,6F

This suggests to me that my redirection code is all working fine; when I use Console.WriteLine in PowerShell I get UTF-8 as I expect.

This means that PowerShell's Write-Output and Write-Host commands must be doing something different with the output, and not simply calling Console.WriteLine.

Update 2

I've even tried the following to force the PowerShell console code page to UTF-8, but Write-Host and Write-Output continue to produce broken results while [Console]::WriteLine works.

$sig = @'
[DllImport("kernel32.dll")]
public static extern bool SetConsoleCP(uint wCodePageID);

[DllImport("kernel32.dll")]
public static extern bool SetConsoleOutputCP(uint wCodePageID);
'@

$type = Add-Type -MemberDefinition $sig -Name Win32Utils -Namespace Foo -PassThru

$type::SetConsoleCP(65001)
$type::SetConsoleOutputCP(65001)

Write-Host "Héllo!"

& chcp    # Tells us 65001 (UTF-8) is being used

回答1:

This is a bug in .NET. When PowerShell launches, it caches the output handle (Console.Out). The Encoding property of that text writer does not pick up the value StandardOutputEncoding property.

When you change it from within PowerShell, the Encoding property of the cached output writer returns the cached value, so the output is still encoded with the default encoding.

As a workaround, I would suggest not changing the encoding. It will be returned to you as a Unicode string, at which point you can manage the encoding yourself.

Caching example:

102 [C:\Users\leeholm]
>> $r1 = [Console]::Out

103 [C:\Users\leeholm]
>> $r1

Encoding                                          FormatProvider
--------                                          --------------
System.Text.SBCSCodePageEncoding                  en-US



104 [C:\Users\leeholm]
>> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8

105 [C:\Users\leeholm]
>> $r1

Encoding                                          FormatProvider
--------                                          --------------
System.Text.SBCSCodePageEncoding                  en-US


回答2:

Not an expert on encoding, but after reading these...

  • http://blogs.msdn.com/b/powershell/archive/2006/12/11/outputencoding-to-the-rescue.aspx
  • http://technet.microsoft.com/en-us/library/hh847796.aspx
  • http://www.johndcook.com/blog/2008/08/25/powershell-output-redirection-unicode-or-ascii/

... it seems fairly clear that the $OutputEncoding variable only affects data piped to native applications.

If sending to a file from withing PowerShell, the encoding can be controlled by the -encoding parameter on the out-file cmdlet e.g.

write-output "hello" | out-file "enctest.txt" -encoding utf8

Nothing else you can do on the PowerShell front then, but the following post may well help you:.

  • http://blogs.msdn.com/b/ddietric/archive/2010/11/08/decoding-standard-output-and-standard-error-when-redirecting-to-a-gui-application.aspx


回答3:

Set the [Console]::OuputEncoding as encoding whatever you want, and print out with [Console]::WriteLine.

If powershell ouput method has a problem, then don't use it. It feels bit bad, but works like a charm :)



回答4:

Spent some time working on a solution to my issue and thought it may be of interest. I ran into a problem trying to automate code generation using PowerShell 3.0 on Windows 8. The target IDE was the Keil compiler using MDK-ARM Essential Toolchain 5.24.1. A bit different from OP, as I am using PowerShell natively during the pre-build step. When I tried to #include the generated file, I received the error

fatal error: UTF-16 (LE) byte order mark detected '..\GITVersion.h' but encoding is not supported

I solved the problem by changing the line that generated the output file from:

out-file -FilePath GITVersion.h -InputObject $result

to:

out-file -FilePath GITVersion.h -Encoding ascii -InputObject $result