Encoding UTF8 C# Process

I have an application that processes a VBScript file and produces its output.

private static string processVB(string command, string arguments)
{
    Process Proc = new Process();
    Proc.StartInfo.UseShellExecute = false;
    Proc.StartInfo.RedirectStandardOutput = true;
    Proc.StartInfo.RedirectStandardError = true;
    Proc.StartInfo.RedirectStandardInput = true;
    Proc.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    Proc.StartInfo.StandardErrorEncoding = Encoding.UTF8;
    Proc.StartInfo.FileName = command;
    Proc.StartInfo.Arguments = arguments;
    Proc.StartInfo.WindowStyle = ProcessWindowStyle.Hidden; //prevent console window from popping up
    Proc.Start();
    string output = Proc.StandardOutput.ReadToEnd();
    string error = Proc.StandardError.ReadToEnd();

    if (String.IsNullOrEmpty(output) && !String.IsNullOrEmpty(error))
    {
        output = error;
    }
    //Console.Write(ping_output);

    Proc.WaitForExit();
    Proc.Close();

    return output;
}

I think I have set everything related to the Encoding properties correctly. The processVB method receives the VBScript file as command, along with its arguments.

The C# method processVB, which processes that VBScript file, now produces the following output:

"��?"

But I should get the original text:

"äåéö€"

I have set the encoding correctly, but I still can't get it right.

What am I doing wrong?

Tags: c# encoding utf-8 process

5 answers

Aperson

Reply #2 · 2019-04-08 09:07

The other process (VBScript) generates its output in some encoding. Setting StandardOutputEncoding tells the system how to read that stream; it does not change the encoding the other process writes.

So you need to figure out the exact encoding used by the other process (VBScript). To do that, I'd run the script directly from the shell, redirect the output to a file, and open it in a tool that shows the encoding (e.g. Notepad2). If I'm right, it will be something other than UTF-8.

Then set Proc.StartInfo.StandardOutputEncoding to that encoding in your code, and everything should work.
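To make the reader/writer distinction concrete, here is a small C# sketch (assumes .NET 5+). The byte values are hypothetical, chosen as what a Windows-1252 writer would emit for "äåéö€", and `script.vbs` is a placeholder; nothing is actually started. It decodes the same bytes two ways, then configures the reader for the assumed code page.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// .NET Core / .NET 5+ only ship Unicode encodings plus ASCII and Latin-1 by default;
// registering this provider makes Windows code pages such as 1252 resolvable.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical bytes a child process would write for "äåéö€"
// if it encoded its output as Windows-1252 (one byte per character).
byte[] childBytes = { 0xE4, 0xE5, 0xE9, 0xF6, 0x80 };

// StandardOutputEncoding only chooses how the parent decodes these bytes;
// the bytes themselves are whatever the child wrote.
string readAsUtf8 = Encoding.UTF8.GetString(childBytes);              // replacement chars
string readAs1252 = Encoding.GetEncoding(1252).GetString(childBytes); // original text

Console.WriteLine($"UTF-8:        {readAsUtf8}");
Console.WriteLine($"Windows-1252: {readAs1252}");

// Once the child's encoding is known, tell the reader to use it.
// "script.vbs" is a placeholder; the process is not started here.
var psi = new ProcessStartInfo("cscript.exe", "//nologo script.vbs")
{
    UseShellExecute = false,
    RedirectStandardOutput = true,
    StandardOutputEncoding = Encoding.GetEncoding(1252),
};
```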

放荡不羁爱自由

Reply #3 · 2019-04-08 09:10

This doesn't answer your question directly, but I noticed a potential deadlock in your code and thought it was worth posting anyway.

The deadlock potential exists because your code does synchronous reads from both redirected streams, StdOut and StdErr, i.e. in this section of the code:

Proc.Start();
string output = Proc.StandardOutput.ReadToEnd();
string error = Proc.StandardError.ReadToEnd();

...

Proc.WaitForExit();

What can happen is that the child process writes a lot of data to StdErr and fills up the pipe buffer. Once the buffer is full, the child blocks on its write to StdErr (without having signaled the end of the StdOut stream yet). So the child is blocked and doing nothing, while your process is blocked in StandardOutput.ReadToEnd(), waiting for a StdOut end-of-stream that never comes. Deadlock!

To fix this, at least one of the streams (preferably both) should be read asynchronously.

See the second example on MSDN, which talks specifically about this scenario and shows how to switch to asynchronous mode.
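A minimal C# sketch of the fix, assuming .NET 5+. Rather than the event-based BeginOutputReadLine/BeginErrorReadLine approach of the MSDN example, it drains StdErr with StreamReader.ReadToEndAsync while StdOut is read synchronously; `RunProcess` is a hypothetical helper name.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

// Runs a child process and drains stdout and stderr concurrently, so the child
// cannot block on a full stderr pipe while the parent is stuck reading stdout.
static (string Output, string Error, int ExitCode) RunProcess(
    string fileName, string arguments, Encoding encoding)
{
    var psi = new ProcessStartInfo(fileName, arguments)
    {
        UseShellExecute = false,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        StandardOutputEncoding = encoding,
        StandardErrorEncoding = encoding,
        CreateNoWindow = true, // ask Windows not to create a console window for the child
    };

    using Process proc = Process.Start(psi)!;

    // Start draining stderr in the background before blocking on stdout.
    Task<string> errorTask = proc.StandardError.ReadToEndAsync();
    string output = proc.StandardOutput.ReadToEnd();
    string error = errorTask.Result; // completes once the child closes stderr

    proc.WaitForExit();
    return (output, error, proc.ExitCode);
}

// Example on Windows (register CodePagesEncodingProvider first on .NET Core):
// var (output, error, exitCode) =
//     RunProcess("cscript.exe", "//nologo c:\\s.vbs", Encoding.GetEncoding(1252));
```

The important ordering is that `ReadToEndAsync` starts before the blocking `ReadToEnd`, so whichever pipe the child fills first is already being drained.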

As for the UTF-8 issue, are you sure your child process is outputting in this encoding and not, say, UTF-16 or some other one? You may want to examine the raw bytes to work out which encoding the stream uses, so you can set the proper encoding for reading the redirected stream.
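One rough way to start examining the bytes, as a C# sketch (assumes .NET 5+; the `DescribeBytes` helper and its categories are illustrative, not a standard API). It checks for a BOM, then for pure ASCII, then whether the bytes are valid UTF-8. A "not UTF-8" result only narrows things down, since single-byte legacy code pages accept almost any byte.

```csharp
using System;
using System.Linq;
using System.Text;

// First-pass classification of captured bytes; a heuristic, not proof.
static string DescribeBytes(byte[] data)
{
    if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8 with BOM";
    if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16 LE with BOM";
    if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16 BE with BOM";
    if (data.All(b => b < 0x80))
        return "pure ASCII (fits many encodings equally well)";
    try
    {
        // throwOnInvalidBytes: true turns malformed sequences into an exception
        // instead of silently producing U+FFFD.
        _ = new UTF8Encoding(false, true).GetString(data);
        return "valid UTF-8 without BOM";
    }
    catch (DecoderFallbackException)
    {
        return "not UTF-8; probably a legacy code page (or BOM-less UTF-16)";
    }
}

byte[] sample = { 0xE4, 0xE5, 0xE9, 0xF6, 0x80 };
Console.WriteLine(BitConverter.ToString(sample)); // prints E4-E5-E9-F6-80
Console.WriteLine(DescribeBytes(sample));
```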

EDIT

Here is how I think you can resolve the encoding issue. The basic idea comes from something I once needed to do: I had Russian text in an unknown encoding and needed to figure out how to convert it so that it showed the proper characters. Take the bytes captured from StdOut and try to decode them with every code page available on the system. The one that looks right is likely (but not necessarily) the encoding StdOut uses. It is not guaranteed even if the result looks correct for your data, because many encodings overlap over some byte ranges and decode them identically; e.g. ASCII and UTF-8 produce the same bytes for basic Latin characters. So to get an exact match, you may need to get creative and test with some atypical text.

Here is the basic code to do it; adjustments may be necessary:

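    // Raw bytes from the child's StandardOutput. Reading BaseStream directly
    // bypasses the StreamReader, so no decoding has happened yet.
    byte[] bytes;
    using (var captured = new System.IO.MemoryStream())
    {
        Proc.StandardOutput.BaseStream.CopyTo(captured);
        bytes = captured.ToArray();
    }

    foreach (System.Text.EncodingInfo encodingInfo in System.Text.Encoding.GetEncodings())
    {
        System.Text.Encoding encoding = encodingInfo.GetEncoding();
        string decodedBytes = encoding.GetString(bytes);
        System.Console.Out.WriteLine("Encoding: {0}, Decoded Bytes: {1}", encoding.EncodingName, decodedBytes);
    }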

Run the code and manually examine the output. All those that match the expected text are candidates for being the encoding used in StdOut.
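If you already know what the text should be, the manual comparison can be automated. This C# sketch (assumes .NET 5+; the `MatchingCodePages` helper and the hand-picked candidate list are hypothetical, whereas the code above iterates over every encoding from `Encoding.GetEncodings()`) keeps only the code pages that reproduce the expected string exactly. As noted above, several candidates can still match for ordinary text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Makes Windows code pages (1252, 437, 850, ...) resolvable on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Keeps only the code pages under which the bytes decode to exactly the expected text.
static List<int> MatchingCodePages(byte[] bytes, string expectedText, IEnumerable<int> codePages) =>
    codePages
        .Where(codePage => Encoding.GetEncoding(codePage).GetString(bytes) == expectedText)
        .ToList();

// Hypothetical capture: "äåéö€" as a Windows-1252 child process would emit it.
byte[] captured = { 0xE4, 0xE5, 0xE9, 0xF6, 0x80 };
// Hand-picked candidates: UTF-8, Windows-1252, OEM 437/850, ISO-8859-1, UTF-16 LE.
int[] candidates = { 65001, 1252, 437, 850, 28591, 1200 };

foreach (int cp in MatchingCodePages(captured, "äåéö€", candidates))
    Console.WriteLine($"Candidate: {cp} ({Encoding.GetEncoding(cp).WebName})");
```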

你好瞎i

Reply #4 · 2019-04-08 09:10

I am using your function like this:

label1.Text = processVB("wscript.exe", "c:\\s.vbs");

And my vbs file is

Set fso = CreateObject ("Scripting.FileSystemObject")
Set stdout = fso.GetStandardStream (1)
stdout.WriteLine "äåéö€"

My .vbs file is encoded as UTF-8 without BOM.

And it works as expected. I see äåéö€ on my form.

Maybe you should check how you call your function, the encoding of your .vbs file, and how you write data to stdout.
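Since the file encoding matters here, one option is to write the .vbs file from C# with an explicit encoding, so you know exactly which bytes the script engine reads for the non-ASCII literal. A sketch, assuming .NET 5+; the temp-file names `s_utf8.vbs`/`s_1252.vbs` and the choice of code page 1252 are assumptions:

```csharp
using System;
using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+ to get code page 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The script from this answer; only the non-ASCII literal depends on the file encoding.
string script =
    "Set fso = CreateObject (\"Scripting.FileSystemObject\")\r\n" +
    "Set stdout = fso.GetStandardStream (1)\r\n" +
    "stdout.WriteLine \"äåéö€\"\r\n";

string dir = Path.GetTempPath();

// UTF-8 without BOM, as used in this answer.
string utf8Path = Path.Combine(dir, "s_utf8.vbs");
File.WriteAllText(utf8Path, script, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

// A locale ANSI code page (1252 assumed; use your own locale's), as a later answer recommends.
string ansiPath = Path.Combine(dir, "s_1252.vbs");
File.WriteAllText(ansiPath, script, Encoding.GetEncoding(1252));

Console.WriteLine($"{utf8Path}: {new FileInfo(utf8Path).Length} bytes");
Console.WriteLine($"{ansiPath}: {new FileInfo(ansiPath).Length} bytes");
```

The UTF-8 file is 6 bytes longer: ä, å, é and ö take two bytes each and € takes three, versus one byte each in 1252.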

ら.Afraid

Reply #5 · 2019-04-08 09:18

The problem is that the console isn't UTF-8 by default. It uses the code page that matches your Windows locale settings. A simple way to solve this is the chcp console command, for example:

chcp 65001 && yourScript.vbs

This will cause the output to be in UTF-8 and ensure that you can read it properly from your .NET application.

Note that I've tested this with a batch script rather than a VBScript, but if VBScript supports UTF-8 output, it should work just fine. Also, you may have to call the VBScript engine (cscript.exe) explicitly instead of just yourScript.vbs, but you should be able to sort that out easily on your own :)
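To try this from the C# side, the start info might look like the following sketch (assumes .NET 5+, and Windows to actually run it; `Utf8ConsoleStartInfo` and the script path are hypothetical). Whether cscript really emits UTF-8 after `chcp 65001` is untested here, and the next answer argues the engine picks its own code page, so verify the output on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Builds (but does not start) a start info that switches the console to UTF-8
// before running the script, and reads the redirected output as UTF-8.
static ProcessStartInfo Utf8ConsoleStartInfo(string scriptPath) => new ProcessStartInfo
{
    FileName = "cmd.exe",
    // ">nul" keeps chcp's "Active code page: 65001" line out of the captured output;
    // "//nologo" suppresses the Windows Script Host banner.
    Arguments = $"/c chcp 65001 >nul && cscript //nologo \"{scriptPath}\"",
    UseShellExecute = false,
    RedirectStandardOutput = true,
    StandardOutputEncoding = Encoding.UTF8,
    CreateNoWindow = true,
};

// Hypothetical script path.
ProcessStartInfo psi = Utf8ConsoleStartInfo(@"C:\scripts\yourScript.vbs");
Console.WriteLine($"{psi.FileName} {psi.Arguments}");

// On Windows you would then run it, e.g.:
// using Process p = Process.Start(psi)!;
// string text = p.StandardOutput.ReadToEnd();
// p.WaitForExit();
```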

我欲成王，谁敢阻挡

Reply #6 · 2019-04-08 09:20

"Because the output that VBScript generates is UTF8"

That's the assumption that is getting you into trouble here: it just isn't UTF-8. Nor can it be; the scripting engine doesn't support setting it. You can try this yourself with the following statement in a sample .vbs file:

 SetLocale 65001

Kaboom: it only accepts LCID values, and those don't cover UTF encodings. Rather, the cscript.exe scripting engine already changes the default code page itself. Instead of the default OEM code page (the OEMCP value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage), it switches to the default Windows code page, the ACP value under the same documented registry key. Depending on your location, that will be 1252, for example, in the Americas and Western Europe.
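To see which ANSI ("ACP") and OEM ("OEMCP") code pages .NET associates with a culture, `TextInfo.ANSICodePage` and `TextInfo.OEMCodePage` can be queried. A C# sketch, assuming .NET 5+; `CodePagesFor` is a hypothetical helper. These are per-culture defaults, not a read of the registry values mentioned above, so they can differ from the system-wide setting.

```csharp
using System;
using System.Globalization;
using System.Text;

// Makes code pages such as 437 and 1252 resolvable on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// ANSI (Windows) and OEM (console) code pages .NET associates with a culture.
static (int Ansi, int Oem) CodePagesFor(CultureInfo culture) =>
    (culture.TextInfo.ANSICodePage, culture.TextInfo.OEMCodePage);

// The invariant culture gives stable values everywhere; pass
// CultureInfo.CurrentCulture instead to see your own machine's defaults.
var (ansi, oem) = CodePagesFor(CultureInfo.InvariantCulture);
Console.WriteLine($"ANSI {ansi}, OEM {oem}");

// The same two bytes decode differently under the two code pages.
byte[] sample = { 0xE4, 0x80 };
Console.WriteLine($"As ANSI: {Encoding.GetEncoding(ansi).GetString(sample)}");
Console.WriteLine($"As OEM:  {Encoding.GetEncoding(oem).GetString(sample)}");
```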

Here is some VBScript code to play with. Be sure to save the file in the default encoding for your locale, or the script interpreter itself will misinterpret the strings in the source code, which by itself could also explain your problem:

WScript.Echo "Locale: " & GetLocale
WScript.Echo "äåéö€"
WScript.Echo "Changing locale to US-English:"
SetLocale 1033
WScript.Echo "äåéö€"

Output on my machine:

C:\temp>cscript test.vbs
Microsoft (R) Windows Script Host Version 5.8
Copyright (C) Microsoft Corporation. All rights reserved.

Locale: 1033
äåéö€
Changing locale to US-English:
äåéö€

So the proper line of code in your program should be:

Proc.StartInfo.StandardOutputEncoding = Encoding.Default;

Do note that this is not the default the Process class uses; it assumes that a console-mode program uses the OEM code page, such as 437 on a machine in North America (850 is common in Western Europe). You can pick another LCID in your .vbs program and change your C# code to match, but that should not be necessary.
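One caveat for .NET Core and .NET 5+: there `Encoding.Default` always returns UTF-8, so `Encoding.Default` no longer selects the ANSI code page. A hedged sketch of a replacement that approximates the ANSI code page from the current culture (assumption: the culture's ANSI code page usually, but not always, matches the system ACP; `AnsiEncoding` is a hypothetical helper, and `test.vbs` is the sample script from this answer):

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text;

// Required on .NET Core / .NET 5+ before asking for code pages such as 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Approximates the ANSI code page that Encoding.Default meant on .NET Framework.
static Encoding AnsiEncoding()
{
    int codePage = CultureInfo.CurrentCulture.TextInfo.ANSICodePage;
    // Unicode-only cultures may report 0 here; fall back to UTF-8 in that case.
    return codePage == 0 ? new UTF8Encoding(false) : Encoding.GetEncoding(codePage);
}

var psi = new ProcessStartInfo("cscript.exe", "//nologo test.vbs")
{
    UseShellExecute = false,
    RedirectStandardOutput = true,
    StandardOutputEncoding = AnsiEncoding(),
    CreateNoWindow = true,
};

Console.WriteLine($"Encoding.Default: {Encoding.Default.WebName}");
Console.WriteLine($"Reading cscript output as code page {psi.StandardOutputEncoding?.CodePage}");
```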

And do keep in mind the failure mode of a wrongly encoded .vbs source file. Unfortunately, the scripting engine doesn't support UTF-8 with a BOM either.
