c#, Excel + csv: how to get the correct encoding?

2019-01-27 14:42发布

I've been trying this for quite a while now, but can't figure it out. I'm trying to export data to Excel via a *.csv file. It works great so far, but I have some encoding problems when opening the files in Excel.

(original string on the left, EXCEL result on the right):

Messwert(µm / m) ==> Messwert(µm / m)

Dümme Mässöng ==> Dümme Mässöng

Notepad++ tells me that the file is encoded "ANSI as UTF8"(WTF?)

So here are different ways I tried to get a valid result: obvious implementation:

tWriter.Write(";Messwert(µm /m)");

more sophisticated one (tried probably a dozen or more encoding combinations:)

tWriter.Write(Encoding.Default.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));
tWriter.Write(Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(";Messwert(µm /m)")));

and so on

Whole source code for the method creating the data:

    MemoryStream tStream = new MemoryStream();
    StreamWriter tWriter = new StreamWriter(tStream);
    tWriter.Write("\uFEFF");

    tWriter.WriteLine(string.Format("{0}", aMeasurement.Name));
    tWriter.WriteLine(aMeasurement.Comment);
    tWriter.WriteLine();
    tWriter.WriteLine("Zeit in Minuten;Messwert(µm / m)");

    TimeSpan tSpan;
    foreach (IMeasuringPoint tPoint in aMeasurement)
    {
        tSpan = new TimeSpan(tPoint.Time - aMeasurement[0].Time);
        tWriter.WriteLine(string.Format("{0};{1};", (int)tSpan.TotalMinutes, getMPString(tPoint)));
    }

    tWriter.Flush();
    return tStream;

Generated CSV file:

Dümme Mössäng
Testmessung die erste

Zeit in Minuten;Messwert(µm / m)
0;-703;
0;-381;
1;1039;
1;1045;
2;1457;
2;1045;

7条回答
走好不送
2楼-- · 2019-01-27 15:25

I'd suggest you open up the text file in a hex editor, and see what it really is. The BOM for UTF-16 is 0xFEFF, which the writing code is apparently writing to the stream - but the rest of the writing doesn't specify an encoding to use - it would use the default encoding of the StreamWriter, which is UTF-8. There appears to be a mix up of encodings.

When you pop open the file in hex view, if you see lots of 0x00 between the characters, you're working with UTF-16, which is Encoding.Unicode in C#. If there are no 0x00 between chars, the encoding is probably UTF-8.

If the latter case, just fix up the BOM to be EF BB BF rather than FE FF, and read normally with UTF-8 encoding.

查看更多
登录 后发表回答