StreamWriter and UTF-8 Byte Order Marks

Posted 2019-01-07 17:44

I'm having an issue with StreamWriter and byte order marks (BOMs). The documentation seems to state that the Encoding.UTF8 encoding has byte order marks enabled, but when files are written, some have the marks while others don't.

I'm creating the stream writer in the following way:

this.Writer = new StreamWriter(this.Stream, System.Text.Encoding.UTF8);

Any ideas on what could be happening would be appreciated.

8 answers
三岁会撩人
#2 · 2019-01-07 18:00

I found this answer useful (thanks to @Philipp Grathwohl and @Nik), but in my case I'm using a FileStream, so the code that writes the BOM looks like this:

// Needs: using System.IO; using System.Text;
using (FileStream vStream = File.Create(pfilePath))
{
    // Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
    Encoding vUTF8Encoding = new UTF8Encoding(true);
    // Gets the preamble (EF BB BF) so the BOM can be written first
    var vPreambleByte = vUTF8Encoding.GetPreamble();

    // Writes the preamble first
    vStream.Write(vPreambleByte, 0, vPreambleByte.Length);

    // Gets the bytes from text (GetBytes never includes the preamble itself)
    byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
    vStream.Write(vByteData, 0, vByteData.Length);
    // No explicit Close() needed: leaving the using block disposes the stream
}
SAY GOODBYE
#3 · 2019-01-07 18:01

Could you show a situation where it doesn't produce the BOM? The only case I can find where the preamble is missing is when nothing is ever written to the writer. (Jim Mischel seems to have found another cause, which is logical and more likely to be your problem; see that answer.)

My test code:

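// Needs: using System; using System.IO; using System.Linq; using System.Text;
var stream = new MemoryStream();
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write('a');
}
// Prints "EF BB BF 61": the UTF-8 BOM followed by 'a'
Console.WriteLine(stream.ToArray()
    .Select(b => b.ToString("X2"))
    .Aggregate((i, a) => i + " " + a)
    );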
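A variation on that test illustrates one other situation where the BOM is missing: when the StreamWriter is created over a seekable stream whose Position is already greater than zero (for example, a file opened for appending), StreamWriter skips the preamble. A minimal sketch, using a MemoryStream that already holds one byte to stand in for such a file:

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for a stream that already has content, e.g. a file opened in append mode.
var stream = new MemoryStream();
stream.WriteByte(0x61); // 'a' is already in the stream; Position is now 1

using (var writer = new StreamWriter(stream, Encoding.UTF8))
{
    // Because the stream is seekable and Position > 0, no preamble is written.
    writer.Write('b');
}

// ToArray still works after the writer has closed the MemoryStream.
string hex = BitConverter.ToString(stream.ToArray());
Console.WriteLine(hex); // 61-62: no EF-BB-BF anywhere
```

If some of your files are opened in append mode, or the stream is reused after earlier writes, this would explain why only some of them start with EF BB BF.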
时光不老,我们不散
#4 · 2019-01-07 18:06

The issue comes from using the static UTF8 property of the Encoding class.

When GetPreamble is called on the Encoding instance returned by the UTF8 property, it returns the byte order mark (a three-byte array: EF BB BF), which StreamWriter writes to the stream before any other content (assuming a new stream).

You can avoid this by creating the UTF8Encoding instance yourself, like so:

// As before.
this.Writer = new StreamWriter(this.Stream, 
    // Create it yourself; the parameterless constructor (like passing false) does not emit a BOM.
    new System.Text.UTF8Encoding());

As per the documentation for the parameterless constructor:

This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.

This means that the call to GetPreamble will return an empty array, and therefore no BOM will be written to the underlying stream.
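To see the difference directly, here is a small sketch that prints the preamble each UTF-8 encoding instance reports (the output formatting is only for illustration):

```csharp
using System;
using System.Text;

// Compare the preamble (BOM) reported by each UTF-8 encoding instance.
byte[] fromStatic = Encoding.UTF8.GetPreamble();            // EF BB BF
byte[] fromDefaultCtor = new UTF8Encoding().GetPreamble();  // empty
byte[] fromTrueCtor = new UTF8Encoding(true).GetPreamble(); // EF BB BF

Console.WriteLine($"Encoding.UTF8:          [{BitConverter.ToString(fromStatic)}]");
Console.WriteLine($"new UTF8Encoding():     [{BitConverter.ToString(fromDefaultCtor)}]");
Console.WriteLine($"new UTF8Encoding(true): [{BitConverter.ToString(fromTrueCtor)}]");
```

StreamWriter writes whatever the encoding's preamble is, so an empty array here means no BOM in the output.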

再贱就再见
#5 · 2019-01-07 18:09

My answer is based on HelloSam's, which contains all the necessary information. However, I believe the OP is asking how to make sure the BOM is emitted into the file.

So instead of passing false to the UTF8Encoding constructor, you need to pass true:

    using (var sw = new StreamWriter("text.txt", new UTF8Encoding(true)))

Try the code below, then open the resulting files in a hex editor to see which one contains the BOM and which doesn't.

using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        const string nobomtxt = "nobom.txt";
        File.Delete(nobomtxt);

        // UTF8Encoding(false): no BOM is written
        using (Stream stream = File.OpenWrite(nobomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
        {
            writer.WriteLine("HelloПривет");
        }

        const string bomtxt = "bom.txt";
        File.Delete(bomtxt);

        // UTF8Encoding(true): the file starts with EF BB BF
        using (Stream stream = File.OpenWrite(bomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
        {
            writer.WriteLine("HelloПривет");
        }
    }
}
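Instead of a hex editor, a short check like the following can report whether a file starts with the UTF-8 BOM. This is a sketch: `HasUtf8Bom` is a helper name made up for this example, and it assumes the two files from the program above are in the current directory.

```csharp
using System;
using System.IO;
using System.Text;

// HasUtf8Bom is a hypothetical helper, not a framework API.
// It returns true when the file's first bytes equal the UTF-8 preamble (EF BB BF).
static bool HasUtf8Bom(string path)
{
    byte[] bom = new UTF8Encoding(true).GetPreamble();
    byte[] head = new byte[bom.Length];
    int total = 0;
    using (FileStream fs = File.OpenRead(path))
    {
        // Read may return fewer bytes than requested, so loop until full or EOF.
        while (total < head.Length)
        {
            int n = fs.Read(head, total, head.Length - total);
            if (n == 0) break; // end of file
            total += n;
        }
    }
    if (total < bom.Length) return false; // file is shorter than the BOM
    for (int i = 0; i < bom.Length; i++)
    {
        if (head[i] != bom[i]) return false;
    }
    return true;
}

// Check the two files produced by the program above, if they exist.
foreach (string name in new[] { "nobom.txt", "bom.txt" })
{
    if (File.Exists(name))
    {
        Console.WriteLine($"{name}: {(HasUtf8Bom(name) ? "BOM" : "no BOM")}");
    }
}
```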
我只想做你的唯一
#6 · 2019-01-07 18:10

Do you use the same StreamWriter constructor for every file? The documentation says:

To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding).

I was in a similar situation a while ago. I ended up using the Stream.Write method instead of StreamWriter, writing the result of Encoding.GetPreamble() before the result of Encoding.GetBytes(stringToWrite).
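For reference, here is a sketch of the constructor the documentation mentions, StreamWriter(String, Boolean, Encoding), assuming a scratch file in the temp directory (the file name is arbitrary). With append set to false the file is truncated and the writer starts at position 0, so Encoding.UTF8 emits the BOM; with append set to true on a non-empty file the writer starts at the end, so no second BOM is written.

```csharp
using System;
using System.IO;
using System.Text;

// Arbitrary scratch file used only for this demonstration.
string path = Path.Combine(Path.GetTempPath(), "bom-append-demo.txt");

// append: false creates or truncates the file; the writer starts at position 0,
// so the Encoding.UTF8 preamble (EF BB BF) is written first.
using (var writer = new StreamWriter(path, false, Encoding.UTF8))
{
    writer.Write("a");
}

// append: true opens the file at its end; since the position is not 0,
// StreamWriter does not write another preamble.
using (var writer = new StreamWriter(path, true, Encoding.UTF8))
{
    writer.Write("b");
}

string hex = BitConverter.ToString(File.ReadAllBytes(path));
Console.WriteLine(hex); // EF-BB-BF-61-62: one BOM, at the very start
```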

孤傲高冷的网名
#7 · 2019-01-07 18:12

As someone has already pointed out, calling the constructor without the encoding argument does the trick. However, if you want to be explicit, try this:

using (var sw = new StreamWriter(this.Stream, new UTF8Encoding(false)))

The key is to construct a new UTF8Encoding(false) instead of using Encoding.UTF8; the constructor argument controls whether the BOM is added.

As far as the BOM is concerned, this is the same as calling StreamWriter without the encoding argument; internally, its default is also UTF-8 without a BOM.
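A quick way to confirm this, sketched with a MemoryStream and a hypothetical `WriteWith` helper (not a framework API): the bytes produced with no encoding argument match those produced with new UTF8Encoding(false), while Encoding.UTF8 adds EF BB BF in front.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes text through the StreamWriter built by `create`
// and returns the raw bytes that ended up in the stream.
static byte[] WriteWith(Func<Stream, StreamWriter> create, string text)
{
    var stream = new MemoryStream();
    using (StreamWriter writer = create(stream))
    {
        writer.Write(text);
    }
    return stream.ToArray(); // ToArray still works after the writer closes the stream
}

byte[] implicitEncoding = WriteWith(s => new StreamWriter(s), "Hello");
byte[] explicitNoBom = WriteWith(s => new StreamWriter(s, new UTF8Encoding(false)), "Hello");
byte[] withBom = WriteWith(s => new StreamWriter(s, Encoding.UTF8), "Hello");

Console.WriteLine(BitConverter.ToString(implicitEncoding)); // 48-65-6C-6C-6F
Console.WriteLine(BitConverter.ToString(explicitNoBom));    // 48-65-6C-6C-6F
Console.WriteLine(BitConverter.ToString(withBom));          // EF-BB-BF-48-65-6C-6C-6F
```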
