Add byte order mark to a string via StringBuilder

Posted 2019-06-27 05:13

How can I add a byte order mark to a StringBuilder? (I have to pass a string to another method which will save it as a file, but I can't modify that method).

I tried this:

var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');

But when I view the result in a hex editor, I see this sequence instead: C3 AF C2 BB C2 BF
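Why those bytes appear: in C#, '\xEF' is not a raw byte but the character U+00EF (ï), and when the string is later encoded as UTF-8 each of those three characters becomes two bytes. The minimal sketch below (a top-level C# program) compares that with the single character U+FEFF. It assumes the saving method encodes the string as UTF-8, which the question does not state but which the hex dump is consistent with; the helper name `Hex` is illustrative only.

```csharp
using System;
using System.Text;

// Hypothetical helper: hex dump such as "C3 AF" (BitConverter uses dashes, replaced by spaces).
static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

// The attempt from the question: '\xEF', '\xBB', '\xBF' are the chars U+00EF, U+00BB, U+00BF.
var wrong = new StringBuilder();
wrong.Append('\xEF');
wrong.Append('\xBB');
wrong.Append('\xBF');

// A single byte order mark character, U+FEFF.
var right = new StringBuilder();
right.Append('\uFEFF');

string wrongHex = Hex(Encoding.UTF8.GetBytes(wrong.ToString()));
string rightHex = Hex(Encoding.UTF8.GetBytes(right.ToString()));

Console.WriteLine(wrongHex); // C3 AF C2 BB C2 BF
Console.WriteLine(rightHex); // EF BB BF
```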

The string is huge, so ideally I'd avoid converting it back and forth to a byte array.

Edit: clarification after questions in the comments. I have to pass the string to another method, which takes a string and creates a file from it in Azure Blob Storage. I can't modify that method.

3 answers
我只想做你的唯一
Post #2 · 2019-06-27 05:58

Two options:

  1. Don't include the byte order mark in your text at all... instead use an encoding which will automatically include it
  2. Include it as a character in your StringBuilder:

    sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character
    

Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.
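A small sketch of option 2 using the question's StringBuilder: the BOM is just the character U+FEFF at index 0, so whichever encoding the unmodifiable saving method uses turns it into that encoding's own BOM. Which encoding that method actually uses is an assumption here; the two shown are UTF-8 and UTF-16 little-endian (Encoding.Unicode), and the sample text is a placeholder. If the method instead writes through a StreamWriter whose encoding emits its own preamble, the file would end up with two BOMs.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
sb.Append('\uFEFF');       // U+FEFF, the byte order mark character, appended first
sb.Append("Hello, blob");  // placeholder for the real (huge) content

string text = sb.ToString();

// The same string encoded two ways; GetBytes adds no preamble of its own,
// so the only BOM in each result comes from the U+FEFF character.
byte[] utf8 = Encoding.UTF8.GetBytes(text);
byte[] utf16le = Encoding.Unicode.GetBytes(text);

Console.WriteLine(BitConverter.ToString(utf8, 0, 3));    // EF-BB-BF
Console.WriteLine(BitConverter.ToString(utf16le, 0, 2)); // FF-FE
```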

叛逆
Post #3 · 2019-06-27 05:58

Byte-order marks tell readers of a file which encoding the file uses. As such, you should only need the byte-order mark (BOM) in the actual file. If you want to include a BOM in a text file you're writing, simply use a StreamWriter with a BOM-emitting encoding to write the file. For example:

// Encoding.UTF8 makes the writer emit the UTF-8 BOM (EF BB BF) first
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString());
}

If you want UTF-8 without a BOM (StreamWriter's default encoding):

using(var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}

Or, if you want a different BOM (here UTF-16 little-endian, FF FE):

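// There is no Encoding.UTF16; Encoding.Unicode is UTF-16 little-endian
// (Encoding.BigEndianUnicode is the big-endian variant)
using(var writer = new StreamWriter(stream, System.Text.Encoding.Unicode))
{
    writer.Write(sb.ToString());
}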
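One way to check what each of the three writers above actually puts at the start of the output is to point them at a MemoryStream. A minimal sketch; the helper name `FirstBytes` and the one-character sample text are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes `text` through a StreamWriter and returns the first `count` bytes as hex.
// encoding == null means "use StreamWriter's default", which is UTF-8 without a BOM.
static string FirstBytes(string text, Encoding? encoding, int count)
{
    var stream = new MemoryStream();
    using (var writer = encoding == null
        ? new StreamWriter(stream)
        : new StreamWriter(stream, encoding))
    {
        writer.Write(text);
    }
    // MemoryStream.ToArray still works after the writer has closed the stream.
    byte[] bytes = stream.ToArray();
    return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
}

string utf8Bom   = FirstBytes("A", Encoding.UTF8, 4);    // EF-BB-BF-41
string utf8NoBom = FirstBytes("A", null, 4);             // 41
string utf16Le   = FirstBytes("A", Encoding.Unicode, 4); // FF-FE-41-00

Console.WriteLine($"{utf8Bom} | {utf8NoBom} | {utf16Le}");
```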

Update:

If you wanted to be decoupled from the implementation detail of a BOM, or of the BOM of a particular encoding (i.e., the encoding could change at runtime or after deployment), but still wanted to pass a BOM-marked string, you could do something like this (requires .NET 4.5 or later, for the StreamWriter constructor with a leaveOpen parameter):

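using System.IO;
using System.Text;

var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: configurize this, if necessary
// leaveOpen: true (last argument) keeps the MemoryStream usable after the writer is disposed
using(var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(sb.ToString()); // the writer emits the encoding's preamble (BOM) first
}
// GetString does not strip the BOM bytes; they come back as a leading U+FEFF
CantModifyButMustUseThis(encoding.GetString(stream.ToArray()));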
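Worth noting about the update: Encoding.GetString does not skip a BOM, so for UTF-8 and UTF-16 the round trip yields the original text with U+FEFF in front, the same string you get by prepending '\uFEFF' directly, without encoding the huge text twice. A minimal check of that, where `RoundTrip` is a hypothetical name wrapping the update's code and the sample text is a placeholder:

```csharp
using System;
using System.IO;
using System.Text;

// The update's approach, wrapped in a hypothetical helper.
static string RoundTrip(string text, Encoding encoding)
{
    var stream = new MemoryStream();
    using (var writer = new StreamWriter(stream, encoding, 1024, true))
    {
        writer.Write(text); // the preamble (BOM) is written before the text
    }
    // GetString decodes the BOM bytes into U+FEFF instead of dropping them.
    return encoding.GetString(stream.ToArray());
}

string text = "Hello, blob";
bool sameUtf8  = RoundTrip(text, Encoding.UTF8) == "\uFEFF" + text;
bool sameUtf16 = RoundTrip(text, Encoding.Unicode) == "\uFEFF" + text;

Console.WriteLine($"{sameUtf8} {sameUtf16}"); // True True
```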
别忘想泡老子
Post #4 · 2019-06-27 05:58

IIRC (and I'm not certain that I do), the BOM gets added when you write text through one of the relevant Unicode encodings (for example with a StreamWriter). I believe some of their constructors take a bool that controls whether to add a BOM.
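For reference, the constructors in question are UTF8Encoding(bool encoderShouldEmitUTF8Identifier) and UnicodeEncoding(bool bigEndian, bool byteOrderMark). The flag controls what GetPreamble() returns, which is what writers such as StreamWriter emit at the start of a stream; GetBytes itself never prepends it, so on its own the flag does not help when the output has to be a string. A minimal sketch:

```csharp
using System;
using System.Text;

var utf8WithBom    = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
var utf8WithoutBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
var utf16LeWithBom = new UnicodeEncoding(bigEndian: false, byteOrderMark: true);

// The BOM each encoding would ask a writer (e.g. StreamWriter) to emit.
string p1 = BitConverter.ToString(utf8WithBom.GetPreamble());    // EF-BB-BF
string p2 = BitConverter.ToString(utf8WithoutBom.GetPreamble()); // empty string
string p3 = BitConverter.ToString(utf16LeWithBom.GetPreamble()); // FF-FE

// GetBytes encodes only the characters; no preamble is prepended.
string bytesA = BitConverter.ToString(utf8WithBom.GetBytes("A")); // 41

Console.WriteLine($"{p1} | {p2} | {p3} | {bytesA}");
```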
