How can I add a byte order mark to a StringBuilder?
(I have to pass a string to another method which will save it as a file, but I can't modify that method).
I tried this:
var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');
But when I view the resulting file in a hex editor, it contains the following sequence:
C3 AF C2 BB C2 BF
The string is huge, so it would be good to do this without converting back and forth to a byte array.
Edit:
Clarification after questions in comments. I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. I can't modify the other method.
Two options:
- Don't include the byte order mark in your text at all... instead use an encoding which will automatically include it
- Include it as a character in your StringBuilder:
sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character
Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.
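To see why the three characters from the question come out as C3 AF C2 BB C2 BF while a single U+FEFF produces the expected bytes, here is a minimal sketch. `BomDemo` and `Utf8Hex` are made-up names for illustration, and it assumes the method you can't modify writes the string as UTF-8 without adding a BOM of its own (otherwise the file would end up with two).

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: encodes text as UTF-8 and returns the bytes as hex.
    // GetBytes never prepends a preamble, so only the characters themselves are encoded.
    public static string Utf8Hex(string text)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        // '\xEF', '\xBB', '\xBF' are the characters U+00EF, U+00BB, U+00BF, not raw bytes,
        // so each one is encoded as two UTF-8 bytes.
        var wrong = new StringBuilder().Append('\xEF').Append('\xBB').Append('\xBF');
        Console.WriteLine(Utf8Hex(wrong.ToString())); // C3-AF-C2-BB-C2-BF

        // The BOM is a single character, U+FEFF, which UTF-8 encodes as three bytes.
        var right = new StringBuilder().Append('\uFEFF');
        Console.WriteLine(Utf8Hex(right.ToString())); // EF-BB-BF
    }
}
```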
A byte-order mark (BOM) tells readers of a file which encoding the file uses. As such, you should only need the BOM in the actual file. If you want to include a BOM in a text file you're writing, simply use StreamWriter to write to the file. For example:
using (var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString());
}
If you don't want a BOM with UTF-8 (the default StreamWriter encoding is UTF-8 without a BOM):
using (var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}
Or if you want a different BOM (here UTF-16 little-endian, which .NET calls Encoding.Unicode):
using (var writer = new StreamWriter(stream, System.Text.Encoding.Unicode))
{
    writer.Write(sb.ToString());
}
Update:
If you wanted to be decoupled from the implementation detail of a BOM, or of the BOM of a particular encoding (i.e. it could change at runtime or after deployment), but still wanted to pass a BOM-marked string, you could do something like this (assumes .NET 4.5, which added the leaveOpen constructor overload):
var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: make this configurable, if necessary
// leaveOpen: true keeps the MemoryStream open after the writer is disposed
using (var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(sb.ToString());
}
CantModifyButMustUseThis(encoding.GetString(stream.ToArray()));
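Since the string is huge, a variation on this idea avoids encoding the whole text to bytes and back: decode only the encoding's preamble and put it at the front of the StringBuilder. This is a sketch, not a drop-in replacement; `BomPrefix` and `PrependBom` are hypothetical names, and it relies on `GetString` leaving the BOM character in the decoded text.

```csharp
using System;
using System.Text;

static class BomPrefix
{
    // Hypothetical helper: inserts the encoding's BOM (if it has one) at the start of sb.
    public static StringBuilder PrependBom(StringBuilder sb, Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();   // empty for encodings without a BOM
        string bom = encoding.GetString(preamble);  // "\uFEFF" for the Unicode encodings
        return sb.Insert(0, bom);
    }

    public static void Main()
    {
        var sb = new StringBuilder("some huge text");
        PrependBom(sb, Encoding.UTF8);
        Console.WriteLine(sb[0] == '\uFEFF'); // True
    }
}
```

If you control where the StringBuilder is created, appending the decoded preamble before any other text avoids the shifting that Insert(0, ...) causes on a large buffer.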
IIRC (and I'm not certain that I do), the BOM gets added when the text is converted to bytes using one of the relevant Unicode encodings. I believe some of their constructors take a bool that controls whether a BOM is added.
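For reference, the constructor in question is probably `UTF8Encoding(bool encoderShouldEmitUTF8Identifier)`. As far as I can tell, the flag controls what `GetPreamble()` returns (which writers such as StreamWriter emit at the start of a stream), while `GetBytes` itself does not prepend anything. A small sketch to check this; `PreambleDemo` and `PreambleHex` are illustrative names only:

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: returns the encoding's preamble (BOM) bytes as hex.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // With the flag set, the preamble is the UTF-8 BOM.
        Console.WriteLine(PreambleHex(new UTF8Encoding(true)));          // EF-BB-BF

        // Without the flag, the preamble is empty.
        Console.WriteLine(PreambleHex(new UTF8Encoding(false)).Length);  // 0

        // GetBytes encodes only the characters given, even when the flag is set.
        Console.WriteLine(new UTF8Encoding(true).GetBytes("a").Length);  // 1
    }
}
```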