I have a very simple question that I can't seem to get my head around.
I have a properly encoded UTF-8 string that I parse into a JObject with Json.NET. I fiddle with some values and write the result to the command line, and I want to keep the encoded characters intact.
Everything works great except for the part about keeping the encoded characters intact.
Code:
    var json = "{roster: [[\"Tulg\u00f4r\", 990, 1055]]}";
    var j = JObject.Parse(json);
    for (int i = 0; i < j["roster"].Count(); i++)
    {
        j["roster"][i][1] = ((int)j["roster"][i][1]) * 3;
        j["roster"][i][2] = ((int)j["roster"][i][2]) * 3;
    }
    Console.WriteLine(JsonConvert.SerializeObject(j, Formatting.None));
Actual output:
    {"roster":[["Tulgôr",2970,3165]]}
Desired output:
    {"roster":[["Tulg\u00f4r",2970,3165]]}
My search terms on Google must be off, since nothing useful came up. I'm sure it's something uber-easy and I will feel pretty stupid afterwards. :)
Take the output from JsonConvert.SerializeObject and run it through a helper method that converts all non-ASCII characters to their escaped ("\uHHHH") equivalent. A sample implementation is given below.
    using System.Globalization;
    using System.Text;

    // Replaces non-ASCII characters with escape sequences;
    // i.e., converts "Tulgôr" to "Tulg\u00f4r".
    private static string EscapeUnicode(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char ch in input)
        {
            if (ch <= 0x7f)
                sb.Append(ch); // ASCII passes through unchanged
            else
                sb.AppendFormat(CultureInfo.InvariantCulture, "\\u{0:x4}", (int) ch);
        }
        return sb.ToString();
    }
You would call it as follows:
    Console.WriteLine(EscapeUnicode(JsonConvert.SerializeObject(j, Formatting.None)));
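As a possible alternative to post-processing the string, here is a minimal sketch that asks Json.NET to do the escaping itself through `JsonSerializerSettings.StringEscapeHandling`. It assumes the installed Json.NET version is recent enough to provide the `StringEscapeHandling` enum and that the setting is honored when serializing a JObject; I have not checked which release introduced it, so treat this as something to verify against your version.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// The C# compiler turns \u00f4 into the character itself before Json.NET parses it.
var j = JObject.Parse("{\"roster\": [[\"Tulg\u00f4r\", 990, 1055]]}");

// Assumption: this Json.NET version exposes StringEscapeHandling.
var settings = new JsonSerializerSettings
{
    StringEscapeHandling = StringEscapeHandling.EscapeNonAscii
};

// Serialize with the settings so non-ASCII characters are written as \uXXXX.
string escaped = JsonConvert.SerializeObject(j, Formatting.None, settings);
Console.WriteLine(escaped);
```

If the setting is available, this keeps the escaping rules in one place and avoids re-scanning the serialized output.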
(Note that the EscapeUnicode helper doesn't handle non-BMP characters specially, because I don't know if your third-party application wants "\U00010000" or "\uD800\uDC00" (or something else!) when representing U+10000.)
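To make the non-BMP behavior concrete, here is a small sketch of what the helper produces as written. A .NET string stores U+10000 as a UTF-16 surrogate pair, and `foreach (char ...)` walks code units, so each half is escaped separately. That happens to be the form the JSON specification uses for such characters, but whether your consumer accepts it is still an assumption to check. The helper is repeated only so the snippet is self-contained.

```csharp
using System;
using System.Globalization;
using System.Text;

// Same logic as the EscapeUnicode helper above.
static string EscapeUnicode(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char ch in input) // iterates UTF-16 code units, not code points
    {
        if (ch <= 0x7f)
            sb.Append(ch);
        else
            sb.AppendFormat(CultureInfo.InvariantCulture, "\\u{0:x4}", (int)ch);
    }
    return sb.ToString();
}

// U+10000 is held in the string as the surrogate pair D800 DC00.
string nonBmp = char.ConvertFromUtf32(0x10000);
string escapedPair = EscapeUnicode(nonBmp);
Console.WriteLine(escapedPair); // prints \ud800\udc00
```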
I'm not sure I see the problem here. The actual output contains the Unicode character, which was interpreted correctly after being specified with the \u syntax. It is the correct character, so it contains the correct "bytes". Of course, it is a .NET string, so it is UTF-16 (what .NET calls "Unicode") rather than UTF-8.
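A short sketch illustrating that point: in a regular C# string literal the compiler resolves `\u00f4` to the character itself, so Json.NET never sees an escape sequence in the question's input. The UTF-8 bytes only come into existence when the string is encoded. The variable names here are just for illustration.

```csharp
using System;
using System.Text;

// The escape in the literal is resolved at compile time.
string name = "Tulg\u00f4r";
string expected = "Tulg" + (char)0xF4 + "r";

Console.WriteLine(name == expected); // prints True
Console.WriteLine(name.Length);      // prints 6 (six UTF-16 code units)

// Encoding to UTF-8 turns the single character U+00F4 into two bytes, C3 B4.
byte[] utf8 = Encoding.UTF8.GetBytes(name);
Console.WriteLine(BitConverter.ToString(utf8)); // prints 54-75-6C-67-C3-B4-72
```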