Is there a way in C# to convert Unicode strings into ASCII + HTML entities, and then back again? In PHP, I can do it like so:
<?php
// RUN ME AT COMMAND LINE
$sUnicode = '<b>Jöhan Strauß</b>';
echo "UNICODE: $sUnicode\n";
$sASCII = mb_convert_encoding($sUnicode, 'HTML-ENTITIES','UTF-8');
echo "ASCII: $sASCII\n";
$sUnicode = mb_convert_encoding($sASCII, 'UTF-8', 'HTML-ENTITIES');
echo "UNICODE (TRANSLATED BACK): $sUnicode\n";
Background:
- I need this to work in C# on .NET 2.0 because we are constrained and can't use a newer .NET version in an older application.
- I handle the PHP backend on this application. I wanted to share some tips with the C# frontend team on this project.
HTML-ENTITIES isn't really a character encoding, even though the PHP API might hint so. <b>J&ouml;han Strau&szlig;</b> is still UTF-8 encoded text (or even ASCII, ISO-8859-1, pretty much anything).
I couldn't find anything premade, except HTML encoding functions, which are not the same thing at all since they encode &, < etc. while mb_convert_encoding doesn't. I made this class that should work:
PHP:
C#
Note: This cannot handle characters outside the BMP (Basic Multilingual Plane), though such a requirement is rare enough that it should be stated explicitly if it applies.
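A minimal sketch of what such a class could look like, assuming C# 2.0 syntax and only .NET 2.0 APIs. `HtmlEntityAscii`, `Encode`, and `Decode` are hypothetical names, not the class from the answer. Unlike the limitation noted above, this sketch also covers characters outside the BMP by turning a surrogate pair into a single code point. It emits numeric references only and does not decode named entities such as `&ouml;`, so PHP output that uses named entities would need a lookup table added.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the class from the original answer.
public static class HtmlEntityAscii
{
    // Matches decimal (&#246;) and hexadecimal (&#xF6;) character references.
    private static readonly Regex NumericEntity =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces every non-ASCII character with a decimal character reference.
    // ASCII characters, including & and <, pass through untouched.
    public static string Encode(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            int code;
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }
            if (char.IsSurrogatePair(text, i))
            {
                // Two UTF-16 code units form one code point above U+FFFF.
                code = char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate
            }
            else
            {
                code = c;
            }
            sb.Append("&#");
            sb.Append(code.ToString(CultureInfo.InvariantCulture));
            sb.Append(';');
        }
        return sb.ToString();
    }

    // Turns numeric character references back into characters.
    // Named entities (&ouml;, &amp;, ...) are deliberately left as they are.
    public static string Decode(string text)
    {
        return NumericEntity.Replace(text, new MatchEvaluator(DecodeMatch));
    }

    private static string DecodeMatch(Match m)
    {
        int code;
        if (m.Groups[1].Success)
        {
            code = int.Parse(m.Groups[1].Value,
                NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
        }
        else
        {
            code = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        }

        if (code <= 0xFFFF)
        {
            return ((char)code).ToString();
        }
        if (code <= 0x10FFFF)
        {
            return char.ConvertFromUtf32(code);
        }
        return m.Value; // not a valid code point: leave the text unchanged
    }
}
```

With this sketch, `HtmlEntityAscii.Encode("<b>Jöhan Strauß</b>")` returns `<b>J&#246;han Strau&#223;</b>`, and passing that result to `HtmlEntityAscii.Decode` gives back the original string.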
Yes, there's Encoding.Convert, although I rarely use it myself.
I rarely find I want to convert from one encoded form to another - it's much more common to perform a one-way conversion from text to binary (Encoding.GetBytes) or vice versa (Encoding.GetString).
Here are some examples of conversions. The first two show how to convert; the last one can be turned into a function that takes two encoding names as strings and converts between them.
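A minimal sketch of the calls mentioned above, assuming .NET 2.0-compatible C#. `EncodingExamples`, `ConvertByName`, and `Demo` are hypothetical names chosen for illustration. Note that converting to ASCII this way replaces unrepresentable characters with `?` rather than producing HTML entities, so it does not by itself do what the PHP snippet does.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class EncodingExamples
{
    // Re-encodes a byte array from one named encoding to another,
    // e.g. ConvertByName(data, "utf-8", "iso-8859-1").
    public static byte[] ConvertByName(byte[] data, string fromName, string toName)
    {
        Encoding from = Encoding.GetEncoding(fromName);
        Encoding to = Encoding.GetEncoding(toName);
        return Encoding.Convert(from, to, data);
    }

    public static void Demo()
    {
        string text = "Jöhan Strauß";

        // Text to binary: ö and ß each take two bytes in UTF-8.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // Binary back to text.
        string roundTrip = Encoding.UTF8.GetString(utf8);

        // One encoded form to another, selected by encoding name.
        byte[] latin1 = ConvertByName(utf8, "utf-8", "iso-8859-1");

        Console.WriteLine(roundTrip == text); // True
        Console.WriteLine(utf8.Length);       // 14
        Console.WriteLine(latin1.Length);     // 12
    }
}
```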
A list of encoding names can be found here.