Is there actually any simple method of finding which encodings in .NET are ASCII-compatible?
(Based on the question posed in Nyerguds's comment.)
We will assume the standard definition of ASCII, which is limited to 128 characters (namely, byte values whose most significant bit is 0). Unicode was designed such that its first 128 code points correspond to their ASCII equivalents. Since the numeric value of the char structure in .NET corresponds to its Unicode code point (except for surrogates), we can define a utility method like so:
    // Requires: using System; using System.Linq; using System.Text;

    // The 128 ASCII byte values (0x00 to 0x7F) and the corresponding characters.
    private static readonly byte[] asciiValues =
        Enumerable.Range(0, 128).Select(b => (byte)b).ToArray();

    private static readonly string asciiChars =
        new string(asciiValues.Select(b => (char)b).ToArray());

    public static bool IsAsciiCompatible(Encoding encoding)
    {
        try
        {
            // Decoding the ASCII bytes must yield the ASCII characters, and vice versa.
            return encoding.GetString(asciiValues).Equals(asciiChars, StringComparison.Ordinal)
                && encoding.GetBytes(asciiChars).SequenceEqual(asciiValues);
        }
        catch (ArgumentException)
        {
            // Encoding.GetString may throw DecoderFallbackException if a fallback occurred
            // and DecoderFallback is set to DecoderExceptionFallback.
            // Encoding.GetBytes may throw EncoderFallbackException if a fallback occurred
            // and EncoderFallback is set to EncoderExceptionFallback.
            // Both of these derive from ArgumentException.
            return false;
        }
    }
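As a quick sanity check, the method can be tried on a few of the built-in encodings. This is only a sketch: the class name AsciiCompatibilityDemo is hypothetical, and the check is repeated inside it so that the snippet compiles on its own.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiCompatibilityDemo
{
    private static readonly byte[] asciiValues =
        Enumerable.Range(0, 128).Select(b => (byte)b).ToArray();

    private static readonly string asciiChars =
        new string(asciiValues.Select(b => (char)b).ToArray());

    // Same check as the utility method above, repeated so the sketch is self-contained.
    public static bool IsAsciiCompatible(Encoding encoding)
    {
        try
        {
            return encoding.GetString(asciiValues).Equals(asciiChars, StringComparison.Ordinal)
                && encoding.GetBytes(asciiChars).SequenceEqual(asciiValues);
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiCompatible(Encoding.ASCII));   // True
        Console.WriteLine(IsAsciiCompatible(Encoding.UTF8));    // True
        Console.WriteLine(IsAsciiCompatible(Encoding.Unicode)); // False: UTF-16 uses two bytes per character
        Console.WriteLine(IsAsciiCompatible(Encoding.UTF32));   // False: UTF-32 uses four bytes per character
    }
}
```

UTF-16 and UTF-32 fail because they encode each ASCII character as more than one byte, so the 128 input bytes do not decode to 128 characters.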
We could then enumerate all .NET encodings like so:
    var encodings = Encoding.GetEncodings().Select(e => e.GetEncoding()).ToList();
    var asciiCompatible = encodings.Where(e => IsAsciiCompatible(e)).ToList();
    var nonAsciiCompatible = encodings.Except(asciiCompatible).ToList();

    Console.WriteLine("ASCII compatible: ");
    foreach (var encodingName in asciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
        Console.WriteLine("* " + encodingName);

    Console.WriteLine();
    Console.WriteLine("Non-ASCII compatible: ");
    foreach (var encodingName in nonAsciiCompatible.Select(e => e.EncodingName).OrderBy(n => n))
        Console.WriteLine("* " + encodingName);
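Note that this method is not entirely safe. It only decodes the 128 byte values once, in ascending order, and encodes the 128 characters once, in the same order. If there exists a multi-byte encoding that does fancy mappings of consecutive bytes or characters (such as decoding 0x61 to 'a' and 0x62 to 'b', like in ASCII, but 0x6261 to "�"), then this test would give incorrect results, because that particular byte sequence never appears in the test input.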
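One way to narrow (though not close) this gap is to decode every ordered pair of ASCII bytes instead of a single ascending run. The sketch below does that; the names AsciiPairCheck and IsAsciiCompatiblePairwise are hypothetical and not part of the method above, and an encoding that gives special meaning to sequences of three or more ASCII bytes would still slip through.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiPairCheck
{
    // Hypothetical helper: decodes each of the 128 x 128 ordered pairs of ASCII bytes
    // and checks that the pair round-trips to exactly the two matching characters.
    public static bool IsAsciiCompatiblePairwise(Encoding encoding)
    {
        var pair = new byte[2];
        try
        {
            for (int first = 0; first < 128; first++)
            {
                for (int second = 0; second < 128; second++)
                {
                    pair[0] = (byte)first;
                    pair[1] = (byte)second;

                    string decoded = encoding.GetString(pair);
                    if (decoded.Length != 2
                        || decoded[0] != (char)first
                        || decoded[1] != (char)second)
                    {
                        return false;
                    }

                    // Encoding the two characters must give back the same two bytes.
                    if (!encoding.GetBytes(decoded).SequenceEqual(pair))
                    {
                        return false;
                    }
                }
            }
            return true;
        }
        catch (ArgumentException)
        {
            // Covers DecoderFallbackException and EncoderFallbackException.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiCompatiblePairwise(Encoding.UTF8));    // True
        Console.WriteLine(IsAsciiCompatiblePairwise(Encoding.Unicode)); // False
    }
}
```

This would catch the 0x6261 example, since the pair 0x62, 0x61 is decoded explicitly, at the cost of roughly 16,000 small decode and encode calls per encoding.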
Running this on .NET Fiddle (snippet) gives the following results:
ASCII compatible:
Non-ASCII compatible: