Fastest way to convert a possibly-null-terminated

2019-03-18 03:58发布

问题:

I need to convert a (possibly) null terminated array of ascii bytes to a string in C# and the fastest way I've found to do it is by using my UnsafeAsciiBytesToString method shown below. This method uses the String.String(sbyte*) constructor which contains a warning in it's remarks:

"The value parameter is assumed to point to an array representing a string encoded using the default ANSI code page (that is, the encoding method specified by Encoding.Default).

Note: * Because the default ANSI code page is system-dependent, the string created by this constructor from identical signed byte arrays may differ on different systems. * ...

* If the specified array is not null-terminated, the behavior of this constructor is system dependent. For example, such a situation might cause an access violation. * "

Now, I'm positive that the way the string is encoded will never change... but the default codepage on the system that my app is running on might change. So, is there any reason that I shouldn't run screaming from using String.String(sbyte*) for this purpose?

using System;
using System.Text;

namespace FastAsciiBytesToString
{
    static class StringEx
    {
        public static string AsciiBytesToString(this byte[] buffer, int offset, int maxLength)
        {
            int maxIndex = offset + maxLength;

            for( int i = offset; i < maxIndex; i++ )
            {
                /// Skip non-nulls.
                if( buffer[i] != 0 ) continue;
                /// First null we find, return the string.
                return Encoding.ASCII.GetString(buffer, offset, i - offset);
            }
            /// Terminating null not found. Convert the entire section from offset to maxLength.
            return Encoding.ASCII.GetString(buffer, offset, maxLength);
        }

        public static string UnsafeAsciiBytesToString(this byte[] buffer, int offset)
        {
            string result = null;

            unsafe
            {
                fixed( byte* pAscii = &buffer[offset] )
                { 
                    result = new String((sbyte*)pAscii);
                }
            }

            return result;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            byte[] asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c', 0, 0, 0 };

            string result = asciiBytes.AsciiBytesToString(3, 6);

            Console.WriteLine("AsciiBytesToString Result: \"{0}\"", result);

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            /// Non-null terminated test.
            asciiBytes = new byte[]{ 0, 0, 0, (byte)'a', (byte)'b', (byte)'c' };

            result = asciiBytes.UnsafeAsciiBytesToString(3);

            Console.WriteLine("UnsafeAsciiBytesToString Result: \"{0}\"", result);

            Console.ReadLine();
        }
    }
}

回答1:

Any reason not to use the String(sbyte*, int, int) constructor? If you've worked out which portion of the buffer you need, the rest should be simple:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset, int length)
{
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, length);
       }
    }
}

If you need to look first:

public static string UnsafeAsciiBytesToString(byte[] buffer, int offset)
{
    int end = offset;
    while (end < buffer.Length && buffer[end] != 0)
    {
        end++;
    }
    unsafe
    {
       fixed (byte* pAscii = buffer)
       { 
           return new String((sbyte*)pAscii, offset, end - offset);
       }
    }
}

If this truly is an ASCII string (i.e. all bytes are less than 128) then the codepage problem shouldn't be an issue unless you've got a particularly strange default codepage which isn't based on ASCII.

Out of interest, have you actually profiled your application to make sure that this is really the bottleneck? Do you definitely need the absolute fastest conversion, instead of one which is more readable (e.g. using Encoding.GetString for the appropriate encoding)?



回答2:

Oneliner (assuming the buffer actually contains ONE well formatted null terminated string):

String MyString = Encoding.ASCII.GetString(MyByteBuffer).TrimEnd((Char)0);


回答3:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestProject1
{
    class Class1
    {
    static public string cstr_to_string( byte[] data, int code_page)
    {
        Encoding Enc = Encoding.GetEncoding(code_page);  
        int inx = Array.FindIndex(data, 0, (x) => x == 0);//search for 0
        if (inx >= 0)
          return (Enc.GetString(data, 0, inx));
        else 
          return (Enc.GetString(data)); 
    }

    }
}


回答4:

s = s.Substring(0, s.IndexOf((char) 0));


回答5:

One possibility to consider: check that the default code-page is acceptable and use that information to select the conversion mechanism at run-time.

This could also take into account whether the string is in fact null-terminated, but once you've done that, of course, the speed gains my vanish.



回答6:

I'm not sure of the speed, but I found it easiest to use LINQ to remove the nulls before encoding:

string s = myEncoding.GetString(bytes.TakeWhile(b => !b.Equals(0)).ToArray());


回答7:

An easy / safe / fast way to convert byte[] objects to strings containing their ASCII equivalent and vice versa using the .NET class System.Text.Encoding. The class has a static function that returns an ASCII encoder:

From String to byte[]:

string s = "Hello World!"
byte[] b = System.Text.Encoding.ASCII.GetBytes(s);

From byte[] to string:

byte[] byteArray = new byte[] {0x41, 0x42, 0x09, 0x00, 0x255};
string s = System.Text.Encoding.ASCII.GetString(byteArray);


回答8:

This is a bit ugly but you don't have to use unsafe code:

string result = "";
for (int i = 0; i < data.Length && data[i] != 0; i++)
   result += (char)data[i];