How to parse C# generic type names?

Posted 2020-02-14 02:21

Question:

How can I parse C#-style generic type names such as List<int>, Dictionary<string,int>, or more complex ones like Dictionary<string,Dictionary<System.String,int[]>>? Assume that these names are strings and may not represent existing types; the parser should handle BogusClass<A,B,Vector<C>> just as easily. To be clear, I am NOT interested in parsing .NET internal type names of the form List`1[[System.Int32]], but actual C# type names as they would appear in source code, with or without dot-notation namespace qualifiers.

Regular expressions are out because these are nested structures. I thought perhaps the System.CodeDom.CodeTypeReference constructor would parse it for me since it has string BaseType and CodeTypeReferenceCollection TypeArguments members, but those apparently need to be set manually.

CodeTypeReference is the kind of structure I need:

class TypeNameStructure
{
    public string Name;
    public TypeNameStructure[] GenericTypeArguments;
    public bool IsGenericType{get;}
    public bool IsArray{get;} //would be nice to detect this as well

    public TypeNameStructure( string friendlyCSharpName )
    {
       //Parse friendlyCSharpName into name and generic type arguments recursively
    }
}

Are there any existing classes in the framework to achieve this kind of type name parsing? If not, how would I go about parsing this?
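Neither answer below tokenizes first, but a common way to start a hand-written parser for this kind of nested syntax is to split the string into tokens: dotted identifiers and the punctuators < > , [ ]. The following is a minimal sketch, assuming identifiers consist only of letters, digits, '_' and '.'; the class and method names (TypeNameLexer.Tokenize) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TypeNameLexer
{
    // Splits a friendly C# type name into tokens: dotted identifiers such as
    // "System.String", and the single-character punctuators < > , [ ].
    // Whitespace between tokens is skipped; any other character is rejected.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var ident = new StringBuilder();
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c) || c == '_' || c == '.')
            {
                ident.Append(c);
                continue;
            }

            // Any non-identifier character ends the identifier being built.
            if (ident.Length > 0)
            {
                tokens.Add(ident.ToString());
                ident.Clear();
            }

            if (char.IsWhiteSpace(c))
                continue;
            if ("<>,[]".IndexOf(c) >= 0)
                tokens.Add(c.ToString());
            else
                throw new FormatException("Unexpected character '" + c + "'.");
        }

        if (ident.Length > 0)
            tokens.Add(ident.ToString());
        return tokens;
    }
}
```

For example, "Dictionary<string, System.String[]>" becomes the tokens Dictionary, <, string, ",", System.String, [, ], >. A recursive-descent parser can then read an identifier and, on '<', recurse once per comma-separated argument until it reaches the matching '>'.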

Answer 1:

Answering my own question. I wrote the following class to achieve the results I need; give it a spin.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public class TypeName
{
    public string Name;
    public bool IsGeneric;
    public List<ArrayDimension> ArrayDimensions;
    public List<TypeName> TypeArguments;

    public class ArrayDimension
    {
        public int Dimensions;

        public ArrayDimension()
        {
            Dimensions = 1;
        }

        public override string ToString()
        {
            return "[" + new String(',', Dimensions - 1) + "]";
        }
    }

    public TypeName()
    {
        Name = null;
        IsGeneric = false;
        ArrayDimensions = new List<ArrayDimension>();
        TypeArguments = new List<TypeName>();
    }

    public static string MatchStructure( TypeName toMatch, TypeName toType )
    {
        return null;
    }

    public override string ToString()
    {
        string str = Name;
        if (IsGeneric)
            str += "<" + string.Join( ",", TypeArguments.Select<TypeName,string>( tn => tn.ToString() ) ) + ">";
        foreach (ArrayDimension d in ArrayDimensions)
            str += d.ToString();
        return str;
    }

    public string FormatForDisplay( int indent = 0 )
    {
        var spacing = new string(' ', indent );
        string str = spacing + "Name: " + Name + "\r\n" +
        spacing + "IsGeneric: " + IsGeneric + "\r\n" +
        spacing + "ArraySpec: " + string.Join( "", ArrayDimensions.Select<ArrayDimension,string>( d => d.ToString() ) ) + "\r\n";
        if (IsGeneric)
        {
            str += spacing + "GenericParameters: {\r\n" + string.Join( spacing + "},{\r\n", TypeArguments.Select<TypeName,string>( t => t.FormatForDisplay( indent + 4 ) ) ) + spacing + "}\r\n";
        }
        return str;
    }

    public static TypeName Parse( string name )
    {
        int pos = 0;
        char terminator;
        TypeName result = ParseInternal( name, ref pos, out terminator );
        // Anything left over (e.g. a stray ',' or '>') means the input was malformed.
        if (terminator != '\0')
            throw new FormatException( "Unexpected '" + terminator + "' at position " + (pos - 1) + "." );
        return result;
    }

    // Parses one type name starting at pos. On return, terminator holds the character
    // that ended it: ',' (another type argument follows), '>' (the argument list is closed)
    // or '\0' (the end of the input string was reached).
    private static TypeName ParseInternal( string name, ref int pos, out char terminator )
    {
        StringBuilder sb = new StringBuilder();
        TypeName tn = new TypeName();
        while (pos < name.Length)
        {
            char c = name[pos++];
            switch (c)
            {
                case ',':
                case '>':
                    if (tn.Name == null)
                        tn.Name = sb.ToString().Trim(); // Trim so "Dictionary<string, int>" also works
                    terminator = c;
                    return tn;
                case '<':
                {
                    tn.Name = sb.ToString().Trim();
                    tn.IsGeneric = true;
                    sb.Length = 0;
                    char argTerminator;
                    // Keep parsing type arguments while they are separated by ','.
                    do
                        tn.TypeArguments.Add( ParseInternal( name, ref pos, out argTerminator ) );
                    while (argTerminator == ',');
                    // The list must be closed by its own '>', not by the end of the string.
                    if (argTerminator != '>')
                        throw new FormatException( "Missing closing > of generic type list." );
                    continue;
                }
                case '[':
                    ArrayDimension d = new ArrayDimension();
                    tn.ArrayDimensions.Add( d );
                analyzeArrayDimension: //label for looping over multidimensional arrays
                    if (pos < name.Length)
                    {
                        char nextChar = name[pos++];
                        switch (nextChar)
                        {
                            case ']':
                                continue; //array specifier terminated
                            case ',': //multidimensional array
                                d.Dimensions++;
                                goto analyzeArrayDimension;
                            default:
                                throw new FormatException( @"Expecting ""]"" or "","" after ""["" for array specifier but encountered """ + nextChar + @"""." );
                        }
                    }
                    throw new FormatException( "Expecting ] or , after [ for array type, but reached end of string." );
                default:
                    sb.Append(c);
                    continue;
            }
        }
        if (tn.Name == null)
            tn.Name = sb.ToString().Trim();
        terminator = '\0';
        return tn;
    }
}

If I run the following:

 Console.WriteLine( TypeName.Parse( "System.Collections.Generic.Dictionary<Vector<T>,int<long[]>[],bool>" ).FormatForDisplay() );

it produces the following output, showing the parsed structure of the TypeName:

Name: System.Collections.Generic.Dictionary
IsGeneric: True
ArraySpec:
GenericParameters: {
    Name: Vector
    IsGeneric: True
    ArraySpec:
    GenericParameters: {
        Name: T
        IsGeneric: False
        ArraySpec:
    }
},{
    Name: int
    IsGeneric: True
    ArraySpec: []
    GenericParameters: {
        Name: long
        IsGeneric: False
        ArraySpec: []
    }
},{
    Name: bool
    IsGeneric: False
    ArraySpec:
}


Answer 2:

Well, I had a lot of fun writing this little parsing class using Regex and named capture groups (?<Name>group).

My approach was that each 'type definition' string can be broken into three parts: a type name, an optional generic argument list, and an optional array marker '[]'.

So, given the classic Dictionary<string, byte[]>, you would have Dictionary as the type name and "string, byte[]" as the inner generic type string.
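To see what the three parts look like, here is a minimal sketch using a stripped-down pattern in the spirit of the one in the class below. The Decompose helper and the simplified ".+" inner group are assumptions for illustration; the inner string is only validated later, when each argument is parsed recursively.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeNameRegexDemo
{
    // Stripped-down pattern: a (possibly dotted) name, an optional <...> argument
    // list captured as one string, and an optional trailing "[]".
    static readonly Regex Pattern = new Regex(
        @"^(?<TypeName>[\w.]+)(<(?<InnerTypeName>.+)>)?(?<Array>\[\])?$");

    // Returns { type name, inner generic string, array marker }, or null if no match.
    // Groups that did not participate in the match yield an empty string.
    public static string[] Decompose(string friendlyName)
    {
        Match m = Pattern.Match(friendlyName);
        if (!m.Success)
            return null;
        return new[]
        {
            m.Groups["TypeName"].Value,
            m.Groups["InnerTypeName"].Value,
            m.Groups["Array"].Value
        };
    }
}
```

For "Dictionary<string, byte[]>" this yields "Dictionary", "string, byte[]" and an empty array marker; for "int[]" it yields "int", an empty inner string and "[]". The greedy inner group backtracks until the last '>' before an optional "[]", which is why nested arguments stay together in one capture.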

We can split the inner generic type string on the commas that are not nested inside angle brackets, and recursively parse each piece using the same regex. Each successful parse is added to the parent's type information, which builds up a tree hierarchy.

With the previous example, we would end up with {string, byte[]} to parse. Both are easily parsed and added to Dictionary's inner types.
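The depth-tracking split can be sketched on its own. This hypothetical SplitTopLevel helper is a variant of SplitByComma below that also counts square brackets, so the comma inside a rank specifier such as [,] is not treated as an argument separator, and it trims each piece.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TypeNameSplitter
{
    // Splits a generic argument list such as "string, Dictionary<int, long>, int[,]"
    // on the commas that sit at nesting depth 0. Both <...> and [...] raise the depth.
    public static List<string> SplitTopLevel(string value)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in value)
        {
            if (c == '<' || c == '[')
                depth++;
            else if (c == '>' || c == ']')
                depth--;

            if (c == ',' && depth == 0)
            {
                // Top-level separator: finish the current argument.
                parts.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        parts.Add(current.ToString().Trim());
        return parts;
    }
}
```

With this, "string, byte[]" splits into "string" and "byte[]", and "string, Dictionary<int, long>, int[,]" splits into exactly three arguments.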

ToString() is then simply a matter of recursively outputting each type's friendly name, including its inner types: Dictionary outputs its own type name, then iterates through its inner types, outputting their type names, and so forth.

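using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class TypeInformation
{
    // '.' is allowed in names so namespace-qualified types such as System.String also match.
    static readonly Regex TypeNameRegex = new Regex(@"^(?<TypeName>[a-zA-Z0-9_.]+)(<(?<InnerTypeName>[a-zA-Z0-9_.,\<\>\s\[\]]+)>)?(?<Array>(\[\]))?$", RegexOptions.Compiled);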

    readonly List<TypeInformation> innerTypes = new List<TypeInformation>();

    public string TypeName
    {
        get;
        private set;
    }

    public bool IsArray
    {
        get;
        private set;
    }

    public bool IsGeneric
    {
        get { return innerTypes.Count > 0; }
    }

    public IEnumerable<TypeInformation> InnerTypes
    {
        get { return innerTypes; }
    }

    private void AddInnerType(TypeInformation type)
    {
        innerTypes.Add(type);
    }

    private static IEnumerable<string> SplitByComma(string value)
    {
        var strings = new List<string>();
        var sb = new StringBuilder();
        var level = 0;

        foreach (var c in value)
        {
            if (c == ',' && level == 0)
            {
                strings.Add(sb.ToString());
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }

            if (c == '<')
                level++;

            if(c == '>')
                level--;
        }

        strings.Add(sb.ToString());

        return strings;
    }

    public static bool TryParse(string friendlyTypeName, out TypeInformation typeInformation)
    {
        typeInformation = null;

        // Try to match the type to our regular expression.
        var match = TypeNameRegex.Match(friendlyTypeName);

        // If that fails, the format is incorrect.
        if (!match.Success)
            return false;

        // Scrub the type name, inner type name, and array '[]' marker (if present).
        var typeName = match.Groups["TypeName"].Value;
        var innerTypeFriendlyName = match.Groups["InnerTypeName"].Value;
        var isArray = !string.IsNullOrWhiteSpace(match.Groups["Array"].Value);

        // Create the root type information.
        TypeInformation type = new TypeInformation
        {
            TypeName = typeName,
            IsArray = isArray
        };

        // Check if we have an inner type name (in the case of generics).
        if (!string.IsNullOrWhiteSpace(innerTypeFriendlyName))
        {
            // Split each type by the comma character.
            var innerTypeNames = SplitByComma(innerTypeFriendlyName);

            // Iterate through all inner type names and attempt to parse them recursively.
            foreach (string innerTypeName in innerTypeNames)
            {
                TypeInformation innerType = null;
                var trimmedInnerTypeName = innerTypeName.Trim();
                var success = TypeInformation.TryParse(trimmedInnerTypeName, out innerType);

                // If the inner type fails, so does the parent.
                if (!success)
                    return false;

                // Success! Add the inner type to the parent.
                type.AddInnerType(innerType);
            }
        }

        // Return the parsed type information.
        typeInformation = type;
        return true;
    }

    public override string ToString()
    {
        // Create a string builder with the type name prefilled.
        var sb = new StringBuilder(this.TypeName);

        // If this type is generic (has inner types), append each recursively.
        if (this.IsGeneric)
        {
            sb.Append("<");

            // Get the number of inner types.
            int innerTypeCount = this.InnerTypes.Count();

            // Append each inner type's friendly string recursively.
            for (int i = 0; i < innerTypeCount; i++)
            {
                sb.Append(innerTypes[i].ToString());

                // Check if we need to add a comma to separate from the next inner type name.
                if (i + 1 < innerTypeCount)
                    sb.Append(", ");
            }

            sb.Append(">");
        }

        // If this type is an array, we append the array '[]' marker.
        if (this.IsArray)
            sb.Append("[]");

        return sb.ToString();
    }
}

I made a console app to test it; it seems to work with most cases I threw at it.

Here's the code:

class MainClass
{
    static readonly int RootIndentLevel = 2;
    static readonly string InputString = @"BogusClass<A,B,Vector<C>>";

    public static void Main(string[] args)
    {
        TypeInformation type = null;

        Console.WriteLine("Input  = {0}", InputString);

        var success = TypeInformation.TryParse(InputString, out type);

        if (success)
        {
            Console.WriteLine("Output = {0}", type.ToString());

            Console.WriteLine("Graph:");
            OutputGraph(type, RootIndentLevel);
        }
        else
            Console.WriteLine("Parsing error!");
    }

    static void OutputGraph(TypeInformation type, int indentLevel = 0)
    {
        Console.WriteLine("{0}{1}{2}", new string(' ', indentLevel), type.TypeName, type.IsArray ? "[]" : string.Empty);

        foreach (var innerType in type.InnerTypes)
            OutputGraph(innerType, indentLevel + 2);
    }
}

And here's the output:

Input  = BogusClass<A,B,Vector<C>>
Output = BogusClass<A, B, Vector<C>>
Graph:
  BogusClass
    A
    B
    Vector
      C

There are some lingering issues, such as multidimensional and jagged arrays: the Array group only matches a single trailing [], so names like int[,] or string[][] fail to parse.
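One way to handle these cases, sketched here as an assumption rather than as part of the answer above, is to peel the trailing rank specifiers off the name before matching the regex and to record one rank per specifier instead of the single IsArray flag. ArraySpecifiers.Strip is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class ArraySpecifiers
{
    // Hypothetical helper: peels trailing rank specifiers such as "[]", "[,]" or "[][]"
    // off a friendly type name. Returns the remaining element type name and fills
    // `ranks` with one entry per specifier, left to right (1 for "[]", 2 for "[,]", ...).
    public static string Strip(string typeName, out List<int> ranks)
    {
        ranks = new List<int>();
        string s = typeName.TrimEnd();
        while (s.Length > 0 && s[s.Length - 1] == ']')
        {
            int open = s.LastIndexOf('[');
            if (open < 0)
                throw new FormatException("Unbalanced ']' in \"" + typeName + "\".");

            // A rank specifier may contain only commas and spaces.
            string inside = s.Substring(open + 1, s.Length - open - 2);
            if (inside.Trim(' ', ',').Length != 0)
                throw new FormatException("Unexpected \"" + inside + "\" inside an array specifier.");

            int rank = 1;
            foreach (char c in inside)
                if (c == ',')
                    rank++;

            // We walk right to left, so insert at the front to keep source order.
            ranks.Insert(0, rank);
            s = s.Substring(0, open).TrimEnd();
        }
        return s;
    }
}
```

For example, "string[][]" yields "string" with ranks {1, 1}, "int[,]" yields "int" with {2}, and "List<int[]>" is returned unchanged with no ranks because its last character is '>'. SplitByComma would also need to skip commas inside square brackets for arguments like int[,] to reach this step intact.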