Is there any way I can make this C# code faster?

Posted 2020-05-02 10:50

Question:

I am reading in a large X12 file and parsing the information within. I have two bottleneck functions that I can't seem to work around: read_line() and get_element(). Is there any way I could make these two functions faster? The main bottleneck in get_element() seems to be the Substring method.

    public String get_element(int element_number) {
        int count = 0;
        int start_index = 0;
        int end_index = 0;
        int current_index = 0;

        while (count < element_number && current_index != -1) {
            current_index = line_text.IndexOf(x12_reader.element_delimiter, start_index);
            start_index = current_index + 1;
            count++;
        }

        if (current_index != -1) {
            end_index = line_text.IndexOf(x12_reader.element_delimiter, start_index);
            if (end_index == -1) end_index = line_text.Length;
            return line_text.Substring(start_index, end_index - start_index);
        } else {
            return "";
        }
    }

    private String read_line() {
        string_builder.Clear();
        int n;
        while ((n = stream_reader.Read()) != -1) {
            if (n == line_terminator) return string_builder.ToString();
            string_builder.Append((char)n);
        }
        return string_builder.ToString();
    }

I am reading X12 data. Here is an example of what it looks like: http://examples.x12.org/005010X221/dollars-and-data-sent-together/

Answer 1:

Since your profiler tells you get_element is a bottleneck, and the method itself is coded very efficiently, you need to minimize the number of times this method is called.

Calling get_element repeatedly in a loop forces it to perform the same parsing job over and over, because each call rescans the line from the beginning:

for (int i = 0 ; i != n ; i++) {
    var element = get_element(i);
    ... // Do something with the element
}

You should be able to fix this problem by rewriting get_element as GetElements returning all elements as a collection, and then taking individual elements from the same collection in a loop:

var allElements = GetElements();
for (int i = 0 ; i != n ; i++) {
    var element = allElements[i];
    ... // Do something with the element
}
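
A minimal sketch of what such a GetElements method might look like. It assumes the line is a string and the delimiter is a single char, since the question shows neither declaration. The class name X12Line and the static signature are hypothetical and only there to keep the example self-contained. The line is scanned once with IndexOf, and each element is sliced out as it is found:

```csharp
using System;
using System.Collections.Generic;

public static class X12Line
{
    // Hypothetical helper: returns every element of one line in a single pass.
    // Assumes a single-character delimiter (not confirmed by the question).
    public static List<string> GetElements(string lineText, char delimiter)
    {
        var elements = new List<string>();
        int start = 0;
        while (true)
        {
            int end = lineText.IndexOf(delimiter, start);
            if (end == -1)
            {
                // No more delimiters: the rest of the line is the last element.
                elements.Add(lineText.Substring(start));
                return elements;
            }
            elements.Add(lineText.Substring(start, end - start));
            start = end + 1;
        }
    }
}
```

This still allocates one string per element, so it only pays off when most of the elements on a line are actually used.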

"In most cases I only need one or two elements."

In this case you could write a method that retrieves all the required indexes at once, for example by passing it a BitArray that marks the required indexes.
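
One way this could look, as a sketch only: the class name X12Selective, the Dictionary return type, and the single-char delimiter are all assumptions made for illustration, not part of the original code. The BitArray marks which zero-based element indexes to keep, and the scan stops as soon as it runs past the end of the BitArray or the end of the line:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class X12Selective
{
    // Hypothetical helper: one scan of the line, keeping only the elements
    // whose index is set in 'required'. Indexes past the end of the line
    // are simply absent from the result.
    public static Dictionary<int, string> GetElements(
        string lineText, char delimiter, BitArray required)
    {
        var result = new Dictionary<int, string>();
        int start = 0;
        for (int i = 0; i < required.Length; i++)
        {
            int end = lineText.IndexOf(delimiter, start);
            if (end == -1) end = lineText.Length; // last element on the line

            // Substring is only called for elements that were asked for.
            if (required[i])
                result[i] = lineText.Substring(start, end - start);

            if (end == lineText.Length) break;    // nothing left to scan
            start = end + 1;
        }
        return result;
    }
}
```

Sizing the BitArray to the highest required index plus one keeps the scan from going further than necessary; for example, `new BitArray(4)` with bits 1 and 3 set reads at most the first four elements.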



Answer 2:

Ok, second try. Discarding String.Split due to performance reasons, something like this should work much faster than your implementation:

//DISCLAIMER; typed in my cell phone, not tested. Sure it has bugs but you should get the idea.
public string get_element(int index)
{
    var buffer = new StringBuilder();
    var counter = 0; // zero-based index of the element currently being scanned

    using (var enumerator = line_text.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            if (enumerator.Current == x12_reader.element_delimiter)
            {
                counter++;
                // Past the requested element: nothing more to collect.
                if (counter > index)
                    break;
            }
            else if (counter == index)
            {
                buffer.Append(enumerator.Current);
            }
        }
    }

    return buffer.ToString();
}


Answer 3:

I'm not sure what you are doing exactly, but if I'm understanding your code correctly, wouldn't get_element be simpler as follows?

public string get_element(int index)
{
    var elements = line_text.Split(new[] { x12_reader.element_delimiter });

    // Valid indexes run from 0 to elements.Length - 1.
    if (index < 0 || index >= elements.Length)
        return "";

    return elements[index];
}