I've been researching a question that was presented to me: How to write a function that takes a string as input and returns a string with spaces between the characters. The function is to be written to optimize performance when it is called thousands of times per second.
I know that .net has a function called
String.Join
, to which I may pass in the space character as a separator along with the original string.Barring the use of
String.Join
, I can use theStringBuilder
class to append spaces after each character.Another way to accomplish this task is to declare a character array with 2*n-1 characters (You have to add n-1 characters for the spaces). The character array can be filled in a loop and then passed to the String
constructor
.
I've written some .net code that runs each of these algorithms one millions times each with the parameter "Hello, World"
and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).
I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .net function such as String.Join
should yield good performance. Why is using String.Join
so much slower than doing the work by hand?
public static class TestClass
{
// 491 milliseconds for 1 million iterations
public static string Space1(string s)
{
return string.Join(" ", s.AsEnumerable());
}
//190 milliseconds for 1 million iterations
public static string Space2(string s)
{
if (s.Length < 2)
return s;
StringBuilder sb = new StringBuilder();
sb.Append(s[0]);
for (int i = 1; i < s.Length; i++)
{
sb.Append(' ');
sb.Append(s[i]);
}
return sb.ToString();
}
// 50 milliseconds for 1 million iterations
public static string Space3(string s)
{
if (s.Length < 2)
return s;
char[] array = new char[s.Length * 2 - 1];
array[0] = s[0];
for (int i = 1; i < s.Length; i++)
{
array[2*i-1] = ' ';
array[2*i] = s[i];
}
return new string(array);
}
Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.
The reason
String.Join
is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of yourIEnumerable<T>
.String.Join<T>(string, IEnumerable<T>)
(the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading flexibility for pure performance and speed.Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.
In this case, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of
String.Join
. In this case, since you know, exactly, in advance what is required, you have the ability to avoid all of the overhead required to have a flexible design by pre-allocating an array of exactly the right size, and building the results manually.You'll find that, in general, it's often possible to write a method that will out perform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that you can't optimize for a specific input scenario.
The bad performance is not coming from
String.Join
, but from the way you handle each character. In this case, since characters have to be handled individually, your first method will create much more intermediate strings and the second method suffers from two.Append
method calls for each character. Your third method does not involve a lots of intermediate strings or methods calls and that's the reason why your third method is the fastest.Your
String.Join
example works on anIEnumerable<char>
. Enumerating anIEnumerable<T>
withforeach
is often slower than executing afor
loop (it depends on the the collection type and other circumstances, as Dave Black pointed out in a comment). Even ifJoin
uses aStringBuilder
, the internal buffer of theStringBuilder
will have to be increased several times, since the number of items to append is not known in advance.Since you aren't using the Release build (which should have optimizations checked by default) and/or you're debugging through visual studio then the JITer will be prevented from making a lot of it's optimizations. Because of this you're just not getting a good picture of how long each operation really takes. Once you add in the optimizations you can get the real picture of what's going on.
It's also important that you not be debugging in visual studio. Go to the bin/release folder and double click the executable entirely outside of visual studio.
In your first method, you are using the overload of
String.Join
that operates on an Enumerable, which requires that the method walk the characters of the string using an enumerator. Internally, this uses aStringBuilder
as the exact number of characters is unknown.Have you considered using the
String.Join
overload that takes a string (or string array) instead? That implementation allows a fixed length buffer to be used (similar to your third method) along with some internal unsafe string operations for speed. The call would change to -String.Join(" ", s);
Without actually doing the legwork to measure, I would expect this to be on par or faster than your third approach.When you have passed an
IEnumerable
toString.Join
, it has no idea on how much memory needs to be allocated. I allocates a chunk of memory, resizes it if it is insufficient and repeats the process until it gets enough memory to accommodate all the strings.The array version is faster because we know the amount of memory allocated well ahead.
Also please not that when you are running the 1st version, GC might have occurred.