If I have a delimited text file with a basic delimiter (say | for instance), does it make a difference whether I use a String or a Regex split? Would I see any performance gains with one versus the other?
I am assuming you would want to use Regex.Split if you have escaped delimiters that you don't want to split on (\| for example). Are there any other reasons to use Regex.Split vs String.Split?
Which one is faster depends on the situation. A Regex can execute quickly, but it has more up-front cost: constructing (and optionally compiling) the instance takes time. If you create the Regex object once at the beginning and reuse it for every split, you pay that cost only once.
String.Split needs no setup time, but it is a purely sequential search, which can be slow for very large text.
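A minimal sketch of the "build the regex once, then reuse it" idea next to the plain String.Split call. The class and method names (PipeSplitter, SplitWithRegex, SplitWithString) are made up for illustration, and RegexOptions.Compiled is optional here; whether it helps is something to measure, not assume.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class PipeSplitter
{
    // Constructed once and reused, so the setup cost is paid a single time.
    // RegexOptions.Compiled is optional; it trades startup time for execution speed.
    private static readonly Regex PipeRegex = new Regex(@"\|", RegexOptions.Compiled);

    public static string[] SplitWithRegex(string line) => PipeRegex.Split(line);

    // No setup needed: String.Split takes the delimiter directly.
    public static string[] SplitWithString(string line) => line.Split('|');
}

public static class ReuseDemo
{
    public static void Main()
    {
        string line = "alpha|beta|gamma";
        Console.WriteLine(string.Join(",", PipeSplitter.SplitWithRegex(line)));  // alpha,beta,gamma
        Console.WriteLine(string.Join(",", PipeSplitter.SplitWithString(line))); // alpha,beta,gamma
    }
}
```

For a plain single-character delimiter both calls return the same array, so the choice comes down to measured cost and readability.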
By default I would reach for String.Split unless you have some complicated requirements that a regex would enable you to navigate around. Of course, as others have mentioned, profile it for your needs. Be sure to profile with and without RegexOptions.Compiled too, and understand how it works. Look at "To Compile or Not To Compile" and "How does RegexOptions.Compiled work?", and search for other articles on the topic.
One benefit of String.Split is its StringSplitOptions.RemoveEmptyEntries option, which removes empty results for cases where no data exists between delimiters. A regex pattern for the same split string or character would return those empty entries. It's minor and can be handled by a simple LINQ query that filters out String.Empty results.
That said, a regex makes it extremely easy to include the delimiter in the results if you need to. This is achieved by adding parentheses () around the pattern to make it a capturing group. For example, splitting a|b on the pattern (\|) returns a, |, and b.
You might find this question helpful as well: How do I split a string by strings and include the delimiters using .NET?
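The two points in this answer (empty entries and keeping the delimiter) can be sketched roughly as follows. The class and method names are hypothetical; only String.Split, Regex.Split, StringSplitOptions.RemoveEmptyEntries, and the LINQ Where/ToArray calls come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class SplitOptionsDemo
{
    // String.Split can drop empty entries by itself.
    public static string[] StringSplitNoEmpties(string s) =>
        s.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

    // Regex.Split keeps empty entries, so filter them out with LINQ.
    public static string[] RegexSplitNoEmpties(string s) =>
        Regex.Split(s, @"\|").Where(part => part != string.Empty).ToArray();

    // A capturing group makes Regex.Split include the delimiter in the result.
    public static string[] RegexSplitKeepDelimiters(string s) =>
        Regex.Split(s, @"(\|)");

    public static void Main()
    {
        Console.WriteLine(string.Join(",", StringSplitNoEmpties("a||b|")));   // a,b
        Console.WriteLine(string.Join(",", RegexSplitNoEmpties("a||b|")));    // a,b
        Console.WriteLine(string.Join(" ", RegexSplitKeepDelimiters("a|b"))); // a | b
    }
}
```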
The main reason for using Regex.Split is its flexibility. With String.Split you can only specify fixed delimiter characters or strings, whereas Regex.Split provides all the power of regular expressions for separating strings. In the simplest cases String.Split should be faster (because there is no overhead for building automata, etc.).
Regex.Split is more capable, but for an arrangement with basic delimiting (using a character that will not exist anywhere else in the string), the String.Split function is much easier to work with.
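As one illustration of that extra capability, here is a rough sketch of the escaped-delimiter case from the question: split on | but not on \|. It assumes a negative lookbehind is an acceptable way to express "not preceded by a backslash"; the class and method names are invented for the example. It does not handle an escaped backslash directly before a delimiter (\\|), which would need a more careful pattern or a small parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class EscapedDelimiterDemo
{
    // (?<!\\) is a negative lookbehind: match | only when the previous
    // character is not a backslash.
    private static readonly Regex UnescapedPipe = new Regex(@"(?<!\\)\|");

    public static string[] SplitUnescaped(string line) => UnescapedPipe.Split(line);

    public static void Main()
    {
        // The verbatim string below contains: a\|b|c
        foreach (string field in SplitUnescaped(@"a\|b|c"))
        {
            Console.WriteLine(field);
        }
        // Prints:
        // a\|b
        // c
    }
}
```

The escape character is left in the fields; removing it (for example with Replace(@"\|", "|")) would be a separate step.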
As far as performance goes, you would have to create a test and try it out. But don't optimize prematurely unless you know that this function will be the bottleneck for some essential process.
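A test of that kind could look something like the sketch below, which uses Stopwatch for a rough comparison. The input size, iteration count, and the Time helper are arbitrary assumptions for illustration; the numbers will vary by machine and runtime, and a dedicated benchmarking tool would give more trustworthy results.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical timing harness for illustration only.
public static class SplitTiming
{
    // Runs the action once as a warm-up, then times the given number of iterations.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up so JIT and first-use costs are not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Sample input: 1000 numeric fields separated by | (an arbitrary choice).
        string line = string.Join("|", Enumerable.Range(0, 1000));
        Regex interpreted = new Regex(@"\|");
        Regex compiled = new Regex(@"\|", RegexOptions.Compiled);
        const int iterations = 10000;

        Console.WriteLine("String.Split:           {0} ms",
            Time(() => line.Split('|'), iterations).TotalMilliseconds);
        Console.WriteLine("Regex.Split:            {0} ms",
            Time(() => interpreted.Split(line), iterations).TotalMilliseconds);
        Console.WriteLine("Regex.Split (Compiled): {0} ms",
            Time(() => compiled.Split(line), iterations).TotalMilliseconds);
    }
}
```

Running it against data shaped like your real files, rather than this synthetic line, is what makes the comparison meaningful.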