Java has a convenient split method:
String str = "The quick brown fox";
String[] results = str.split(" ");
Is there an easy way to do this in C++?
This is a simple STL-only solution (~5 lines!) using the `std::string` member functions `find` and `find_first_not_of`. It handles repeated delimiters (like spaces or periods, for instance), as well as leading and trailing delimiters.
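A minimal sketch of that approach, assuming a single `char` delimiter (the function name `split` is hypothetical). `find_first_not_of` skips a whole run of delimiters, which also takes care of leading and trailing ones, and `find` marks where the current token ends:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `str` on `delim`, skipping the empty tokens that
// repeated, leading, or trailing delimiters would otherwise produce.
std::vector<std::string> split(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = str.find_first_not_of(delim);
    while (start != std::string::npos) {
        // The token runs up to the next delimiter (or to the end of the string).
        const std::string::size_type end = str.find(delim, start);
        // substr clamps the count, so end == npos takes the rest of the string.
        tokens.push_back(str.substr(start, end - start));
        // Skip the run of delimiters that follows; npos in gives npos out.
        start = str.find_first_not_of(delim, end);
    }
    return tokens;
}
```

Because runs of delimiters are collapsed, `split("a..b", '.')` yields two tokens rather than three, so this is not the right tool if empty fields carry meaning.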
If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.
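A sketch of the `strtok` approach from C++, using `std::strtok` from `<cstring>` (the helper name `split_strtok` is hypothetical). Since `strtok` writes null characters into its input and keeps hidden static state between calls, the sketch takes the string by value so the caller's copy is untouched. It is still not safe to call concurrently from several threads:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split `str` on any of the characters in `delims`.
// `str` is taken by value because std::strtok overwrites delimiters with '\0'.
// std::strtok keeps static state between calls, so this is not thread-safe.
std::vector<std::string> split_strtok(std::string str, const char* delims) {
    std::vector<std::string> tokens;
    // &str[0] is a writable, null-terminated buffer (C++11 and later).
    for (char* tok = std::strtok(&str[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy the token out of the scratch buffer
    }
    return tokens;
}
```

Like the other C-style approach, runs of delimiters are treated as one, and `delims` may list several characters, e.g. `split_strtok("a,b;c", ",;")`.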
`boost::tokenizer` is your friend, but consider making your code portable with respect to internationalization (i18n) issues by using `wstring`/`wchar_t` instead of the legacy `string`/`char` types.

Boost has a strong split function: `boost::algorithm::split`.
I thought that was what the `>>` operator on string streams was for.

Adam Pierce's answer provides a hand-spun tokenizer taking in a `const char*`. It's a bit more problematic to do with iterators, because incrementing a `string`'s end iterator is undefined. That said, given `string str{ "The quick brown fox" }`, we can certainly accomplish this.
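A minimal sketch of such an iterator-based loop, assuming a single `char` delimiter (the name `split_iter` is hypothetical). The key point is testing for `cend()` before stepping past the delimiter, so the end iterator is never incremented. In this version, consecutive delimiters produce empty tokens:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: split on a single delimiter using iterators only.
// Consecutive delimiters yield empty tokens.
std::vector<std::string> split_iter(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    auto start = str.cbegin();
    while (true) {
        // Find where the current token ends (the next delimiter or cend()).
        const auto finish = std::find(start, str.cend(), delim);
        tokens.emplace_back(start, finish);
        // Check for the end before advancing: incrementing cend() is undefined.
        if (finish == str.cend()) {
            break;
        }
        start = std::next(finish);  // step over the delimiter
    }
    return tokens;
}
```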
If you're looking to abstract away complexity by using standard functionality, then, as On Freund suggests, `strtok` is a simple option. If you don't have access to C++17, you'll need to substitute for `data(str)`, as in this example: http://ideone.com/8kAGoa

Though not demonstrated in the example, `strtok` need not use the same delimiter for each token. Along with this advantage, though, there are several drawbacks:

- `strtok` cannot be used on multiple strings at the same time: either a `nullptr` must be passed to continue tokenizing the current string, or a new `char*` to tokenize must be passed (there are some non-standard implementations which do support this, however, such as `strtok_s`).
- `strtok` cannot be used on multiple threads simultaneously (this may however be implementation defined; for example, Visual Studio's implementation is thread safe).
- `strtok` modifies the string it is operating on, so it cannot be used on `const string`s, `const char*`s, or literal strings. To tokenize any of these with `strtok`, or to operate on a `string` whose contents need to be preserved, `str` would have to be copied, and then the copy could be operated on.

Neither of the previous methods can generate a tokenized `vector` in-place, meaning that without abstracting them into a helper function they cannot initialize `const vector<string> tokens`. That functionality, and the ability to accept any whitespace delimiter, can be harnessed using an `istream_iterator`. For example, given `const string str{ "The quick \tbrown \nfox" }`, the tokens can be read straight into the vector's constructor.
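A minimal sketch, wrapped in a function only so it can be called with different inputs (the name `tokenize_ws` is hypothetical). The `const` vector is initialized directly from a pair of `istream_iterator`s; each element is read with `operator>>`, which skips any whitespace, including tabs and newlines:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on any whitespace (spaces, tabs, newlines).
std::vector<std::string> tokenize_ws(const std::string& str) {
    std::istringstream iss(str);
    // Each istream_iterator increment performs `iss >> word`, which skips
    // leading whitespace; the range ends when extraction fails at end of input.
    const std::vector<std::string> tokens{std::istream_iterator<std::string>(iss),
                                          std::istream_iterator<std::string>()};
    return tokens;
}
```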
The required construction of an `istringstream` for this option has a far greater cost than the previous two options; however, this cost is typically hidden in the expense of `string` allocation.

If none of the above options is flexible enough for your tokenization needs, the most flexible option is using a `regex_token_iterator`. Of course, with this flexibility comes greater expense, but again this is likely hidden in the `string` allocation cost. Say, for example, we want to tokenize based on non-escaped commas, also eating whitespace, given the following input: `const string str{ "The ,qu\\,ick ,\tbrown, fox" }`.
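A sketch of one way to do this with `std::sregex_token_iterator` (the name `split_unescaped_commas` is hypothetical). It swaps in a slightly different technique than splitting on the commas themselves: the regular expression, which is an assumption of this sketch rather than a canonical pattern, matches each token directly. A token is a run of ordinary characters or backslash escapes that starts and ends with a non-whitespace character, so commas and the whitespace around them are never part of a match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on commas that are not escaped with a backslash,
// trimming surrounding whitespace from each token.
std::vector<std::string> split_unescaped_commas(const std::string& str) {
    // Token = (non-space, non-comma char | escape) optionally followed by
    // any mix of non-comma chars/escapes that ends on a non-space char/escape.
    static const std::regex token_re{
        R"((?:[^\\,\s]|\\.)(?:(?:[^\\,]|\\.)*(?:[^\\,\s]|\\.))?)"};
    // Submatch 0 (the default) yields each whole match as a token.
    std::sregex_token_iterator first(str.begin(), str.end(), token_re), last;
    return std::vector<std::string>(first, last);
}
```

Escape sequences are left in place (the second token of the sample input is `qu\,ick`), and empty fields such as the middle of `a,,b` are skipped rather than returned as empty strings.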