I have seen this question, and mine is very similar to it but different, so please do not mark it as a duplicate.
My question is: how do I get the empty fields from a string?
I have a string like `std::string s = "This.is..a.test";` and I want to get the fields `<This> <is> <> <a> <test>`.
I have also tried:

```cpp
typedef boost::char_separator<char> ChSep;
typedef boost::tokenizer<ChSep> TknChSep;
ChSep sep(".", ".", boost::keep_empty_tokens);
TknChSep tok(s, sep);
for (TknChSep::iterator beg = tok.begin(); beg != tok.end(); ++beg)
{
    std::cout << "<" << *beg << "> ";
}
```
but I get `<This> <.> <is> <.> <> <.> <a> <test>`.
The second argument to Boost.Tokenizer's `char_separator` is the `kept_delims` parameter. It specifies delimiters that will themselves show up as tokens. The original code passes `"."` there, so every `.` is emitted as a token of its own. To fix this, change:
```cpp
ChSep sep(".", ".", boost::keep_empty_tokens);
```

to:

```cpp
ChSep sep(".", "", boost::keep_empty_tokens);
//             ^-- no delimiters will show up as tokens.
```
Here is a complete example:
```cpp
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

int main()
{
    std::string str = "This.is..a.test";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    boost::char_separator<char> sep(
        ".",                        // dropped delimiters
        "",                         // kept delimiters
        boost::keep_empty_tokens);  // empty token policy

    BOOST_FOREACH(std::string token, tokenizer(str, sep))
    {
        std::cout << "<" << token << "> ";
    }
    std::cout << std::endl;
}
```
This produces the desired output:

```
<This> <is> <> <a> <test>
```
I think I'd skip Boost.Tokenizer and just use a standard (C++11) regex to do the splitting:
```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <iostream>

int main() {
    std::string s = "This.is..a.test";
    std::regex sep{ "\\." };
    // Submatch index -1 yields the text *between* matches of sep.
    std::copy(std::sregex_token_iterator(s.begin(), s.end(), sep, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}
```
Result (the blank line is the empty field between the two dots):

```
This
is

a
test
```