Splitting string using boost spirit

Is it a good idea? For a reason I thought it should be faster than boost's tokenizer or split. however most of the time I'm stuck in the boost::spirit::compile

template <typename Iterator>
struct ValueList : bsq::grammar<Iterator, std::vector<std::string>()>
{
    ValueList(const std::string& sep, bool isCaseSensitive) : ValueList::base_type(query)
    {
        if(isCaseSensitive)
        {
            query = value >> *(sep >> value);
            value = *(bsq::char_ - sep);
        }
        else
        {
            auto separator = bsq::no_case[sep];
            query = value >> *(separator >> value);
            value = *(bsq::char_ - separator);
        }
    }
    bsq::rule<Iterator, std::vector<std::string>()> query;
    bsq::rule<Iterator, std::string()> value;
};

inline bool Split(std::vector<std::string>& result, const std::string& buffer, const std::string& separator,
                  bool isCaseSensitive)
{
    result.clear();
    ValueList<std::string::const_iterator> parser(separator, isCaseSensitive);
    auto itBeg = buffer.begin();
    auto itEnd = buffer.end();
    if(!(bsq::parse(itBeg, itEnd, parser, result) && (itBeg == itEnd)))
        result.push_back(buffer);
    return true;
}

I've implemented it as shown above. What is wrong with my code? or just because the separator is defined in runtime the recompilation is inevitable?

8000000 splits in 1081ms.
8000000 splits in 1169ms.
8000000 splits in 2663ms.

first is for tokenizer, second is for boost::split and the third is for boost::spirit

First off, the different versions do not do the same thing:

in particular token compression behaves differently (I fixed it for boost::split but it doesn't appear to be a feature for boost::tokenizer)
separators were treated as string literals rather than charsets in the Spirit version (fixed that)

Yes, recompiles are inevitable with dynamic separators. But no, this is not the bottleneck (the other approaches have dynamic separators too):

I've done some optimizations. The timings:

Coliru clang++:

8000000 original (boost::tokenizer) rate: 2.84257μs
10000000 possible (boost::split) rate: 3.09941μs
10000000 spirit (dynamic version) rate: 1.45456μs
10000000 spirit (direct version, avoid type erasure) rate: 1.25588μs

next step:
10000000 spirit (precompiled sep) rate: 1.18059μs

Coliru g++

8000000 original (boost::tokenizer) rate: 2.92805μs
10000000 possible (boost::split) rate: 2.75442μs
10000000 spirit (dynamic version) rate: 1.32821μs
10000000 spirit (direct version, avoid type erasure) rate: 1.10712μs

next step:
10000000 spirit (precompiled sep) rate: 1.0791μs

Local system g++:

sehe@desktop:/tmp$ time ./test
8000000 original (boost::tokenizer) rate: 1.80061μs
10000000 possible (boost::split) rate: 1.29754μs
10000000 spirit (dynamic version) rate: 0.607789μs
10000000 spirit (direct version, avoid type erasure) rate: 0.488087μs

next step:
10000000 spirit (precompiled sep) rate: 0.498769μs

As you can see the Spirit approach doesn't need to be slower. What steps did I take? http://paste.ubuntu.com/11001344/

refactor benchmark to display rate (no change) 3.523μs
made case_insensitivity choice of caller (just use no_case[char_(delimiter)] if required) 2.742μs.
eliminate value subrule (reduced copying and dynamic dispatch because of type-erased non-terminal rule) 2.579μs.
Made delimiter charset instead of string literal: 2.693μs.

see intermediate version on coliru. (I recommend the code below, it's much more cleaned up)
Using qi::raw[] instead of std::string synthesized attributes (avoid copying!) 0.624072μs
Eliminating all non-terminals (i.e. type-erasure; see spirit_direct implementation) rate: 0.491011μs

Now it seems fairly obvious that all the implementations would benefit from not "compiling" the separator each time. I didn't do it for all the approaches, but for fun let's do it for the Spirit version:

Using hardcoded '|' delimiter 0.455269μs // fixed delimiter

Full listing:

Live On Coliru

#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
#include <boost/spirit/include/qi.hpp>

#include <vector>
#include <string>
#include <chrono>
#include <iostream>

void original(std::vector<std::string>& result, const std::string& input, const std::string& delimiter)
{
    result.clear();
    boost::char_separator<char> sep(delimiter.c_str());
    boost::tokenizer<boost::char_separator<char>, std::string::const_iterator, std::string> tok(input, sep);

    for (const auto& token : tok)
    {
        result.push_back(token);
    }
}

void possible(std::vector<std::string>& result, const std::string& input, const std::string& delimiter)
{
    result.clear();
    boost::split(result, input, boost::is_any_of(delimiter), boost::algorithm::token_compress_off);
}

namespace bsq = boost::spirit::qi;

void spirit_direct(std::vector<std::string>& result, const std::string& input, char const* delimiter)
{
    result.clear();
    using namespace bsq;
    if (!parse(input.begin(), input.end(), raw[*(char_ - char_(delimiter))] % char_(delimiter), result))
        result.push_back(input);
}

namespace detail {
    template <typename Sep> bsq::rule<std::string::const_iterator, std::vector<std::string>()>
        make_spirit_parser(Sep const& sep) 
        {
            using namespace bsq;
            return raw[*(char_ - sep)] % sep;
        }

    static const auto precompiled_pipes = make_spirit_parser('|');
}

void spirit(std::vector<std::string>& result, const std::string& input, char const* delimiter)
{
    result.clear();
    if (!bsq::parse(input.begin(), input.end(), detail::make_spirit_parser(bsq::char_(delimiter)), result))
        result.push_back(input);
}

void spirit_pipes(std::vector<std::string>& result, const std::string& input)
{
    result.clear();
    if (!bsq::parse(input.begin(), input.end(), detail::precompiled_pipes, result))
        result.push_back(input);
}

template <typename F> void bench(std::string const& caption, F approach) {
    size_t const iterations = 1000000;
    using namespace std::chrono;
    using C = high_resolution_clock;

    auto start = C::now();

    size_t count = 0;
    for (auto i = 0U; i < iterations; ++i) {
        count += approach();
    }

    auto us = duration_cast<std::chrono::microseconds>(C::now() - start).count();
    std::cout << count << " " << caption << " rate: " << (1.*us/iterations) << "μs\n";
}

int main()
{
    std::string const input = "2309|ioperwkl|2039i|rjjdsf|klsdjf230o|kx23904iep2|xp,39,4p2|xlmq2i3219||";
    auto separator = "|";

    std::vector<std::string> result;

    bench("original (boost::tokenizer)", [&] {
            original(result, input, separator);
            return result.size();
        });
    bench("possible (boost::split)", [&] {
            possible(result, input, separator);
            return result.size();
        });
    bench("spirit (dynamic version)", [&] {
            spirit(result, input, separator);
            return result.size();
        });
    bench("spirit (direct version, avoid type erasure)", [&] {
            spirit_direct(result, input, separator);
            return result.size();
        });

    std::cout << "\nnext step:\n";
    bench("spirit (precompiled sep)", [&] {
            spirit_pipes(result, input);
            return result.size();
        });
}