Searching for Holy Grail of search and replace in

2020-07-22 17:22发布

问题:

Recently I was looking for a way to replace tokens in string which is essentially find and replace (but there is at least one additional approach to the problem) and looks like quite banal task. I've came with couple of possible implementations but none of them was satisfying from performance point of view. Best achievement was ~50us per iteration. The case was ideal, the string was never growing in size and initially I've omitted the requirement to be case insensitive
Here is the code at Coliru

Results from my machine:
Boost.Spirit symbols result: 3421?=3421
100000 cycles took 6060ms.
Boyer-Moore results:3421?=3421
100000 cycles took 5959ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5008ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 12451ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 5532ms.
Boost replace_all result:3421?=3421
100000 cycles took 4860ms.

So the question, what takes so long in such a simple task? One can say, ok, simple task, go ahead and implement it better. But the reality is that 15 years old MFC naive implementation does the task order of magnitude faster:

CString FillTokenParams(const CString& input, const std::unordered_map<std::string, std::string>& tokens)
{
    CString tmpInput = input;
    for(const auto& token : tokens)
    {
        int pos = 0;
        while(pos != -1)
        {
            pos = tmpInput.Find(token.first.c_str(), pos);
            if(pos != -1)
            {
                int tokenLength = token.first.size();
                tmpInput.Delete(pos, tokenLength);
                tmpInput.Insert(pos, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

result:
MFC naive search and replace result:3421?=3421
100000 cycles took 516ms.
How come this clumsy code outperforms modern C++??? why other implementations were so slow? Am I missing something fundamental?

EDIT001: I've invested in this question, the code been profiled and triple checked. You can be unsatisfied with this and that, but std::string::replace is not what taking time. In any STL implementation search is what taking most of the time, boost spirit wastes time on allocation of tst (test node in evaluation tree I guess). I dont expect anyone pointing on a line in a function with "this is your problem" and poof, the problem is gone. The question is how did MFC manages to do the same work 10 times faster.

EDIT002: Just drilled down into MFC implementation of Find and wrote a function which mimics the MFC implementation

namespace mfc
{
std::string::size_type Find(const std::string& input, const std::string& subString, std::string::size_type start)
{
    if(subString.empty())
    {
        return std::string::npos;
    }

    if(start < 0 || start > input.size())
    {
        return std::string::npos;
    }

    auto found = strstr(input.c_str() + start, subString.c_str());
    return ((found == nullptr) ? std::string::npos : std::string::size_type(found - input.c_str()));
}
}

std::string MFCMimicking(const std::string& input, const std::unordered_map<std::string, std::string>& tokens)
{
    auto tmpInput = input;
    for(const auto& token : tokens)
    {
        auto pos = 0;
        while(pos != std::string::npos)
        {
            pos = mfc::Find(tmpInput, token.first, pos);
            if(pos != std::string::npos)
            {
                auto tokenLength = token.first.size();
                tmpInput.replace(pos, tokenLength, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

Results:
MFC mimicking expand result:3421?=3421
100000 cycles took 411ms.
Meaning 4us. per call, go beat that C strstr

EDIT003: Compiling and running with -Ox


MFC mimicking expand result:3421?=3421
100000 cycles took 660ms.
MFC naive search and replace result:3421?=3421
100000 cycles took 856ms.
Manual expand result:3421?=3421
100000 cycles took 1995ms.
Boyer-Moore results:3421?=3421
100000 cycles took 6911ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5670ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 13825ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 9531ms.
Boost replace_all result:3421?=3421
100000 cycles took 8996ms.


running with -O2 (as in original measurements) but 10k cycles


MFC mimicking expand result:3421?=3421
10000 cycles took 104ms.
MFC naive search and replace result:3421?=3421
10000 cycles took 105ms.
Manual expand result:3421?=3421
10000 cycles took 356ms.
Boyer-Moore results:3421?=3421
10000 cycles took 1355ms.
Boyer Moore Hospool result:3421?=3421
10000 cycles took 1101ms.
Knuth Morris Pratt result: 3421?=3421
10000 cycles took 1973ms.
Naive STL search and replace result: 3421?=3421
10000 cycles took 923ms.
Boost replace_all result:3421?=3421
10000 cycles took 880ms.

回答1:

So, I have some observations about the Qi version.

Also created an X3 version.

Finally, wrote a manual expansion function that beats all the other candidates (I expect it to be faster than MFC, because it doesn't bother with repeated deletes/inserts).

Skip to the benchmark charts if you want.

About the Qi version

  1. yes, symbol tables suffer from locality issues of node-based containers. They might not be the best match you can use here.
  2. There's no need to rebuild the symbols each loop:
  3. Instead of character-wise skipping of non-symbols, scan until the next:

    +(bsq::char_ - symbols)
    
inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
{
    std::string retVal;
    retVal.reserve(input.size() * 2);

    auto beg = input.cbegin();
    auto end = input.cend();

    if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
        retVal = input;

    return retVal;
}

This is considerably faster already. But:

Manual Loops

In this trivial example, why don't you just do the parsing manually?

inline std::string manual_expand(const std::string& input, TokenMap const& tokens)
{
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);
        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);

    builder.str("");
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f] now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip '}'
                    }
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

This is vastly superior in performance on my machine.

Other ideas

You could look at Boost Spirit Lex, potentially with statically generated token tables: http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_static_model.html. I'm not particularly fond of Lex though.

COMPARISONS:

See the Interactive Chart

That's using Nonius for the benchmarking statistics.

Full Benchmark code: http://paste.ubuntu.com/14133072/

#include <boost/container/flat_map.hpp>

#define USE_X3
#ifdef USE_X3
#   include <boost/spirit/home/x3.hpp>
#else
#   include <boost/spirit/include/qi.hpp>
#endif

#include <boost/algorithm/string.hpp>
#include <boost/algorithm/searching/boyer_moore.hpp>
#include <boost/algorithm/searching/boyer_moore_horspool.hpp>
#include <boost/algorithm/searching/knuth_morris_pratt.hpp>
#include <string>
#include <unordered_map>
#include <iostream>
#include <fstream>
#include <nonius/benchmark.h++>
#include <nonius/main.h++>

using TokenMap = boost::container::flat_map<std::string, std::string>;

#ifdef USE_X3
    namespace x3  = boost::spirit::x3;

    struct append {
        std::string& out;
        void do_append(char const ch) const                       { out += ch;                      } 
        void do_append(std::string const& s)  const               { out += s;                       } 
        template<typename It>
        void do_append(boost::iterator_range<It> const& r)  const { out.append(r.begin(), r.end()); } 
        template<typename Ctx>
        void operator()(Ctx& ctx) const                           { do_append(_attr(ctx));          } 
    };

    inline std::string spirit_x3(const std::string& input, x3::symbols<char const*> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        append appender { retVal };

        auto beg = input.cbegin();
        auto end = input.cend();

        auto rule = *(symbols[appender] | x3::char_ [appender]);

        if(!x3::parse(beg, end, rule))
            retVal = input;

        return retVal;
    }
#else
    namespace bsq = boost::spirit::qi;

    inline std::string spirit_qi_old(const std::string& input, TokenMap const& tokens)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        bsq::symbols<char const, char const*> symbols;
        for(const auto& token : tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | bsq::char_), retVal))
            retVal = input;

        return retVal;
    }

    inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
            retVal = input;

        return retVal;
    }
#endif

inline std::string manual_expand(const std::string& input, TokenMap const& tokens) {
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);

        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f] now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip '}'
                    }
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

inline std::string boost_replace_all(const std::string& input, TokenMap const& tokens)
{
    std::string retVal(input);
    retVal.reserve(input.size() * 2);

    for(const auto& token : tokens)
    {
        boost::replace_all(retVal, token.first, token.second);
    }
    return retVal;
}

inline void naive_stl(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        }
    }
}

inline void boyer_more(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next =
            boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

inline void bmh_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::boyer_moore_horspool_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                  token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

inline void kmp_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::knuth_morris_pratt_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

namespace testdata {
    std::string const expected =
        "Five and Seven said nothing, but looked at Two. Two began in a low voice, 'Why the fact is, you see, Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our heads cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the Queen.First came ten soldiers carrying clubs; these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with diamonds, and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the Queen said severely 'Who is this?' She said it to the Knave of Hearts, who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the Queen, pointing to the three gardeners who were lying round the rosetree; for, you "
        "see, as they were lying on their faces, and the pattern on their backs was the same as the rest of the pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said Alice, very loudly and decidedly, and the Queen was silent.The King laid his hand upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The Queen turned angrily away from him, "
        "and said to the Knave 'Turn them over!'The Knave did so, very carefully, with one foot.'Get up!' said the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";
    std::string const inputWithtokens =
        "Five and Seven said nothing, but looked at $(Two). $(Two) began in a low voice, 'Why the fact is, you see, "
        "Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our $(heads) cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the $(Queen).First came ten soldiers carrying clubs; "
        "these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with $(diamonds), and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the $(Queen) said severely 'Who is this?' She said it to the Knave of Hearts, "
        "who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the $(Queen), pointing to the three gardeners who were lying round the rosetree; for, "
        "you "
        "see, as they were lying on their faces, and the $(pattern) on their backs was the same as the rest of the "
        "pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said $(Alice), very loudly and decidedly, and the Queen was silent.The $(King) laid his hand "
        "upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The $(Queen) turned angrily away from him, "
        "and said to the $(Knave) 'Turn them over!'The $(Knave) did so, very carefully, with one foot.'Get up!' said "
        "the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";

    static TokenMap const raw_tokens {
        {"Two", "Two"},           {"heads", "heads"},
        {"diamonds", "diamonds"}, {"Queen", "Queen"},
        {"pattern", "pattern"},   {"Alice", "Alice"},
        {"King", "King"},         {"Knave", "Knave"},
        {"Why", "Why"},           {"glaring", "glaring"},
        {"name", "name"},         {"know", "know"},
        {"Idiot", "Idiot"},       {"children", "children"},
        {"Nonsense", "Nonsense"}, {"procession", "procession"},
    };

    static TokenMap const tokens {
        {"$(Two)", "Two"},           {"$(heads)", "heads"},
        {"$(diamonds)", "diamonds"}, {"$(Queen)", "Queen"},
        {"$(pattern)", "pattern"},   {"$(Alice)", "Alice"},
        {"$(King)", "King"},         {"$(Knave)", "Knave"},
        {"$(Why)", "Why"},           {"$(glaring)", "glaring"},
        {"$(name)", "name"},         {"$(know)", "know"},
        {"$(Idiot)", "Idiot"},       {"$(children)", "children"},
        {"$(Nonsense)", "Nonsense"}, {"$(procession)", "procession"},
    };

}

NONIUS_BENCHMARK("manual_expand", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;
    auto& tokens = testdata::raw_tokens;

    std::string result;
    cm.measure([&](int) {
        result = manual_expand(tmp, tokens);
    });
    assert(result == testdata::expected);
})

#ifdef USE_X3
NONIUS_BENCHMARK("spirit_x3", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        x3::symbols<char const*> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_x3(testdata::inputWithtokens, symbols);
        });
    //std::cout << "====\n" << result << "\n====\n";
    assert(testdata::expected == result);
})
#else
NONIUS_BENCHMARK("spirit_qi", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        bsq::symbols<char, std::string> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_qi(testdata::inputWithtokens, symbols);
        });
    assert(testdata::expected == result);
})

NONIUS_BENCHMARK("spirit_qi_old", [](nonius::chronometer cm) {
    std::string result;
    cm.measure([&](int) {
            result = spirit_qi_old(testdata::inputWithtokens, testdata::tokens);
        });
    assert(testdata::expected == result);
})
#endif

NONIUS_BENCHMARK("boyer_more", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        boyer_more(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("bmh_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        bmh_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("kmp_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        kmp_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("naive_stl", [](nonius::chronometer cm) {
    cm.measure([&](int) {
            std::string tmp = testdata::inputWithtokens;
            naive_stl(tmp, testdata::tokens);
            assert(tmp == testdata::expected);
        });
})

NONIUS_BENCHMARK("boost_replace_all", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;

    std::string result;
    cm.measure([&](int) {
        result = boost_replace_all(testdata::inputWithtokens, testdata::tokens);
    });
    assert(result == testdata::expected);
})


回答2:

EDIT2 for MFCMimicking: Well from your code it's obvious why the MFC version is faster: It doesn't search the entire string each replacement like some of your other versions do (I still can't explain boost::spirit). As soon as it does a replace it starts searching from the replacement point, not from the beginning of the string, so it's completely obvious that will be faster.

EDIT: After doing some more research and seeing (Algorithm to find multiple string matches) it seems that using good single-string-match algorithms for finding multiple search terms is the actual problem here. Probably your best bet is to use an appropriate algorithm (a few are mentioned in that question).

As for why MFC is faster? I would suggest distilling that down to a different question "Why are delete and insert in CString so much faster than std::string" or something like that, and make sure you tag it C++ and MFC so people with the right expertise can help (I have experience with standard C++ but can't help on what optimizations VC++ put into CString).

Original answer: OK due to the huge volume of code I only looked at expandTokens3 but I assume all versions have the same problem. Your code has two potentially significant performance issues:

  • You search the entire string each time you do a replacement. If you have ten variables to replace in a string it will take on the order of ten times longer than needed.

  • You execute each replacement in-place in the input string rather than building up the result string from each piece. This can result in memory allocation and copy for each replacement you do, again possibly significantly increasing runtime.



回答3:

So the question, what takes so long in such a simple task? One can say, ok, simple task, go ahead and implement it better. But the reality is that 15 years old MFC naive implementation does the task order of magnitude faster

The answer is actually simple.

First i compiled up your code on my macbook pro using apple clang 7.0:

$ cc --version
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

The results seem to match the OP's...

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 8906ms.
Boyer-Moore results:3425?=3425
10000 cycles took 2891ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 2392ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 4363ms.
Naive STL search and replace result: 3425?=3425
10000 cycles took 4333ms.
Boost replace_all result:3425?=3425
10000 cycles took 23284ms.
MFCMimicking result:3425?=3425
10000 cycles took 426ms.    <-- seemingly outstanding, no?

Then I added the -O3 flag:

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 675ms.
Boyer-Moore results:3425?=3425
10000 cycles took 788ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 623ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 1623ms.

Naive STL search and replace result: 3425?=3425
10000 cycles took 562ms.                    <-- pretty good!!!

Boost replace_all result:3425?=3425
10000 cycles took 748ms.
MFCMimicking result:3425?=3425
10000 cycles took 431ms.                    <-- awesome but not as outstanding as it was!

And now the results are in the same order of magnitude as the MFC CString results.

Why?

Because when you compile against BOOST and/or STL, you are expanding templates and the library code takes on the same optimisation settings as your compilation unit.

When you link against MFC, you're linking against a shared library that was compiled with optimisations turned on.

When you use strstr you're calling into the c library which is precompiled, optimised and in some parts, hand-written. Of course it's going to be fast!

solved :)

10000 cycles not 100000, different machine...

for reference, here are the results for the 100,000 cycles version running on the laptop on battery power. Full optimisation (-O3):

Boost.Spirit symbols result: 3425?=3425
100000 cycles took 6712ms.
Boyer-Moore results:3425?=3425
100000 cycles took 7923ms.
Boyer Moore Hospool result:3425?=3425
100000 cycles took 6091ms.
Knuth Morris Pratt result: 3425?=3425
100000 cycles took 16330ms.

Naive STL search and replace result: 3425?=3425
100000 cycles took 6719ms.

Boost replace_all result:3425?=3425
100000 cycles took 7353ms.

MFCMimicking result:3425?=3425
100000 cycles took 4076ms.


回答4:

Just some update. I ran the original STL code (with search) vs MFC-inspired one and I got that with optimizations (-O2) stl-base gives 228ms while MFC-like gives 285ms. Without optimizations I get something like 7284ms vs 310ms. I do it on macbook2016Pro with i7-6700HQ CPU @ 2.60GHz. So basically the code which uses strstr could not be optimized while STL code was heavily optimized.

Then I ran the final version of naiveSTL code which uses find instead of search and it gave me 28ms. So definitely it is a winner. I added the code below in case if the link by @kreuzerkrieg will be invalid one day.

inline void naiveSTL(std::string& input, const TokenMap& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = input.find(token.first);
        while(next != std::string::npos)
        {
            input.replace(next, token.first.size(), token.second);
            next = input.find(token.first, next + 1);
        }
    }
}



回答5:

Ok, it will be a long story. Just to remind you questions asked.

  1. Why the search and replace using C++ (various approaches) so slow?
  2. Why the MFC search and replace so fast?

Surprisingly, both questions have the same answer. Because of C++ overhead. Yep. Our shiny modern C++ has an overhead which mostly we dismiss for the flexibility and elegance.

However, when it comes to sub-microsecond resolutions (not that C++ is not capable of doing things in nanosecond resolutions) the overhead becomes more prominent.

Let me showcase with the same code I've posted in the question, but it is more coherent with things done in each function.

Live On Coliru

It uses aforementioned Nonius (thanks to @sehe) and interactive results are here.

You can click the legend to show/hide particular series.

Conclusions

There are two outstanding results

  • MFC mimicking function and
  • my own manual replace

These functions at least one order of magnitude are faster than the rest, so what is the difference?

All these "slow" functions are written in C++ when the fast is written in C (not pure C, I was too lazy to deal with malloc/realloc of output buffers when the output grows in size). Well, I guess it is clear that sometimes there is no choice but resort to pure C. I personally against using C for security reasons and lack of type safety. In addition it simply requires more expertise and attention to write high quality C code.

I will not mark it as an answer for a while, waiting for comments on this conclusion.

I want to thank all those who actively participated in the discussion, raised ideas and pointed out inconsistencies in my example.

2019 update:
Just to preserve the code: https://github.com/kreuzerkrieg/string_search_replace
Nonius results: https://kreuzerkrieg.github.io/string_search_replace/

Ran with gcc-9 on Ubuntu 18.04



回答6:

Your questionable use of std::string:replace is so pointlessly slow that nothing else in the code matters.