C++11 std::set lambda comparison function

2019-03-08 09:14发布

问题:

I want to create a std::set with a custom comparison function. I could define it as a class with operator(), but I wanted to enjoy the ability to define a lambda where it is used, so I decided to define the lambda function in the initialization list of the constructor of the class which has the std::set as a member. But I can't get the type of the lambda. Before I proceed, here's an example:

class Foo
{
private:
     std::set<int, /*???*/> numbers;
public:
     Foo () : numbers ([](int x, int y)
                       {
                           return x < y;
                       })
     {
     }
};

I found two solutions after searching: one, using std::function. Just have the set comparison function type be std::function<bool (int, int)> and pass the lambda exactly like I did. The second solution is to write a make_set function, like std::make_pair.

SOLUTION 1:

class Foo
{
private:
     std::set<int, std::function<bool (int, int)> numbers;
public:
     Foo () : numbers ([](int x, int y)
                       {
                           return x < y;
                       })
     {
     }
};

SOLUTION 2:

template <class Key, class Compare>
std::set<Key, Compare> make_set (Compare compare)
{
     return std::set<Key, Compare> (compare);
}

The question is, do I have a good reason to prefer one solution over the other? I prefer the first one because it makes use of standard features (make_set is not a standard function), but I wonder: does using std::function make the code (potentially) slower? I mean, does it lower the chance the compiler inlines the comparison function, or it should be smart enough to behave exactly the same like it would it was a lambda function type and not std::function (I know, in this case it can't be a lambda type, but you know, I'm asking in general) ?

(I use GCC, but I'd like to know what popular compilers do in general)

SUMMARY, AFTER I GOT LOTS OF GREAT ANSWERS:

If speed is critical, the best solution is to use an class with operator() aka functor. It's easiest for the compiler to optimize and avoid any indirections.

For easy maintenance and a better general-purpose solution, using C++11 features, use std::function. It's still fast (just a little bit slower than the functor, but it may be negligible) and you can use any function - std::function, lambda, any callable object.

There's also an option to use a function pointer, but if there's no speed issue I think std::function is better (if you use C++11).

There's an option to define the lambda function somewhere else, but then you gain nothing from the comparison function being a lambda expression, since you could as well make it a class with operator() and the location of definition wouldn't be the set construction anyway.

There are more ideas, such as using delegation. If you want a more thorough explanation of all solutions, read the answers :)

回答1:

Yes, a std::function introduces nearly unavoidable indirection to your set. While the compiler can always, in theory, figure out that all use of your set's std::function involves calling it on a lambda that is always the exact same lambda, that is both hard and extremely fragile.

Fragile, because before the compiler can prove to itself that all calls to that std::function are actually calls to your lambda, it must prove that no access to your std::set ever sets the std::function to anything but your lambda. Which means it has to track down all possible routes to reach your std::set in all compilation units and prove none of them do it.

This might be possible in some cases, but relatively innocuous changes could break it even if your compiler managed to prove it.

On the other hand, a functor with a stateless operator() has easy to prove behavior, and optimizations involving that are everyday things.

So yes, in practice I'd suspect std::function could be slower. On the other hand, std::function solution is easier to maintain than the make_set one, and exchanging programmer time for program performance is pretty fungible.

make_set has the serious disadvantage that any such set's type must be inferred from the call to make_set. Often a set stores persistent state, and not something you create on the stack then let fall out of scope.

If you created a static or global stateless lambda auto MyComp = [](A const&, A const&)->bool { ... }, you can use the std::set<A, decltype(MyComp)> syntax to create a set that can persist, yet is easy for the compiler to optimize (because all instances of decltype(MyComp) are stateless functors) and inline. I point this out, because you are sticking the set in a struct. (Or does your compiler support

struct Foo {
  auto mySet = make_set<int>([](int l, int r){ return l<r; });
};

which I would find surprising!)

Finally, if you are worried about performance, consider that std::unordered_set is much faster (at the cost of being unable to iterate over the contents in order, and having to write/find a good hash), and that a sorted std::vector is better if you have a 2-phase "insert everything" then "query contents repeatedly". Simply stuff it into the vector first, then sort unique erase, then use the free equal_range algorithm.



回答2:

It's unlikely that the compiler will be able to inline a std::function call, whereas any compiler that supports lambdas would almost certainly inline the functor version, including if that functor is a lambda not hidden by a std::function.

You could use decltype to get the lambda's comparator type:

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
   auto comp = [](int x, int y){ return x < y; };
   auto set  = std::set<int,decltype(comp)>( comp );

   set.insert(1);
   set.insert(10);
   set.insert(1); // Dupe!
   set.insert(2);

   std::copy( set.begin(), set.end(), std::ostream_iterator<int>(std::cout, "\n") );
}

Which prints:

1
2
10

See it run on Coliru.



回答3:

A stateless lambda (i.e. one with no captures) can decay to a function pointer, so your type could be:

std::set<int, bool (*)(int, int)> numbers;

Otherwise I'd go for the make_set solution. If you won't use a one-line creation function because it's non-standard you're not going to get much code written!



回答4:

From my experience playing around with the profiler, the best compromise between performance and beauty is to use a custom delegate implementation, such as:

https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11

As the std::function is usually a bit too heavy. I can't comment on your specific circumstances, as I don't know them, though.



回答5:

If you're determined to have the set as a class member, initializing its comparator at constructor time, then at least one level of indirection is unavoidable. Consider that as far as the compiler knows, you could add another constructor:

 Foo () : numbers ([](int x, int y)
                   {
                       return x < y;
                   })
 {
 }

 Foo (char) : numbers ([](int x, int y)
                   {
                       return x > y;
                   })
 {
 }

Once the you have an object of type Foo, the type of the set doesn't carry information on which constructor initialized its comparator, so to call the correct lambda requires an indirection to the run-time selected lambda operator().

Since you're using captureless lambdas, you could use the function pointer type bool (*)(int, int) as your comparator type, as captureless lambdas have the appropriate conversion function. This would of course involve an indirection through the function pointer.



回答6:

The difference highly depends on your compiler's optimizations. If it optimizes lambda in a std::function those are equivalent, if not you introduce an indirection in the former that you won't have in the latter.