Trying to learn boost::intrusive Q2

2019-02-20 23:23发布

问题:

if I uncomment these

//BaseList   baselist; 
//MemberList memberlist;

outside the loop and comment out the ones inside the loop it crashes. I need to be able to have the baselist (and memberlist) outside any loop. How is this achieved?

Edit

The actual problem I am trying to solve in it's simplest form is this.

I want to have a std::vector of MyClass, call it AllThingsBunchedTogether. I also want to have a std::vector of BaseList, call it AllThingsSpreadOut.

So

  • AllThingsBunchedTogether might contain (just the anInt1 part for the sake of compactness): 1,2,1,10,2,3,4,4,5,9,10,10.
  • AllThingsSpreadOut might contain (zero not used for now) at [1] 1,1 at [2] 2,2 at [3] 3 at [4] 4,4 at [5] 5 at [9] 9 at [10] 10,10,10.

Note that the numbers themselves aren't be stored in the BaseList, but e.g., the MyClass(1, "John").

At [1] it could be "Mike", "John", at [2] it could be "Mike", "Dagobart" at [3] "John" ... at [10] "John" "Mike" "Dagobart" etc so that there no duplicates in any of the BaseList at AllThingsSpreadOut[i] since each MyClass in each BaseList hashes to a different value (anInt1 + Name).

In essence, anInt1 tells where the MyClass lives in AllThingsSpreadOut, but anInt1 + name guarantees uniqueness within each BaseList.

So the idea is that AllThingsSpreadOut is a vector of BaseList where at each BaseList at vector location is a list of similar things.

Then, when I remove things from AllThingsBunchedTogether (not by a clear, but by a search to remove some items like in the code below IsMarkedToDelete), they will automatically disappear from the corresponding AllThingsSpreadOut.

AllThingsSpreadOut acts as a sort for AllThingsBunchedTogether, with intrusive semantics. AllThingsBunchedTogether allows superfast access through [].

End Edit

#include <vector>
#include <iostream>

#include <boost/intrusive/list.hpp>

using namespace boost::intrusive;

class MyClass : public list_base_hook<link_mode<auto_unlink>> // This is a derivation hook
{
public:
    std::string name;
    bool bIsMarkedToDelete;
    int anInt1;
public:
    list_member_hook<link_mode<auto_unlink>> member_hook_; // This is a member hook

    MyClass(std::string n, int i) : name(n), anInt1(i), bIsMarkedToDelete(false) {}
};

bool IsMarkedToDelete(const MyClass &o)
{
    return o.bIsMarkedToDelete;
}

//Define a list that will store MyClass using the public base hook
typedef list<MyClass, constant_time_size<false>> BaseList;

// Define a list that will store MyClass using the public member hook
typedef list<MyClass,
        member_hook<MyClass, list_member_hook<link_mode<auto_unlink>>, &MyClass::member_hook_>,
        constant_time_size<false> > MemberList;

int main()
{
    bool done = false;
    std::vector<MyClass> values;

    std::string names[] = {"John", "Mike", "Dagobart"};

    //BaseList   baselist; 
    //MemberList memberlist;

    int i = 0;
    while(!done)
    {
        // Create several MyClass objects, each one with a different value

        for (int j = 0; j < 11; ++j)
            values.emplace_back(names[j % 3], j);

        BaseList   baselist;
        MemberList memberlist;

        // Now insert them in t-he reverse order in the base hook list
        for (auto& e : values)
        {
            baselist.push_front(e);
            memberlist.push_back(e);
        }

        // Now test lists
        auto rbit(baselist.rbegin());
        auto mit(memberlist.begin());
        auto it(values.begin()), itend(values.end());

        // Test the objects inserted in the base hook list
        for (; it != itend; ++it, ++rbit)
        {
            if (&*rbit != &*it)
                return 1;
        }
        // Test the objects inserted in the member hook list
        for (it = values.begin(); it != itend; ++it, ++mit)
        {
            if (&*mit != &*it)
                return 1;
        }
# if 0
        for(auto& e : values)
            std::cout << e.anInt1 << "\n";

        for(auto& e : baselist)
            std::cout << e.anInt1 << "\n";

        for(auto& e : memberlist)
            std::cout << e.anInt1 << "\n";

#endif // 0

        if(2 == i)
        {
            for(auto& e: values)
                std::cout << e.name << "\n";

            for(auto& e: values)
            {
                if("Mike" == e.name)
                    e.bIsMarkedToDelete = true;
            }

            values.erase(
                std::remove_if(values.begin(), values.end(), IsMarkedToDelete), values.end());
        }


        if(i++ > 3)
        {
            values.clear();
            done = true;
        }

        std::cout << "\n";
        std::cout << values.size()     << "\n";
        std::cout << baselist.size()   << "\n";
        std::cout << memberlist.size() << "\n";
    }
}

回答1:

I've seen it late, but anyways, here goes:

  1. What you describe matches exactly the implementation of an intrusive hash table of MyClass elements, where

    • anInt1 is the hash (the bucket identifier) for an element
    • the bucket lists are implemented as linked lists
    • equality is defined as equality of (anInt1, Name)


    So really, your program could just be:

    Live On Coliru

    std::unordered_set<MyClass> values {
        { "John",      0 }, { "Mike",      1 }, { "Dagobart",  2 },
        { "John",      3 }, { "Mike",      4 }, { "Dagobart",  5 },
        { "John",      6 }, { "Mike",      7 }, { "Dagobart",  8 },
        { "John",      9 }, { "Mike",     10 },
    };
    
    for(int i = 0; i<=3; ++i) {
        if(2 == i) {
            for(auto& e: values) std::cout << e.name << " "; std::cout << "\n";
            for(auto& e: values) e.bIsMarkedToDelete |= ("Mike" == e.name);
    
            for(auto it=begin(values); it!=end(values);) {
                if (it->bIsMarkedToDelete) it = values.erase(it);
                else ++it;
            }
        }
    
        std::cout << "i=" << i << ", values.size(): " << values.size() << "\n";
    }
    values.clear();
    std::cout << "Done\n";
    
  2. if you really wanted contiguous storage, I can only assume you wanted this for performance

    • you do not want to use pointers instead of objects, since that simply negates the memory layout ("AllThingsBunchedTogether") benefits and you'd be better of with the unordered_set or unodered_map as above

    • you do not want to use auto_unlink mode, since it cripples performance (by doing uncontrolled deletion triggers, by inhibiting constant-time size() and by creating thread safety issues)

    • instead, you should employ the above stratagy, but with boost::intrusive::unordered_set instead see http://www.boost.org/doc/libs/1_57_0/doc/html/intrusive/unordered_set_unordered_multiset.html

      Here, again, is a proof-of-concept:

      Live On Coliru

      #include <vector>
      #include <iostream>
      #include <boost/intrusive/unordered_set.hpp>
      #include <vector>
      //#include <functional>
      //#include <algorithm>
      
      namespace bic = boost::intrusive;
      
      struct MyClass : bic::unordered_set_base_hook<bic::link_mode<bic::auto_unlink>>
      {
          std::string name;
          int anInt1;
          mutable bool bIsMarkedToDelete;
      
          MyClass(std::string name, int i) : name(name), anInt1(i), bIsMarkedToDelete(false) {}
      
          bool operator==(MyClass const& o) const { return anInt1 == o.anInt1 && name == o.name; }
      
          struct hasher { size_t operator()(MyClass const& o) const { return o.anInt1; } };
      };
      
      typedef bic::unordered_set<MyClass, bic::hash<MyClass::hasher>, bic::constant_time_size<false> > HashTable;
      
      int main() {
      
          std::vector<MyClass> values {
              MyClass { "John", 0 }, MyClass { "Mike",  1 }, MyClass { "Dagobart", 2 },
              MyClass { "John", 3 }, MyClass { "Mike",  4 }, MyClass { "Dagobart", 5 },
              MyClass { "John", 6 }, MyClass { "Mike",  7 }, MyClass { "Dagobart", 8 },
              MyClass { "John", 9 }, MyClass { "Mike", 10 },
          }; 
      
          HashTable::bucket_type buckets[100];
          HashTable hashtable(values.begin(), values.end(), HashTable::bucket_traits(buckets, 100)); 
      
          for(int i = 0; i<=3; ++i) {
              if(2 == i) {
                  for(auto& e: values) std::cout << e.name << " "; std::cout << "\n";
                  for(auto& e: values) e.bIsMarkedToDelete |= ("Mike" == e.name);
      
                  values.erase(std::remove_if(begin(values), end(values), std::mem_fn(&MyClass::bIsMarkedToDelete)));
              }
      
              std::cout << "i=" << i << ", values.size():    " << values.size()    << "\n";
              std::cout << "i=" << i << ", hashtable.size(): " << hashtable.size() << "\n";
          }
          values.clear();
          std::cout << "Done\n";
      }
      


回答2:

Here's the error message, which you omitted:

Assertion `node_algorithms::inited(to_insert)' failed.

From this we can understand that an element is being inserted twice. This isn't valid with intrusive containers in general.

When you have your lists inside the loop, they are destroyed and recreated each time. But when they are outside, you never clear them, and you also never clear values, so this sequence occurs:

  1. Add 11 elements to values.
  2. Add all values to the lists.
  3. Add 11 elements to values; it still has the previous 11 so now 22 elements.
  4. Add all values to the lists. Crash on the first one, because it is already in a list.

One solution is to add values.clear() at the top of the while(!done) loop.