Fast implementation of operations on large sets of

2019-04-01 20:14发布

站内文章 / C++

28 0

别忘想泡老子

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Description :

I implemented the following class LabSetInt64 (see below code).

The goal here is to manipulate large sets of big integers (up to values of 10M) as fast as possible. My main requirements are focused on :

!Crucial : Getting the size/cardinality of a set as fast as possible
!Important : Being able to iterate very fast through a set

So, starting from the implementation below, it still remain two points I would like to discuss with you.

The "popcount()" lazy implementation

int LabSetInt64::popcount(uint64_t x) {
    int count;
    for (count=0; x; count++)
        x &= x-1;
    return count;
}

I know that the implementation I chose is one of the slowest in the market place, but I could not find out a better one by myself. So if you can point me to a better one (64 bits, of course)...

I heard about a very efficient algorithm to compute cardinality : The "MIT HAKMEM 169" algorithm >>

// MIT HAKMEM.
// MIT HAKMEM Count is funky. Consider a 3 bit number as being 4a+2b+c. If we
// shift it right 1 bit, we have 2a+b. Subtracting this from the original gives
// 2a+b+c. If we right-shift the original 3-bit number by two bits, we get a,
// and so with another subtraction we have a+b+c, which is the number of bits
// in the original number. How is this insight employed? The first assignment
// statement in the routine computes tmp. Consider the octal representation of
// tmp. Each digit in the octal representation is simply the number of 1¡äs in
// the corresponding three bit positions in n. The last return statement sums
// these octal digits to produce the final answer. The key idea is to add
// adjacent pairs of octal digits together and then compute the remainder
// modulus 63. This is accomplished by right-shifting tmp by three bits, adding
// it to tmp itself and ANDing with a suitable mask. This yields a number in
// which groups of six adjacent bits (starting from the LSB) contain the number
// of 1¡äs among those six positions in n. This number modulo 63 yields the
// final answer. For 64-bit numbers, we would have to add triples of octal
// digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources.
// Source: MIT AI Lab memo, late 1970¡äs.
// http://gurmeet.net/2008/08/05/fast-bit-counting-routines/
int CheckMIT(unsigned int n) 
{
   /* works for 32-bit numbers only    */
   /* fix last line for 64-bit numbers */
   register unsigned int tmp;

   tmp = n - ((n >> 1) & 033333333333)
           - ((n >> 2) & 011111111111);

   // the remainder of 63 for i equals the sum of 7 octal numbers which
   // consititutes the 32-bit integer.
   // For example, 03456 % 63 == 034 + 056
   return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}

My problem with the algorithm above is that I don't understand it enough to turn it into "uint64_t" (64 bits) version (despite the few instructions to do it).

So if one of you could help me with that task (or at least help me to understand), that would be awesome.

Here is LabSetInt64.h :

/*
 * LabSetInt64.h
 *
 *  Created on: Feb 20, 2013
 *      Author: golgauth
 */

#ifndef LABSETINT64_H_
#define LABSETINT64_H_

#include <ctype.h>
#include <stdio.h>
#include <string.h>

#include <iostream>
#include <stdint.h>
#include <limits>
#include <algorithm>

#include <LabTimeUtils.h>

using namespace std;


namespace elps {


// Lets assume we need at most 1M distinct indices in our sets
#define DEFAULT_SIZE_IN_BITS 1000000


class LabSetInt64
{
public:

    LabSetInt64();
    LabSetInt64(int sizeinbits);
    LabSetInt64(const LabSetInt64 &);
    ~LabSetInt64();


    void Clear();
    void Resize(int sizeinbits);
    void Set(int i);
    void UnSet(int i);
    void Set(int i, bool b);
    bool Find(int i);
    int  Cardinality();
    int  NextSetBit(int i);

    void Print();

    /**
     * Should have been "=" operator, but this overload is not compatible
     * with Cython, so we'll use "<<" instead.
     * @param
     * @return
     */
    const LabSetInt64 & operator << ( const LabSetInt64 & );
    LabSetInt64         operator ~  () const;
    LabSetInt64         operator +  ( const LabSetInt64 & );
    LabSetInt64         operator *  ( const LabSetInt64 & );
    LabSetInt64         operator -  ( const LabSetInt64 & );
    LabSetInt64         operator ^  ( const LabSetInt64 & );

private:
    uint64_t *data;
    int data_len;
    int bits_size;

    int popcount(uint64_t x);
    int nb_trailing_zeros(uint64_t v);
};


} /* namespace elps */
#endif /* LABSETINT64_H_ */

And here the LabSetInt64.cpp :

/*
 * LabSetInt64.cpp
 *
 *  Created on: Feb 20, 2013
 *      Author: golgauth
 */

#include "LabSetInt64.h"

namespace elps {


/** PUBLICS **/


LabSetInt64::LabSetInt64() : LabSetInt64(DEFAULT_SIZE_IN_BITS) {
}

LabSetInt64::LabSetInt64(int sizeinbits) {
    bits_size = sizeinbits;
    data_len = (bits_size + 63) / 64;
    data = new uint64_t[data_len];
}

LabSetInt64::LabSetInt64(const LabSetInt64& src) {
    bits_size = src.bits_size;
    data_len = src.data_len;
    data = new uint64_t[data_len];
    for( int i = 0; i < data_len; i++ )            // copy into this set
        data[i] = src.data[i];
}

LabSetInt64::~LabSetInt64() {
}


void LabSetInt64::Clear() {
    std::fill_n(data, data_len, 0);
}

void LabSetInt64::Resize(int sizeinbits) {
    bits_size = sizeinbits + 1;
    int size = (bits_size + 63) / 64;
    uint64_t *new_data = new uint64_t[size];

    memcpy( new_data, data, data_len * sizeof(uint64_t) );

    data_len = size;
    delete [] data;
    data = new_data;
}


void LabSetInt64::Set(int i) {
    data[i / 64] |= (1l << (i % 64));
}

void LabSetInt64::UnSet(int i) {
    data[i / 64] &= ~(1l << (i % 64));
}

void LabSetInt64::Set(int i, bool b) {
    if(b) Set(i); else UnSet(i);
}

bool LabSetInt64::Find(int i) {
    return ((data[i / 64]) & (1l << (i % 64)));           // binary AND;
}


int LabSetInt64::Cardinality() {
    int sum = 0;
    for(int i=0; i<data_len; i++)
        sum += popcount(data[i]);
    //sum += __builtin_popcount(data[i]);
    return sum;
}

int LabSetInt64::NextSetBit(int i) {
    int x = i / 64;
    long w = data[x];
    w >>= (i % 64);
    if (w != 0) {
        return i + nb_trailing_zeros(w);
    }
    ++x;
    for (; x < data_len; ++x) {
        if (data[x] != 0) {
            return x * 64 + nb_trailing_zeros(data[x]);
        }
    }
    return -1;
}


void LabSetInt64::Print() {
    int cur_id = this->NextSetBit(0);
    cout << "\nSet size : " << this->Cardinality() << endl;
    cout << "{ ";
    while (cur_id != -1)
    {
        cout << (cur_id) << " ";
        cur_id = this->NextSetBit(cur_id+1);
    }
    cout << "}" << endl;;
}


/** PRIVATES **/

//This is better when most bits in x are 0
//It uses 3 arithmetic operations and one comparison/branch per "1" bit in x.
int LabSetInt64::popcount(uint64_t x) {
    int count;
    for (count=0; x; count++)
        x &= x-1;
    return count;
}

// output: c will count v's trailing zero bits,
// so if v is 1101000 (base 2), then c will be 3
int LabSetInt64::nb_trailing_zeros(uint64_t v) {
    int c;
    if (v)
    {
        v = (v ^ (v - 1)) >> 1;         // Set v's trailing 0s to 1s and zero rest
        for (c = 0; v; c++) {
            v >>= 1;
        }
    }
    else
        c = 8 * sizeof(v);

    return c;
}


/** ***************************************************************************
*********************************   OPERATORS   *******************************
*******************************************************************************/

/** PUBLICS **/


/*******************************************************************************
 * TODO >> May be better to use  "DEFAULT_SIZE_IN_BITS" instead of "data_len"  *
 *         in all the operators !!!                                            *
 *                                                                             *
 * => For now, we assume that all the Sets are created with the default        *
 *    constructor (aka : bit_size = DEFAULT_SIZE_IN_BITS)                      *
 *******************************************************************************/


/**
 *  "operator = " assigns the right hand side to this set.
 *  returns: nothing
 */
const LabSetInt64 &LabSetInt64::operator << ( const LabSetInt64 &rhs )
{
    if( &rhs != this )                              // avoid self assignment
    {
        this->Resize(rhs.bits_size - 1);
        for( int i = 0; i < data_len; i++ )         // copy into this set
            data[i] = rhs.data[i];
    }
    return *this;                                   // enable x << y << z;
}


/**
 * "operator ~ " performs set complement operation (not).
 * returns: pointer to set
 */
LabSetInt64 LabSetInt64::operator ~ () const
{
    LabSetInt64 rv;
    for( int i = 0; i < data_len; i++ )
        rv.data[i] = ~data[i];                      // bitwise complement
    return rv;
}


/**
 * "operator + " performs set union operation (or).
 * returns: pointer to set
 */
LabSetInt64 LabSetInt64::operator + ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;
    for( int i = 0; i < data_len; i++ )
        rv.data[i] = data[i] | rhs.data[i];         // bitwise OR
    return rv;
}


/**
 * "operator * " performs set intersection operation (and).
 * returns: pointer to set
 */
LabSetInt64 LabSetInt64::operator * ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;
    for( int i = 0; i < data_len; i++ )
        rv.data[i] = data[i] & rhs.data[i];         // bitwise AND
    return rv;
}


/**
 * "operator - " performs set difference operation.
 * returns: pointer to set
 */
LabSetInt64 LabSetInt64::operator - ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;
    for( int i = 0; i < data_len; i++ )
        rv.data[i] = data[i] & ( ~rhs.data[i] );    // bitwise a AND ~b
    return rv;
}


/**
 * "operator ^ " performs set symmetric difference operation (xor).
 * returns: pointer to set
 */
LabSetInt64 LabSetInt64::operator ^ ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;
    for( int i = 0; i < data_len; i++ )
        rv.data[i] = data[i] ^ rhs.data[i];         // bitwise XOR
    return rv;
}


/** *****************************************************************************
*********************************************************************************/


} /* namespace elps */

For the rest of the implementation, I am quite happy with the performances, but once again, any good comments would be very very well appreciated.

Thanks a lot for your time.

EDIT : Well, after all I'm not completely convinced by the "sparse" solution since I need the union, intersection, difference features, which would be, I guess, a lot more costly than my "bitwise" version (need to iterate through a potentially huge amount of items), so I am still interested in getting some help on how to turn the above "MIT HAKMEM" algorithm to 64 bits. Many Thanks.

EDIT : This version contains some little imperfections. Please, better look at the source code given bellow.

回答1:

On his website, Russ Cox describes a very simple integer set data structure that has O(n) iteration (as opposed to O(U) where U is some maximum) and O(1) insert, delete and count. It uses no bit fiddling. It also tends to use more space, but can be extended to have O(1) amortized append (by simply using std::vector instead of plain arrays).

(I would paste sample code, but I think Cox's description is lucid enough and I'd just introduce bugs.)

回答2:

So, after having studied some other libraries (BitMagic and a few C++ and java implementations), and a lot of bechmarkings :

(I rejected the "sparse" option because of the need of fast combination operators)...

I went to some interesting improvements.

The main improvement is that I now maintain the cardinality on the fly (meaning that counting fast is no longer a big issue).

Improvements :

Iterating (which was my first issue) still a bit slower than (BitMagic), but OK. Comparing my performances with those of BitMagic confirmed that my class was not that bad, after all... But BitMagic's purpose is mainly "memory saving by usage of compression" and is much slower than boost (which I want to avoid here) when dealing with large vectors.
Improved significantly the way we calculate the cardinality.

Better iteration :

inline int nb_trailing_zeros(uint64_t v) {
    if (v & 1l) return 0;           // Fast check if no trailing zeros
                                    // Avoids a useless call to __builtin_ctzl()
    else return __builtin_ctzl(v);
}

Two improvements : using gcc's builtin function for counting trailing zeros and avoiding calling it when the block starts with a 1 improved significantly the performances while iterating.

Better counting :

usage of builtin sse4.2 popcount function :

inline int popcount(uint64_t x) {
//      int count;
//      for (count=0; x; count++)
//          x &= x-1;
//      return count;
    return _mm_popcnt_u64(x); //__builtin_popcount(x) is ok too
// Thanks to @cschwan for pointing me on this
}

"_mm_popcnt_u64" requires additional compilation options "-m64 -msse4.2" (not "__builtin_popcount")

added functions like :

int LabSetInt64::CU ( const LabSetInt64 &rhs ) {
    int sum = 0;
    for(int i=0; i<data_len; i++)
        sum += popcount(data[i] | rhs.data[i]);
    return sum;
}

Faster than making the operation, and then counting.

"optim counting" option (CNT_OPT mode) : See the new class version bellow

Slows down a little bit the combination operations (just a few, no real worries...)

Faster insert :

By inserting only if not found :

void LabSetInt64::Set(int i) {
    int n = i / 64;
    uint64_t shift1l = 1l << (i % 64);
    if ( !(data[n] & shift1l) ) {               // Not found
        data[n] |= shift1l;
        if (CNT_OPT) ++cardinality;
    }
}

It is faster to first check if the bit is already set, than setting it in any cases. (the most zeros we try to insert, the most we save time). The performances remain quite unchanged in case of setting a non-set bit.

Added some faster operators :

const LabSetInt64 & LabSetInt64::operator +=  ( const LabSetInt64 & rhs) {
    for( int i = 0; i < data_len; i++ )
        data[i] = data[i] | rhs.data[i];            // bitwise OR
    return *this;
}

A += B is now much faster than A = A + B (saves the instantiation of a new vector to be returned)

Added some convenience functions and operators : See the new class version bellow

[ Header ]

/**
 * If the set to 1, will be optimized for counting (cardinality updated on the fly).
 * WARNING : Optimizing for counting will slow down a little the following features :
 *              - Binary operation : (~, +, *, -, ^)
 */
#define CNT_OPT                 1

#define DEFAULT_SIZE_IN_BITS    1000000

//This is better when most bits in x are 0
inline unsigned int popcount(uint64_t x) {
    //      // It uses 3 arithmetic operations and one comparison/branch per "1" bit in x.
    //      int count;
    //      for (count=0; x; count++)
    //          x &= x-1;
    //      return count;
    return _mm_popcnt_u64(x); //__builtin_popcountl(x); //
}

/**
 * Counts trailing zeros starting from a given point
 * @param v Input to count trailing zero bits
 * @return c will count v's trailing zero bits,
             so if v is 1101000 (base 2), then c will be 3
 */
inline int nb_trailing_zeros(uint64_t v) {
    if (v & 1l) return 0;           // Fast check if no trailing zeros
                                    // Avoids a useless call to __builtin_ctzl()
    else return __builtin_ctzl(v);
}

class LabSetInt64
{
public:

    LabSetInt64();
    /**
     * Instantiates a bitset with a given maximum size.
     * @param sizeinbits Maximum size of the population
     */
    LabSetInt64(unsigned int sizeinbits);
    LabSetInt64(const LabSetInt64 &);

    virtual ~LabSetInt64();


    void Clear();
    void Resize(unsigned int sizeinbits);
    /**
     * WARNING : no check is perform whether i is in the correct range :
     *           i MUST be in [0, size_in_bits-1]
     */
    void Set(int i);
    void UnSet(int i);
    //    void Set(int i, bool b);
    int  Cardinality();

    bool Find(int i);
    void Print();

    int GetFirst();
    int ChooseOne();
    int GetAt(int n);

    const LabSetInt64 & operator = ( const LabSetInt64 & );

    /**
     * All the following operators assume that left and right operands
     * have the same size "size_in_bits" (enough for our needs,
     * since memory usage is not really a problem for us).
     */
    LabSetInt64         operator ~  () const;                   /** Inversion      **/
    LabSetInt64         operator +  ( const LabSetInt64 & );    /** Union          **/
    LabSetInt64         operator *  ( const LabSetInt64 & );    /** Intersection   **/
    LabSetInt64         operator -  ( const LabSetInt64 & );    /** Difference     **/
    LabSetInt64         operator ^  ( const LabSetInt64 & );    /** Symmetrical D. **/

    /**
     * Comparison.
     */
    bool                operator == ( const LabSetInt64 & ) const;
    bool                operator != ( const LabSetInt64 & ) const;

    /**
     * Convenient, moreover :
     * (A += B) is actually a lot faster than (A = A + B)...
     * [ No intermediary storage ]
     */
    const LabSetInt64 & operator +=  ( const LabSetInt64 & );
    const LabSetInt64 & operator *=  ( const LabSetInt64 & );
    const LabSetInt64 & operator -=  ( const LabSetInt64 & );
    const LabSetInt64 & operator ^=  ( const LabSetInt64 & );

    const LabSetInt64 & INV();

    /**
     * Gets the population count of the union.
     * Faster than realizing the union, then calling Cardinality().
     * [ No intermediary storage ]
     */
    int CU  ( const LabSetInt64 & );
    int CI  ( const LabSetInt64 & );
    int CD  ( const LabSetInt64 & );
    int CSD ( const LabSetInt64 & );

    /**
     * Basic iterator facility.
     */
    class iter {
    public:
        iter(LabSetInt64 & bv) { pos_ = -1; bv_ = &bv; }

        void reset() { pos_ = -1; }
        int  end()   { return -2; }
        int  next()  { pos_ = bv_->next_set_bit(pos_); return pos_; }

    private:
        int           pos_;
        LabSetInt64 * bv_;
    };

private:

    uint64_t *   data_;
    unsigned int data_len_;
    unsigned int size_in_bits_;
    unsigned int cardinality_;

    void init_bitset(unsigned int sizeinbits);
    int  calc_cardinality();
    int  next_set_bit(int i);
};

[ Implementation ]

LabSetInt64::LabSetInt64() {
    init_bitset(DEFAULT_SIZE_IN_BITS);
}

LabSetInt64::LabSetInt64(unsigned int sizeinbits) {
    init_bitset(sizeinbits);
}

void LabSetInt64::init_bitset(unsigned int sizeinbits) {
    // +1 (the last higher weighted bit is 0 : allows to stop when iterating)
    // See example in "Print()" function
    size_in_bits_ = sizeinbits + 1;
    data_len_ = (size_in_bits_ + 63) / 64;
    data_ = new uint64_t[data_len_];
    if (CNT_OPT) cardinality_ = 0;
    Clear();                        // Start filled with 0s. // Necessary...    
}

LabSetInt64::LabSetInt64(const LabSetInt64& src) {
    size_in_bits_ = src.size_in_bits_;
    data_len_ = src.data_len_;
    data_ = new uint64_t[data_len_];
    memcpy( data_, src.data_, src.data_len_ * sizeof(uint64_t) );
    cardinality_ = src.cardinality_;
}

LabSetInt64::~LabSetInt64() {
    if (data_ != NULL) delete[] data_;
}

void LabSetInt64::Clear() {
    std::fill_n(data_, data_len_, 0);

    if (CNT_OPT) cardinality_ = 0;
}


void LabSetInt64::Resize(unsigned int sizeinbits) {
    bool truncated;
    unsigned int new_sizeinbits = sizeinbits + 1;

    if (size_in_bits_ != new_sizeinbits)
    {
        truncated = (size_in_bits_ > new_sizeinbits);
        size_in_bits_ = new_sizeinbits;
        unsigned int new_size = (size_in_bits_ + 63) / 64;

        if (new_size != data_len_)      // Resize only if storage size changed
        {
            uint64_t *new_data = new uint64_t[new_size];
            if (!truncated)             // Clear : required only if extended
                std::fill_n(new_data, new_size, 0);

            memcpy( new_data, data_, std::min(data_len_, new_size) * sizeof(uint64_t) );

            data_len_ = new_size;
            delete [] data_;
            data_ = new_data;
        }

        //--
        if (truncated) {
            // Clear the bits beyond the maximum size
            unsigned int begin = sizeinbits;
            unsigned int end = data_len_ * 64;
            for ( unsigned int i = begin; i<end; ++i ) UnSet(i);
            // Update cardinality
            if (CNT_OPT) cardinality_ = calc_cardinality();
        }
    }
}


void LabSetInt64::Set(int i) {

    int n = i / 64;
    uint64_t shift1l = 1l << (i % 64);
    if ( !(data_[n] & shift1l) ) {              // Not found
        data_[n] |= shift1l;
        if (CNT_OPT) ++cardinality_;
    }
}

void LabSetInt64::UnSet(int i) {

    int n = i / 64;
    uint64_t shift1l = 1l << (i % 64);
    if ( data_[n] & shift1l ) {                 // Found
        data_[n] &= ~shift1l;
        if (CNT_OPT) --cardinality_;
    }
}

//void LabSetInt64::Set(int i, bool b) {
//  if(b) Set(i); else UnSet(i);
//}

bool LabSetInt64::Find(int i) {
    return ((data_[i / 64]) & (1l << (i % 64)));
}


int LabSetInt64::Cardinality() {
    if (CNT_OPT) return cardinality_;
    else return calc_cardinality();
}


int LabSetInt64::calc_cardinality() {
    unsigned int sum = 0;

    for(unsigned int i=0; i<data_len_; i++)
        sum += popcount(data_[i]);
        //sum += bitcount64_4way(data[i], data[i+1], data[i+2], data[i+3]);

    return sum;
}

int LabSetInt64::next_set_bit(int i) {
    ++i;
    unsigned int x = i / 64;
    uint64_t w = data_[x];
    w >>= (i % 64);
    if (w != 0) {
        return i + nb_trailing_zeros(w);
    }
    ++x;
    for (; x < data_len_; ++x) {
        if (data_[x] != 0) {
            return x * 64 + nb_trailing_zeros(data_[x]);
        }
    }
    return -2;
}


void LabSetInt64::Print() {

    cout << "\nSet size : " << this->Cardinality() << endl;
    cout << "{ ";
    int cnt = 0;
    iter it(*this);
    for (int val = it.next(); val != it.end(); val = it.next())
    {
        cout << (val) << " ";
        if ((++cnt) % 30 == 0) cout << endl;
    }
    cout << "}" << endl;
}



int LabSetInt64::GetFirst() {
    return this->next_set_bit(-1);
}

int LabSetInt64::ChooseOne() {
    // Get lucky...
    int id = rand() % this->size_in_bits_;
    // Otherwise...
    if (!Find(id))
        id = GetAt(rand() % this->Cardinality());
    return id;
}

int LabSetInt64::GetAt(int n) {

    int cnt = 0, val;
    iter it(*this);
    while ((val = it.next()) != it.end() && cnt++ != n) {  }

    return val;
}

// *************************************************



/*----------------------------------------------------------------------------*
 **  "operator = " assigns the right hand side to this set.
 **
 **  returns: nothing
 */
const LabSetInt64 &LabSetInt64::operator = ( const LabSetInt64 &rhs )
{
    if( &rhs != this )                              // avoid self assignment
    {
        this->Resize(rhs.size_in_bits_ - 1);
        memcpy( data_, rhs.data_, rhs.data_len_ * sizeof(uint64_t) );

        if (CNT_OPT) cardinality_ = rhs.cardinality_;
    }
    return *this;                                   // enable x = y = z;
}


/*----------------------------------------------------------------------------*
**  "operator ~ " performs set complement operation (not).
**
**  returns: pointer to set
*/
LabSetInt64 LabSetInt64::operator ~ () const
{
    LabSetInt64 rv;

    unsigned int i;
    for( i = 0; i < data_len_; i++ ) {
        rv.data_[i] = ~data_[i];                        // bitwise complement
        if (CNT_OPT) rv.cardinality_ += popcount(rv.data_[i]);
    }

    rv.size_in_bits_ = size_in_bits_;
    // Clear the bits beyond the maximum size
    unsigned int begin = rv.size_in_bits_ - 1;
    unsigned int end = data_len_ * 64;
    for ( i = begin; i<end; ++i ) rv.UnSet(i);

    return rv;
}


/*----------------------------------------------------------------------------*
**  "operator + " performs set union operation (or).
**
**  returns: pointer to set
*/
LabSetInt64 LabSetInt64::operator + ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;

    for( unsigned int i = 0; i < data_len_; i++ ) {
        rv.data_[i] = data_[i] | rhs.data_[i];            // bitwise OR
        if (CNT_OPT) rv.cardinality_ += popcount(rv.data_[i]);
    }

    return rv;
}


/*----------------------------------------------------------------------------*
**  "operator * " performs set intersection operation (and).
**
**  returns: pointer to set
*/
LabSetInt64 LabSetInt64::operator * ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;

    for( unsigned int i = 0; i < data_len_; i++ ) {
        rv.data_[i] = data_[i] & rhs.data_[i];        // bitwise AND
        if (CNT_OPT) rv.cardinality_ += popcount(rv.data_[i]);
    }

    return rv;
}


/*----------------------------------------------------------------------------*
**  "operator - " performs set difference operation.
**
**  returns: pointer to set
*/
LabSetInt64 LabSetInt64::operator - ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;

    for( unsigned int i = 0; i < data_len_; i++ ) {
        rv.data_[i] = data_[i] & ( ~rhs.data_[i] );       // bitwise a AND ~b
        if (CNT_OPT) rv.cardinality_ += popcount(rv.data_[i]);
    }

    return rv;
}


/*----------------------------------------------------------------------------*
**  "operator ^ " performs set symmetric difference operation (xor).
**
**  returns: pointer to set
*/
LabSetInt64 LabSetInt64::operator ^ ( const LabSetInt64 &rhs )
{
    LabSetInt64 rv;

    for( unsigned int i = 0; i < data_len_; i++ ) {
        rv.data_[i] = data_[i] ^ rhs.data_[i];            // bitwise XOR
        if (CNT_OPT) rv.cardinality_ += popcount(rv.data_[i]);
    }

    return rv;
}



bool LabSetInt64::operator == ( const LabSetInt64 &rhs ) const {

    if ((CNT_OPT) && (cardinality_ != rhs.cardinality_)) return false;

    for( unsigned int i = 0; i < data_len_; i++ )
    {
        if (data_[i] != rhs.data_[i]) {
            return false;
        }
    }
    return true;
}

bool LabSetInt64::operator != ( const LabSetInt64 &rhs ) const {

    return !(*this == rhs);
}



const LabSetInt64 & LabSetInt64::operator +=  ( const LabSetInt64 & rhs) {

    if (CNT_OPT) cardinality_ = 0;
    for( unsigned int i = 0; i < data_len_; i++ ) {
        data_[i] = data_[i] | rhs.data_[i];
        if (CNT_OPT) cardinality_ += popcount(data_[i]);
    }

    return *this;
}

const LabSetInt64 & LabSetInt64::operator *=  ( const LabSetInt64 & rhs) {

    if (CNT_OPT) cardinality_ = 0;
    for( unsigned int i = 0; i < data_len_; i++ ) {
        data_[i] = data_[i] & rhs.data_[i];
        if (CNT_OPT) cardinality_ += popcount(data_[i]);
    }

    return *this;
}

const LabSetInt64 & LabSetInt64::operator -=  ( const LabSetInt64 & rhs) {

    if (CNT_OPT) cardinality_ = 0;
    for( unsigned int i = 0; i < data_len_; i++ ) {
        data_[i] = data_[i] & ( ~rhs.data_[i] );
        if (CNT_OPT) cardinality_ += popcount(data_[i]);
    }

    return *this;
}

const LabSetInt64 & LabSetInt64::operator ^=  ( const LabSetInt64 & rhs) {

    if (CNT_OPT) cardinality_ = 0;
    for( unsigned int i = 0; i < data_len_; i++ ) {
        data_[i] = data_[i] ^ rhs.data_[i];
        if (CNT_OPT) cardinality_ += popcount(data_[i]);
    }

    return *this;
}


// *************************************************

const LabSetInt64 & LabSetInt64::INV() {

    unsigned int i;

    if (CNT_OPT) cardinality_ = 0;
    for( i = 0; i < data_len_; i++ ) {
        data_[i] = ~data_[i];
        if (CNT_OPT) cardinality_ += popcount(data_[i]);
    }

    // Clear the bits beyond the maximum size
    unsigned int begin = size_in_bits_ - 1;
    unsigned int end = data_len_ * 64;
    for ( i = begin; i<end; ++i ) UnSet(i);

    return *this;
}

// *************************************************


int LabSetInt64::CU ( const LabSetInt64 & rhs ) {
    int sum = 0;
    for( unsigned int i=0; i < data_len_; i++ )
        sum += popcount(data_[i] | rhs.data_[i]);
    return sum;
}

int LabSetInt64::CI  ( const LabSetInt64 & rhs ) {
    int sum = 0;
    for( unsigned int i=0; i < data_len_; i++ )
        sum += popcount(data_[i] & rhs.data_[i]);
    return sum;
}

int LabSetInt64::CD  ( const LabSetInt64 & rhs ) {
    int sum = 0;
    for( unsigned int i=0; i < data_len_; i++ )
        sum += popcount(data_[i] & (~rhs.data_[i]));
    return sum;
}

int LabSetInt64::CSD ( const LabSetInt64 & rhs ) {
    int sum = 0;
    for( unsigned int i=0; i < data_len_; i++ )
        sum += popcount(data_[i] ^ rhs.data_[i]);
    return sum;
}