I have a couple uint8_t arrays in my c code, and I'd like to compare an arbitrary sequence bits from one with another. So for example, I have bitarray_1 and bitarray_2, and I'd like to compare bits 13 - 47 from bitarray_1 with bits 5-39 of bitarray_2. What is the most efficient way to do this?
Currently it's a huge bottleneck in my program, since I just have a naive implementation that copies the bits into the beginning of a new temporary array, and then uses memcmp on them.
bits 13 - 47 of
bitarray_1
are the same as bits 5 - 39 ofbitarray_1 + 1
.Compare the first 3 bits (5 - 7) with a mask and the other bits (8 - 39) with
memcmp()
.Rather than shift and copy the bits, maybe representing them differently is faster. You have to measure.
You can (and should) limit the copy to the minimum possible.
I have no idea if it's faster, but it wouldn't surprise me if it was. Again, you have to measure.
Here is my unoptimized bit sequence comparison function:
EDIT: Here is my (untested) bitstring equality test function:
I made a simple measurement tool to see how fast is it:
Result:
EDIT2: last_bits() changed.
What about writing the function that will calculate the offsets from both arrays, apply the mask, shift the bits and store the result to the int so you may compare them. If the bits count (34 in your example) exceeds the length of the int - recurse or loop.
Sorry, the example will be pain in the ass.
three words: shift, mask and xor.
shift to get the same memory alignment for both bitarray. If not you will have to shift one of the arrays before comparing them. Your exemple is probably misleading because bits 13-47 and 5-39 have the same memory alignment on 8 bits addresses. This wouldn't be true if you were comparing say bits 14-48 with bits 5-39.
Once everything is aligned and exceeding bits cleared for table boundaries a xor is enough to perform the comparison of all the bits at once. Basically you can manage to do it with just one memory read for each array, which should be pretty efficient.
If memory alignment is the same for both arrays as in your example memcmp and special case for upper and lower bound is probably yet faster.
Also accessing array by uint32_t (or uint64_t on 64 bits architectures) should also be more efficient than accessing by uint8_t.
The principle is simple but as Andrejs said the implementation is not painless...
Here is how it goes (similarities with @caf proposal is no coincidence):
}
All possible optimisations are not yet used. One promising one would be to use larger chunks of data (64 bits or 32 bits at once instead of 8), you could also detect cases where offset are synchronised for both arrays and in such cases use a memcmp instead of the main loop, replace modulos % 8 by logical operators & 7, replace '/ 8' by '>> 3', etc., have to branches of code instead of swapping s1 and s2, etc, but the main purpose is achieved : only one memory read and not memory write for each array item hence most of the work can take place inside processor registers.
The easiest way to do this is to convert the more complex case into a simpler case, then solve the simpler case.
In the following code,
do_compare()
solves the simpler case (where the sequences are never offset by more than 7 bits,s1
is always offset as much or more thans2
, and the length of the sequence is non-zero). Thecompare_bit_sequence()
function then takes care of converting the harder case to the easier case, and callsdo_compare()
to do the work.This just does a single-pass through the bit sequences, so hopefully that's an improvement on your copy-and-memcmp implementation.
To do the comparison in your example, you'd call:
(Note that I am numbering the bits from zero, and assuming that the bitarrays are laid out little-endian, so this will start the comparison from the sixth-least-significant bit in bitarray2[0], and the sixth-least-signifcant bit in bitarray1[1]).