Fast code for searching bit-array for contiguous s

2019-01-25 13:24发布

问题:

Is there some reasonably fast code out there which can help me quickly search a large bitmap (a few megabytes) for runs of contiguous zero or one bits?

By "reasonably fast" I mean something that can take advantage of the machine word size and compare entire words at once, instead of doing bit-by-bit analysis which is horrifically slow (such as one does with vector<bool>).

It's very useful for e.g. searching the bitmap of a volume for free space (for defragmentation, etc.).

回答1:

Windows has an RTL_BITMAP data structure one can use along with its APIs.

But I needed the code for this sometime ago, and so I wrote it here (warning, it's a little ugly):
https://gist.github.com/3206128

I have only partially tested it, so it might still have bugs (especially on reverse). But a recent version (only slightly different from this one) seemed to be usable for me, so it's worth a try.

The fundamental operation for the entire thing is being able to -- quickly -- find the length of a run of bits:

long long GetRunLength(
    const void *const pBitmap, unsigned long long nBitmapBits,
    long long startInclusive, long long endExclusive,
    const bool reverse, /*out*/ bool *pBit);

Everything else should be easy to build upon this, given its versatility.

I tried to include some SSE code, but it didn't noticeably improve the performance. However, in general, the code is many times faster than doing bit-by-bit analysis, so I think it might be useful.

It should be easy to test if you can get a hold of vector<bool>'s buffer somehow -- and if you're on Visual C++, then there's a function I included which does that for you. If you find bugs, feel free to let me know.



回答2:

I can't figure how to do well directly on memory words, so I've made up a quick solution which is working on bytes; for convenience, let's sketch the algorithm for counting contiguous ones:

Construct two tables of size 256 where you will write for each number between 0 and 255, the number of trailing 1's at the beginning and at the end of the byte. For example, for the number 167 (10100111 in binary), put 1 in the first table and 3 in the second table. Let's call the first table BBeg and the second table BEnd. Then, for each byte b, two cases: if it is 255, add 8 to your current sum of your current contiguous set of ones, and you are in a region of ones. Else, you end a region with BBeg[b] bits and begin a new one with BEnd[b] bits. Depending on what information you want, you can adapt this algorithm (this is a reason why I don't put here any code, I don't know what output you want).

A flaw is that it does not count (small) contiguous set of ones inside one byte ...

Beside this algorithm, a friend tells me that if it is for disk compression, just look for bytes different from 0 (empty disk area) and 255 (full disk area). It is a quick heuristic to build a map of what blocks you have to compress. Maybe it is beyond the scope of this topic ...



回答3:

Sounds like this might be useful:

http://www.aggregate.org/MAGIC/#Population%20Count%20%28Ones%20Count%29 and http://www.aggregate.org/MAGIC/#Leading%20Zero%20Count

You don't say if you wanted to do some sort of RLE or to simply count in-bytes zeros and one bits (like 0b1001 should return 1x1 2x0 1x1).

A look up table plus SWAR algorithm for fast check might gives you that information easily. A bit like this:

byte lut[0x10000] = { /* see below */ };
for (uint * word = words; word < words + bitmapSize; word++) {
   if (word == 0 || word == (uint)-1) // Fast bailout
   {
       // Do what you want if all 0 or all 1
   }
   byte hiVal = lut[*word >> 16], loVal = lut[*word & 0xFFFF];
   // Do what you want with hiVal and loVal

The LUT will have to be constructed depending on your intended algorithm. If you want to count the number of contiguous 0 and 1 in the word, you'll built it like this:

  for (int i = 0; i < sizeof(lut); i++) 
     lut[i] = countContiguousZero(i); // Or countContiguousOne(i)
  // The implementation of countContiguousZero can be slow, you don't care
  // The result of the function should return the largest number of contiguous zero (0 to 15, using the 4 low bits of the byte, and might return the position of the run in the 4 high bits of the byte
  // Since you've already dismissed word = 0, you don't need the 16 contiguous zero case.