Fastest way to compare two bitmaps in C

Published 2019-06-26 17:47

There are two bitmaps in the form of char arrays with millions of entries. What is the fastest way to compare them in C?

I can imagine using the bitwise XOR operator on one byte at a time in a for loop.

Important points about the bitmaps:

  • In 1% to 10% of the runs of the algorithm, the bitmaps differ; most of the time they are identical. When they do differ, they can differ by as much as 100%, and changed bits are likely to occur in contiguous runs.
  • Both bitmaps are of the same length.

Aim:

  • Check whether they differ and, if so, where.
  • Be correct every time (the probability of detecting an error, if there is one, should be 1).

1 Answer

虎瘦雄心在
2019-06-26 18:36

This answer assumes you mean 'bitmap' as a sequence of 0/1 values rather than the bitmap image format.

If you simply have two bitmaps of the same length and wish to compare them quickly, memcmp() will be effective, as someone suggested in the comments. You could try SSE-type optimizations, but these are not as easy to use as memcmp(). Note that memcmp() only tells you that the arrays differ, not where or by how much.
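A minimal sketch of the memcmp()-based equality check (the wrapper function and its name are mine, for illustration):

```c
#include <string.h>

/* Returns 1 if the two bitmaps (byte arrays of length n) are identical.
 * memcmp() typically compares word-sized chunks internally, so it is
 * already close to memory-bandwidth bound on most libc implementations. */
static int bitmaps_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```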

If you want to know how many bits differ, e.g. that 615 bits differ, then you have little option except to XOR every byte and count the resulting set bits. As others have noted, you probably want to do this 32, 64, or even 256 bits at a time, depending on your platform. However, if the arrays are millions of bytes long, then the biggest delay (with current CPUs) will be the time to transfer main memory to the CPU, and it won't matter terribly what the CPU does (lots of caveats here).
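A sketch of the XOR-and-popcount approach, processing 64 bits per iteration. `__builtin_popcountll` is a GCC/Clang builtin, so this assumes one of those compilers; on other toolchains you would substitute a portable bit-counting routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many bits differ between two equal-length byte arrays.
 * Assumes a GCC/Clang compiler for __builtin_popcountll. */
static size_t count_bit_diffs(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t diffs = 0, i = 0;

    /* Main loop: XOR 8 bytes at a time and popcount the result.
     * memcpy avoids unaligned loads and is optimized away by compilers. */
    for (; i + 8 <= n; i += 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8);
        memcpy(&wb, b + i, 8);
        diffs += (size_t)__builtin_popcountll(wa ^ wb);
    }

    /* Tail: handle any leftover bytes one at a time. */
    for (; i < n; i++)
        diffs += (size_t)__builtin_popcountll((uint64_t)(a[i] ^ b[i]));

    return diffs;
}
```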

If your question is really about comparing one array against many, such as A to B, then A to C, D, E, and so on, then you can do a couple of things:

  • A. Store a checksum of each array and compare the checksums first; if they are equal, there is a high chance the arrays are the same. Obviously there is a risk that the checksums are equal while the data differs, so make sure a false match will not have dramatic side effects. If you cannot tolerate false results, do not use this technique.
  • B. If the arrays have structure, e.g. they are image data, then leverage tools specific to that format; how to do so is beyond the scope of this answer.
  • C. If the data compresses well, compress each array and compare the compressed forms. With ZIP-style compression you cannot tell directly how many bits differ, but other techniques such as RLE can count bit differences quickly (though they are a lot of work to build correctly and make fast).
  • D. If the risk in (A) is acceptable, then checksum each chunk of, say, 262144 bits, and only count differences where the chunk checksums differ. This heavily reduces main-memory access and will go much faster.
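Option D might be sketched as follows. The trivial byte-sum checksum and both function names are illustrative placeholders of my own; a real implementation would use something stronger such as CRC32, and equal checksums can still hide differences, which is the accepted risk described above.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 32768   /* 262144 bits, as suggested above */

/* Toy checksum: sums the bytes of one chunk. Placeholder only;
 * a production version would use e.g. CRC32. */
static uint64_t chunk_sum(const unsigned char *p, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Returns the byte offset of the first chunk whose checksum differs,
 * or (size_t)-1 if every chunk checksum matches. Only chunks flagged
 * here need a full XOR-and-count pass, saving memory traffic. */
static size_t first_differing_chunk(const unsigned char *a,
                                    const unsigned char *b, size_t n)
{
    for (size_t off = 0; off < n; off += CHUNK_BYTES) {
        size_t len = (n - off < CHUNK_BYTES) ? (n - off) : CHUNK_BYTES;
        if (chunk_sum(a + off, len) != chunk_sum(b + off, len))
            return off;
    }
    return (size_t)-1;
}
```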

All of the options A–D are about reducing main-memory access, since that is the nub of any performance gain for the problem as stated.
