2bit bit-fields array effects on performance and c

I am in need of a 2bit array, I am not concerned with saving memory at all, but I am concerned with minimizing cache misses and maximizing cache efficiency. Using an array of bools will use 4 times more memory, which means for every usable chunk of data in the cache, there will be 3 that are not used. So technically, I can get 3 times better cache consistency if I use bitfields.

The plan is to implement it as an array of bytes, divided into 4 equal bitfields, and use the div function to be able to get the integral quotient and remainder, possibly in a single clock, and use those to access the right index and right bitfield.

The array I needs is about 10000 elements long, so it will make for a significantly denser packed data, using 2 actual bits will allow for the entire array to fit in L1 cache, while using a byte array this will not be possible.

So my question is whether someone can tell me if this is a good idea in a performance oriented task, so I know if it is worth to go forth and implement a 2bit array? And surely, the best way to know is profiling, but any information in advance may be useful and will be appreciated.

标签： c++ performance caching bit-fields memory-efficient

2条回答

不美不萌又怎样

2楼-- · 2019-05-17 08:56

With 10000 elements, on a modern processor, it should fit nicely in memory as bytes (10KB), so I wouldn't worry too much about it, unless you want this to run on some very tiny microprocessor with a cache that is much smaller than the typical 16-32KB L1 caches that modern CPU's have.

Of course, you may well want to TEST the performance with different solutions, if you think this is an important part of your code from a performance perspective [as measured from your profiling that you've already done before you start optimising, right?].

0人赞添加讨论(0) 举报

Anthone

3楼-- · 2019-05-17 08:58

It's not clear to me that this will result in a performance gain. Accessing each field will require several instructions ((data[i / 4] >> 2 * (i % 4)) & 0x03), and a lot of modern processors have an L3 cache which would hold the entire array with one byte per entry. Whether the extra cost in execution time will be greater or less than the difference in caching is hard to say; you'll have to profile to know exactly.

If you can organize your algorithms to work a byte (or even a word) at a time, the cost of access may be much less. Iterating over the entire array, for example:

for ( int i = 0; i < 10000; i += 4 ) {
    unsigned char w1 = data[ i / 4 ];
    for ( int j = 0; j < 4; ++ j ) {
        unsigned char w2 = w1 & 0x03;
        //  w2 is entry i + j...
        w1 >>= 2;
    }
}

could make a significant difference. Most compilers will be able to keep w1 and w2 in registers, meaning you'll only have 1/4 as many memory accesses. Packing withunsigned int` would probably be even faster.

0人赞添加讨论(0) 举报

2bit bit-fields array effects on performance and c

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间