I just want to find some fastest set bits count function in the php.
For example, 0010101 => 3, 00011110 => 4
I saw there is good Algorithm that can be implemented in c++.
How to count the number of set bits in a 32-bit integer?
Is there any php built-in function or fastest user-defined function?
You can try to apply a mask with a binary AND, and use shift to test bit one by one, using a loop that will iterate 32 times.
function getBitCount($value) {
$count = 0;
while($value)
{
$count += ($value & 1);
$value = $value >> 1;
}
return $count;
}
You can also easily put your function into PHP style
function NumberOfSetBits($v)
{
$c = $v - (($v >> 1) & 0x55555555);
$c = (($c >> 2) & 0x33333333) + ($c & 0x33333333);
$c = (($c >> 4) + $c) & 0x0F0F0F0F;
$c = (($c >> 8) + $c) & 0x00FF00FF;
$c = (($c >> 16) + $c) & 0x0000FFFF;
return $c;
}
My benchmarking code
start_benchmark();
for ($i = 0; $i < 1000000; $i++) {
getBitCount($i);
}
end_benchmark();
start_benchmark();
for ($i = 0; $i < 1000000; $i++) {
NumberOfSetBits($i);
}
end_benchmark();
start_benchmark();
for ($i = 0; $i < 1000000; $i++) {
substr_count(decbin($i), '1');
}
end_benchmark();
Benchmarking result:
benchmark (NumberOfSetBits()) : 1.429042 milleseconds
benchmark (substr_count()) : 1.672635 milleseconds
benchmark (getBitCount()): 10.464981 milleseconds
I think NumberOfSetBits() and substr_count() are best.
Thanks.
I could figure out a few ways to but not sure which one would be the fastest :
- use substr_count()
- replace all none '1' characters by '' and then use strlen()
- use preg_match_all()
PS : if you start with a integer these examples would involve using decbin() first.
There are a number of other ways; but for a decimal 32 bit integer, NumberOfSetBits
is definitely the fastest.
I recently stumbled over Brian Kernighan´s algorithm, which has O(log(n))
instead of most of the others having O(n)
. I don´t know why it´s not appearing that fast here; but it still has a measurable advantage over all other non-specialized functions.
Of course, nothing can beat NumberOfSetBits
with O(1)
.
my benchmarks:
function getBitCount($value) { $count = 0; while($value) { $count += ($value & 1); $value = $value >> 1; } return $count; }
function getBitCount2($value) { $count = 0; while($value) { if ($value & 1)$count++; $value >>= 1; } return $count; }
// if() instead of +=; >>=1 instead of assignment: sometimes slower, sometimes faster
function getBitCount2a($value) { for($count = 0;$value;$value >>= 1) if($value & 1)$count ++; return $count; }
// for instead of while: sometimes slower, sometimes faster
function getBitCount3($value) { for($i=1,$count=0;$i;$i<<=1) if($value&$i)$count++; return $count; }
// shifting the mask: incredibly slow (always shifts all bits)
function getBitCount3a($value) { for($i=1,$count=0;$i;$i<<=1) !($value&$i) ?: $count++; return $count; }
// with ternary instead of if: even slower
function NumberOfSetBits($v) {
// longest (in source code bytes), but fastest
$c = $v - (($v >> 1) & 0x55555555); $c = (($c >> 2) & 0x33333333) + ($c & 0x33333333);
$c = (($c >> 4) + $c) & 0x0F0F0F0F; $c = (($c >> 8) + $c) & 0x00FF00FF;
$c = (($c >> 16) + $c) & 0x0000FFFF; return $c;
}
function bitsByPregReplace($n) { return strlen(preg_replace('_0_','',decbin($n))); }
function bitsByNegPregReplace($n) { return strlen(preg_replace('/[^1]/','',decbin($n))); }
function bitsByPregMatchAll($n) { return preg_match_all('/1/',decbin($n)); }
function bitsBySubstr($i) { return substr_count(decbin($i), '1'); }
function bitsBySubstrInt($i) { return substr_count(decbin($i), 1); }
// shortest (in source code bytes)
function bitsByCountChars($n){ return count_chars(decbin($n))[49]; }
// slowest by far
function bitsByCountChars1($n) { return count_chars(decbin($n),1)[49]; }
// throws a notice for $n=0
function Kernighan($n) { for(;$n;$c++)$n&=$n-1;return$c; }
// Brian Kernighan’s Algorithm
function benchmark($function)
{
gc_collect_cycles();
$t0=microtime();
for($i=1e6;$i--;) $function($i);
$t1=microtime();
$t0=explode(' ', $t0); $t1=explode(' ', $t1);
echo ($t1[0]-$t0[0])+($t1[1]-$t0[1]), " s\t$function\n";
}
benchmark('getBitCount');
benchmark('getBitCount2');
benchmark('getBitCount2a');
benchmark('getBitCount3');
benchmark('getBitCount3a');
benchmark('NumberOfSetBits');
benchmark('bitsBySubstr');
benchmark('bitsBySubstrInt');
benchmark('bitsByPregReplace');
benchmark('bitsByPregMatchAll');
benchmark('bitsByCountChars');
benchmark('bitsByCountChars1');
benchmark('decbin');
banchmark results (sorted)
> php count-bits.php
2.286831 s decbin
1.364934 s NumberOfSetBits
3.241821 s Kernighan
3.498779 s bitsBySubstr*
3.582412 s getBitCount2a
3.614841 s getBitCount2
3.751102 s getBitCount
3.769621 s bitsBySubstrInt*
5.806785 s bitsByPregMatchAll*
5.748319 s bitsByCountChars1*
6.350801 s bitsByNegPregReplace*
6.615289 s bitsByPregReplace*
13.863838 s getBitCount3
16.39626 s getBitCount3a
19.304038 s bitsByCountChars*
Those are the numbers from one of my runs (with PHP 7.0.22); others showed different order within the 3.5 seconds group. I can say that - on my machine - four of those five are pretty equal, and bitsBySubstrInt
is always a little slower due to the typecasts.
Most other ways require a decbin (which mostly takes longer than the actual counting; I marked them with a *
in the benchmark results); only BitsBySubstr
would get close to the winner without that gammy leg.
I find it noticeable that you can make count_chars
3 times faster by limiting it to only existing chars. Seems like array indexing needs quite some time.
edit:
- added another
preg_replace
version
- fixed
preg_match_all
version
- added Kernighan´s algorithm (fastest algorithm for arbitrary size integers)
- added garbage collection to benchmarking function
- reran benchmarks