I'm currently using OpenCV's ORB features extractor and I did notice the strange (at least for me) way the ORB-descriptor is stored (it is basically a BRIEF-32 with a modification that is not relevant to my question). As some of you know ORB takes the keypoints extracted using a modified FAST-9 (circle radius = 9 pixels; also stores orientation of the keypoint) and uses those with a modified BRIEF-32 descriptor to store the feature that the keypoint represents.
BRIEF (ORB version) works as follows: we take a 31x31 pixels patch (represents a feature) and create a bunch of random 5x5 pixels test points. Then we take pairs of those points and evaluate their intensities resulting in a binary decision (0 or 1) based on whether the intensity of the first point in the pair is greater or less equal than the intensity of the second one. Then we take all those bits and use a basic sum-formula to build a binary string of length n (for BRIEF-32 we have 32 bytes * 8 = 256 bit long binary string):
SUM(2(i-1)*bit_pair_test)
where the bit_pair_test is the bit value that we've calculated from the test for a pair of test points. The final result is something like (for a set of binary tests (...,0,1,0,1,1)):
(20*1) + (21*1) + (22*0) + (23*1) + (24*0) + ...
Now the way OpenCV's ORB stores those bitstrings is what is a mystery to me. If we look at the matrix that contains the descriptors for a whole image with each row being a single descriptor for a single keypoint we can see that each descriptor has 32 8-bit numbers, which altogether results in those 256 bits that BRIEF-32 uses to store the information. I don't understand why we split those 256 bits in 32 bytes. The official documentation (http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_brief/py_brief.html) only says that OpenCV stores such descriptors in bytes, however it does not explain why it does that. I've considered three possibilities without excluding the possibility that some combination of those might be the answer:
- Some storage technique that I just can't see
- Some performance issue with computing Hamming distance for binary string that long (256 bits in our case)
- Optimization on the matching process - the matching basically compares the descriptor of a keypoint from one image to the descriptor of a keypoint in a second image. Because we have binary strings Hamming distance is the obvious choice here. It might be that somehow each of those 32 sub-strings is compared to their counterparts in the descriptor of the other keypoint in the second image (sub-string at position 0 (keypoint X, image 1) with sub-string at position 0 (keypoint Y, image 2). Finally it might be that OpenCV says: "Okay, we have 80% match rate of the descriptor since approx. 26 of all sub-string are the same in both descriptors) so we have a winner." However I was unable to find any evidence to confirm that.
PS: You can read the paper on ORB here and the paper on BRIEF here.