i have a table with a column vector of type bit(2000). how does the db engine handle operations AND and OR over this values? does it simply divide into 32bit chunks (or 64, respectively) and then compares each chunk separately and in the end simply concats the results together? or does it handle simply as two strings?
my point is to predict, which use case would be faster. i got a key-value data (user-item).
userID | itemID
U1 | I1
U1 | Ix
Un | Ij
for each user i want to calculate a list of n nearest neighbors (using jaccard index, for example).
select my_jaccard(select itemID from table where userID=U1,select itemID from table where userID=U2)
my solution - i parsed the input data into a table of user-vector, where the vector is of type bit(2000) with 1's on the position representing the particular item.
userID | vector
U1 | 00.......01
U1 | 0..1.....00
Un | 00..1..1..0
upon this table i simply do
select vector1&vector2
the point is that each user has at most only 10 records for all items, i.e. the vector has maximum of 10 active bits. i think, that parsing the whole bitvector just to find the active bits needs more computational resources, than simply comparing those 10 values of user1 with 10 values of user2 each with another.
is it faster to use long bit-vectors which have very few bits set to 1 or is it better to use to original values as a sets and compare two sets together? (a set has maximum of 10 items)
i use both psql v8.2 and v9.x