-->

k* reproduction values?

2019-07-26 04:00发布

问题:

I am reading about Product Quantization, from section II.A page 3 of PQ for NNS, that says:

..all subquantizers have the same finite number k* of reproduction values. In that case the number of centroids is (k*)^m

where m is the number of subvectors.

However, I do not get k* at all! I mean in vector quantization we assign every vector to k centroids. In produce quantization, we assign every subvector to k centroids. How did k* come into play?

回答1:

I think k* is the number of centroids in each subspace, and k is the number of centroids in the whole space.

For example if the data is 2d, like (x, y), and we treat each dimension as a subspace, and do kmeans with say k*=3 respectively, we'll get 3 centroids in each subspace, {x1, x2, x3} and {y1, y2, y3}.

Then there'll be 3^2=9 possible centroids in the whole space, which are* (x1, y1), (x1, y2), (x1, y3), (x2, y1)...

In this way we can get a large number of centroids (2^64 in the paper) using a small amount of memory, because we don't have to store all k*^m centorids, we only need to store k* centroids in each subspace.

Edit:
In above the example, the number of subspaces m=2, number of centroids in each subspace k*=3, number of centroids the whole subspace k=3^2, number of dimensions of each subspace D*=1, number of floating points to store mD*k*=Dk*=6.


*the cartesian product of x and y