I have a table in BigQuery with a string column that holds a unique id (uid). I want to keep only a sample of this table by selecting a fixed portion of the uids (let's say 1/100). My idea is to sample the data with something like this:
if(ABS(HASH(uid)) % 100 == 0) ...
The problem is that this keeps roughly 1/100 of the uids only if the hash values are uniformly distributed. To check that, I would like to generate the following table:
(n goes from 0 to 99)
0 <number of rows in which ABS(HASH(uid)) % 100 == 0>
1 <number of rows in which ABS(HASH(uid)) % 100 == 1>
2 <number of rows in which ABS(HASH(uid)) % 100 == 2>
3 <number of rows in which ABS(HASH(uid)) % 100 == 3>
.. etc.
If the counts in each row are of the same magnitude, then my assumption is correct.
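To make the check concrete, here is a rough Python sketch of the computation I am after, run on made-up data. It is only an illustration: MD5 is a stand-in for BigQuery's HASH() (it is not the same function), and the uids and the helper names `bucket` and `bucket_counts` are hypothetical.

```python
import hashlib
from collections import Counter


def bucket(uid: str, buckets: int = 100) -> int:
    """Map a uid to a bucket in [0, buckets) via a stable hash.

    MD5 is only a stand-in here; it is not what BigQuery's HASH() computes.
    """
    digest = hashlib.md5(uid.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % buckets


def bucket_counts(uids, buckets: int = 100) -> Counter:
    """Count how many uids land in each bucket."""
    return Counter(bucket(uid, buckets) for uid in uids)


# Made-up uids, purely for illustration.
uids = [f"user-{i}" for i in range(100_000)]
counts = bucket_counts(uids)

# One line per bucket: n, then the number of uids that hashed to n.
for n in range(100):
    print(n, counts[n])
```

With a well-behaved hash I would expect every bucket to hold roughly 1/100 of the uids. What I need is the equivalent of this as a BigQuery query over the real table.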
Any idea how to create such a query, or alternatively do the sampling another way?