SQL random number that doesn't repeat within a

2019-08-21 13:10发布

问题:

Suppose I have a table:

HH  SLOT  RN
--------------
 1     1  null
 1     2  null
 1     3  null
--------------
 2     1  null
 2     2  null
 2     3  null

I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:

HH  SLOT  RN_GOOD  RN_BAD
--------------------------
 1     1        9       3
 1     2        4       8
 1     3        7       3  <--!!!
--------------------------
 2     1        2       1
 2     2        4       6
 2     3        9       4

This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!

回答1:

To get a random number between 1 and the number of rows in the hh, you can use:

select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;

The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:

with nums as (
      select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
      select 6 union all select 7 union all select 8 union all select 9
     ),
     randoms as (
      select n, row_number() over (order by random()) as pos
      from nums
     )
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
      from (select distinct hh
            from t
           ) t cross join
           randoms
     ) hnum join
     t
     on t.hh = hnum.hh and
        t.slot = hnum.pos;

Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.



回答2:

I am not an expert on SQL, but probably do something like this:

  1. Initialize a counter CNT=1
  2. Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
  3. With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
  4. Increment CNT and go to step 2


回答3:

Well, I couldn't get a slick solution, so I did a hack:

  1. Created a new integer field called rand_inst.
  2. Assign a random number to each empty slot.
  3. Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
  4. Update the table to assign a different random number anywhere that rand_inst>1.
  5. Repeat assignment and update until we converge on a solution.

Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:

/* Iterative hack to fill 6 slots with a random number between 1 and 13.
   A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
    select household_id
          ,slot_nbr
          ,row_number() over (partition by household_id,rnd order by null) as rnum
    from c3_lalfinal
) b
where a.household_id = b.household_id
  and a.slot_nbr = b.slot_nbr
;

update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;

/* Repeat until this query returns 0: */
select count(*) from (
  select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;