MYSQL: Avoiding cartesian product of repeating rec

2019-05-07 14:35发布

问题:

There are two tables: table A and table B. They have the same columns and the data is practically identical. They both have auto-incremented IDs, the only difference between the two is that they have different IDs for the same records.

Among the columns, there is an IDENTIFIER column which is not unique, i.e. there are (very few) records with the same IDENTIFIER in both tables.

Now, in order to find a correspondence between the IDs of table A and the IDs of table B, I have to join these two tables (for all purposes it's a self-join) on the IDENTIFIER column, something like:

SELECT A.ID, B.ID
FROM A INNER JOIN B ON A.IDENTIFIER = B.IDENTIFIER

But, being IDENTIFIER non-unique, this generates every possible combination of the repeating values of IDENTIFIER, I don't want that.

Ideally, I would like to generate an one to one association between IDs that have repeating IDENTIFIER values, based on their order. For example, supposing that there are six records with different ID and the same IDENTIFIER value in table A (and thus in table B):

A                                 B
IDENTIFIER:'ident105', ID:10  ->  IDENTIFIER:'ident105', ID:3
IDENTIFIER:'ident105', ID:20  ->  IDENTIFIER:'ident105', ID:400
IDENTIFIER:'ident105', ID:23  ->  IDENTIFIER:'ident105', ID:420
IDENTIFIER:'ident105', ID:100 ->  IDENTIFIER:'ident105', ID:512
IDENTIFIER:'ident105', ID:120 ->  IDENTIFIER:'ident105', ID:513
IDENTIFIER:'ident105', ID:300 ->  IDENTIFIER:'ident105', ID:798

That would be ideal. Anyway, a way to generate a one to one association regardless of the order of the IDs would still be ok (but not preferred).

Thanks for your time,

Silvio

回答1:

select a_numbered.id, a_numbered.identifier, b_numbered.id from 
(
select a.*,
       case 
          when @identifier = a.identifier then @rownum := @rownum + 1
          else @rownum := 1
       end as rn,
       @identifier := a.identifier
  from a
  join (select @rownum := 0, @identifier := null) r
order by a.identifier

) a_numbered join (
select b.*,
       case 
          when @identifier = b.identifier then @rownum := @rownum + 1
          else @rownum := 1
       end as rn,
       @identifier := b.identifier
  from b
  join (select @rownum := 0, @identifier := null) r
order by b.identifier

) b_numbered 
on a_numbered.rn=b_numbered.rn and a_numbered.identifier=b_numbered.identifier