I'm working on a problem in Oracle that I'm struggling to solve 'elegantly'.
I have a data extract with three different identifiers: A, B, C
Each identifier may appear in more than one row, and each row may have one or more of these three identifiers (i.e the column is populated or null).
I want to be able to group all records that have any combination of either A, B or C in common and assign them the same group id.
Extract table showing what the eventual groups should be:
Rownum | A | B | C | End group
1 p NULL NULL 1
2 p r NULL 1
3 q NULL NULL 2
4 NULL r NULL 1
5 NULL NULL s 2
6 q NULL s 2
My original approach was to assign a guid to each row in the extract and create a lookup table for the three identifiers:
GUID | IDENTIFIER | IDENTIFIER TYPE | GROUP | END GROUP
1 p A 1 1
2 p A 1 1
2 r B 2 1
3 q A 3 3
4 r B 2 1
5 s C 4 3
6 q A 3 3
6 s C 4 3
Then group by identifier and assign a group number. The groups, however, need to be combined where possible to provide the view shown in end group.
The only solution I can think of for this problem is to use loops, which I'd rather avoid.
Any ideas would be greatly appreciated.
Niall
This is truly an interesting problem. Still, I think we are missing a definition of a "group". Since in your example
(p,null,null)
(row1) and(null,r,null)
(row4) share no common identifier and belong to the same group I'll go with this definition for grouping:This means we can "chain" rows. This naturally leads to a hierarchical solution:
You can execute the subquery to understand the construction:
Use merge instead of loop:
Statement:
It is same as: