Update duplicate rows only with a MAX function in

2019-01-29 09:28发布

问题:

I have a table like this, where, suppose for the sake of an example, NAME is a unique identifier.

NAME    AGE         VALUE
Jack    Under 65    3
Jack    66-74       5 
John    66-74       7
John    Over 75     9
Gill    25-35       11

Some NAMEs have more than one AGE, which is undesirable, as this is due to dirtiness of the data.

My aim is to update the duplicates only to have one AGE within each NAME. The desired output is thus:

NAME    AGE         VALUE
Jack    Under 65    3
Jack    Under 65    5 
John    66-74       7
John    66-74       9
Gill    25-35       11

Something like this UPDATE statement should work, but it doesn't.

UPDATE table t1
SET t1.age=MAX(t1.age)
WHERE EXISTS (SELECT COUNT(t2.AGE)
              FROM table t2
              WHERE t1.NAME=t2.NAME
              GROUP BY t2.NAME
              HAVING COUNT(t2.AGE) > 1)

SQL Error: ORA-00934: group function is not allowed here

Second issue

Even if I got the above statement to work, there is a second issue. The idea there is to use the MAX (or MIN) function on strings to set the same value for all repeats within a group.

But unfortunately, this too would not quite work as desired. For consistency, ideally an age would default to the lowest age group. But because MAX/MIN compare alphabetic order on strings, this would give, e.g.:

  • "66-74" and "Under 65" => MAX="Under 65" -- Lowest
  • "66-74" and "Over 75" => MAX="Over 75" -- Highest

There are only four age groups, would it be possible to specify a custom order?

  • NB1: I am using Oracle SQL.
  • NB2: I do not mind if there is a way to achieve the result using a SELECT instead of an UPDATE statement.

Reproducible example

SELECT 'Jack' as NAME, 'Under 65' as AGE, 3 as VALUE from dual
UNION ALL
SELECT 'Jack' as NAME, '66-74' as AGE, 5 as VALUE from dual
UNION ALL
SELECT 'John' as NAME, '66-74' as AGE, 7 as VALUE from dual
UNION ALL
SELECT 'John' as NAME, 'Over 75' as AGE, 9 as VALUE from dual
UNION ALL
SELECT 'Gill' as NAME, '25-35' as AGE, 11 as VALUE from dual

回答1:

You can define custom order with case when clause and then use analytic max(). This worked for given examples:

update t1 set age = (
    select max(age) keep (dense_rank last 
           order by case when age = 'Over 75'  then 1
                         when age = '66-74'    then 2
                         when age = 'Under 65' then 3
                         when age = '25-35'    then 4
                    end)
    from t1 tx where tx.name = t1.name )