Merge multiple rows with same ID into one row

Posted 2019-02-06 13:00

Question:

How can I merge multiple rows with the same ID into one row?

I want to merge when the first and second rows have the same value in a column, or when one of them has a value and the other is NULL. I don't want to merge when the first and second rows have different values in the same column.

I have this table:

ID  A     B     C
1   NULL  31    NULL
1   412   NULL  1
2   567   38    4
2   567   NULL  NULL
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

I want to get this table:

ID  A     B     C
1   412   31    1
2   567   38    4
3   2     NULL  NULL
3   5     NULL  NULL
4   6     1     NULL
4   8     NULL  5
4   NULL  NULL  5

Answer 1:

I think there's a simpler solution than the answers above (which are also correct). It first computes, in a CTE, the merged row for every ID whose rows can be merged, then appends the rows that cannot be merged.

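WITH CTE AS (
    -- One merged row per ID whose columns never hold two different values.
    SELECT
        ID,
        MAX(A) AS A,
        MAX(B) AS B,
        MAX(C) AS C
    FROM dbo.Records
    GROUP BY ID
    -- MAX and MIN ignore NULLs; the IS NULL checks cover all-NULL columns.
    HAVING (MAX(A) = MIN(A) OR MAX(A) IS NULL)
        AND (MAX(B) = MIN(B) OR MAX(B) IS NULL)
        AND (MAX(C) = MIN(C) OR MAX(C) IS NULL)
)
SELECT *
FROM CTE
UNION ALL
-- Rows of the remaining IDs are returned unchanged.
SELECT *
FROM dbo.Records
WHERE ID NOT IN (SELECT ID FROM CTE);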

SQL Fiddle: http://www.sqlfiddle.com/#!6/29407/1/0
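
For readers who want to check the two-step idea (collapse what can be collapsed, then append the rest) without a database, here is a minimal Python sketch run on the question's sample data. The function name merge_rows is made up for illustration, None stands in for NULL, and treating a column that is entirely NULL as mergeable is my assumption about the intended behavior.

```python
from collections import defaultdict

# Sample rows from the question as (ID, A, B, C); None stands in for NULL.
ROWS = [
    (1, None, 31, None),
    (1, 412, None, 1),
    (2, 567, 38, 4),
    (2, 567, None, None),
    (3, 2, None, None),
    (3, 5, None, None),
    (4, 6, 1, None),
    (4, 8, None, 5),
    (4, None, None, 5),
]


def merge_rows(rows):
    """Collapse IDs whose columns never disagree; pass other rows through."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    collapsed, untouched = [], []
    for key, members in groups.items():
        # Distinct non-NULL values per column, like MIN/MAX ignoring NULLs.
        distinct = [{r[i] for r in members} - {None} for i in (1, 2, 3)]
        if all(len(values) <= 1 for values in distinct):
            # Assumption: an all-NULL column counts as mergeable and stays NULL.
            collapsed.append((key, *(max(v, default=None) for v in distinct)))
        else:
            untouched.extend(members)
    # Mirrors the UNION ALL: merged rows first, then everything else.
    return collapsed + untouched


for row in merge_rows(ROWS):
    print(row)
```

On the sample data this prints one row each for IDs 1 and 2 and leaves the rows for IDs 3 and 4 as they were, which matches the table the question asks for.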



Answer 2:

WITH Collapsed AS (
   SELECT
      ID,
      A = Min(A),
      B = Min(B),
      C = Min(C)
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
)
SELECT
   *
FROM
   Collapsed
UNION ALL
SELECT
   *
FROM
   dbo.MyTable T
WHERE
   NOT EXISTS (
      SELECT *
      FROM Collapsed C
      WHERE T.ID = C.ID
   );

See this working in a SQL Fiddle

This works by building the merged rows with Min and Max, which ignore NULLs and so return the same value for each column within an ID whenever the rows agree, then appending all the rows from the table that couldn't be merged. The EXISTS ... INTERSECT trick handles the case where a column is entirely NULL for an ID: Min and Max are then both NULL, and NULL = NULL does not evaluate to true. In other words, it behaves like Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C), except that NULLs compare as equal.
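
The NULL behavior described here can be checked with a small Python sketch. It uses SQLite through the standard sqlite3 module as a stand-in for SQL Server, which is an assumption on my part: both engines treat NULLs as equal inside INTERSECT, but this is not the T-SQL query itself. The helper name tuples_match is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A plain comparison of two NULLs yields NULL (unknown), not true.
plain = conn.execute("SELECT NULL = NULL").fetchone()[0]
print(plain)  # None


def tuples_match(left, right):
    """Compare two 3-column rows the way EXISTS (... INTERSECT ...) does."""
    sql = "SELECT EXISTS (SELECT ?, ?, ? INTERSECT SELECT ?, ?, ?)"
    return bool(conn.execute(sql, (*left, *right)).fetchone()[0])


# (Min(A), Min(B), Min(C)) against (Max(A), Max(B), Max(C)) for two cases:
print(tuples_match((567, 38, 4), (567, 38, 4)))        # all columns agree
print(tuples_match((7, None, None), (7, None, None)))  # B and C are all NULL
print(tuples_match((2, None, None), (5, None, None)))  # A disagrees
```

The first two calls report a match and the third does not, so a group with an all-NULL column still counts as mergeable while a group with conflicting values does not.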

Here's a slightly different (earlier) solution of mine that may have different performance characteristics. It is more complicated, which I like less, but it is a single flowing query without a UNION, which I like more.

WITH Collapsible AS (
   SELECT
      ID
   FROM
      dbo.MyTable
   GROUP BY
      ID
   HAVING
      EXISTS (
         SELECT Min(A), Min(B), Min(C)
         INTERSECT
         SELECT Max(A), Max(B), Max(C)
      )
), Calc AS (
   SELECT
      T.*,
      Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
   FROM
      dbo.MyTable T
      LEFT JOIN Collapsible C
         ON T.ID = C.ID
)
SELECT
   ID,
   A = Min(A),
   B = Min(B),
   C = Min(C)
FROM
   Calc
GROUP BY
   ID,
   Grp
;

This is also in the above SQL Fiddle.

This uses logic similar to the first query's to decide whether a group should be merged, then uses the result to build a grouping key that is either the same for all rows within an ID or different for every row within an ID. With a final Min (Max would have worked just as well), the rows that should be merged are merged because they share a grouping key, and the rows that shouldn't be merged stay separate because each has its own grouping key within the ID.
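
The grouping-key idea can be sketched in Python as follows. This is only an illustration of the logic, not a translation of the T-SQL: the names sql_min, collapsible, and merge_with_group_key are made up, None stands in for NULL, and the row number is just the position of the row within its ID.

```python
# Sample rows from the question as (ID, A, B, C); None stands in for NULL.
ROWS = [
    (1, None, 31, None),
    (1, 412, None, 1),
    (2, 567, 38, 4),
    (2, 567, None, None),
    (3, 2, None, None),
    (3, 5, None, None),
    (4, 6, 1, None),
    (4, 8, None, 5),
    (4, None, None, 5),
]


def sql_min(values):
    """Minimum that skips None, like the SQL Min aggregate skips NULL."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def collapsible(members):
    """True when no column holds two different non-NULL values."""
    return all(len({r[i] for r in members} - {None}) <= 1 for i in (1, 2, 3))


def merge_with_group_key(rows):
    by_id = {}
    for row in rows:
        by_id.setdefault(row[0], []).append(row)

    buckets = {}
    for key, members in by_id.items():
        shared = collapsible(members)
        for n, row in enumerate(members, start=1):
            # Same key for every row of a mergeable ID, a row number otherwise.
            grp = key if shared else n
            buckets.setdefault((key, grp), []).append(row)

    # Final aggregation over (ID, Grp), taking the minimum of each column.
    return [
        (key, *(sql_min(r[i] for r in members) for i in (1, 2, 3)))
        for (key, _grp), members in buckets.items()
    ]


for row in merge_with_group_key(ROWS):
    print(row)
```

IDs 1 and 2 end up in a single bucket each and collapse to one row, while every row of IDs 3 and 4 gets its own bucket and comes out unchanged.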

Depending on your data set, indexes, table size, and other performance factors, either of these queries may perform better, though the second query starts at a disadvantage because it needs two sorts instead of one.



Answer 3:

You can try something like this:

select
    isnull(t1.A, t2.A) as A,
    isnull(t1.B, t2.B) as B,
    isnull(t1.C, t2.C) as C
from
    table_name t1
    join table_name t2 on t1.ID = t2.ID and .....

You mention the concepts of first and second. How do you define this order? Put the condition that defines the order in place of the ..... in the join.

Also, I assume you have exactly 2 rows for each ID value.