-->

Sybase SQL - Remove “semi-duplicates” from query r

2019-08-26 21:36发布

问题:

I have a query that uses two SELECT statements that are combined using a UNION ALL. Both statements pull data from similar tables to populate the query results. I am attempting to remove the "semi-duplicate" rows from the query, but am having issues doing so.

My query is the following:

    SELECT DISTINCT * 
FROM
    (
    SELECT
        TeamNum = CASE 
                WHEN T.TeamName = 'Alpha Team'
                    THEN '1'
                WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
                    THEN '2'
                WHEN T.TeamName = 'Delta Team'
                    THEN '3'
                ELSE '<Undefined>'
                END,
        P.PatientLastName AS LastName,
        P.PatientFirstName AS FirstName,
        R.PrimaryCity AS City,
        ReimbursorName = CASE
                WHEN RE.ReimbursorDescription = 'Medicare'
                    Then 'R1'
                WHEN RE.ReimbursorDescription = 'Medicaid'
                    Then 'R2'
                ELSE 'R3'
                END,
        P.PatientID AS PatientID
    FROM 
        PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
        Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
        PatReferrals PR LEFT OUTER JOIN PatReimbursors PRE ON PR.PatientID = PRE.PatientID,
        PatReimbursors PRE LEFT OUTER JOIN Reimbursors RE ON PRE.ReimbursorID = RE.ReimbursorID,
        PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID,
    WHERE 
        PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
        AND PR.Status <> 'R' 
        AND PRE.CoveragePriority = '1'
        AND PRE.ExpirationDate IS NULL 

    UNION ALL

    SELECT
        TeamNum = CASE 
                WHEN T.TeamName = 'Alpha Team'
                    THEN '1'
                WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
                    THEN '2'
                WHEN T.TeamName = 'Delta Team'
                    THEN '3'
                ELSE '<Undefined>'
                END,
        P.PatientLastName AS LastName,
        P.PatientFirstName AS FirstName,
        R.PrimaryCity AS City,
        ReimbursorName = CASE
                WHEN RE.ReimbursorDescription = 'Medicare'
                    Then 'E1'
                WHEN RE.ReimbursorDescription = 'Medicaid'
                    Then 'E2'
                ELSE 'E3'
                END,
        P.PatientID AS PatientID
    FROM 
        PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
        Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
        PatReferrals PR LEFT OUTER JOIN PatEligibilities PE ON PR.PatientID = PE.PatientID,
        PatEligibilities PE LEFT OUTER JOIN Reimbursors RE ON PE.ReimbursorID = RE.ReimbursorID,
        PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID,
    WHERE 
        PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
        AND PR.Status <> 'R' 
        AND PE.Status <> 'V'
        AND PE.ApplicationDate BETWEEN DATE(PR.ReferralDate)-5 AND DATE('2100/01/01')
    ) 

AS DUMMYTBL

ORDER BY 
    DUMMYTBL.LastName ASC,
    DUMMYTBL.FirstName ASC

The results that I receive when I run the query is the following:

3   Doe Jane    Town    R1  19874
1   Roe John    City    R3  50016
1   Roe John    City    E1  50016
2   Smith   Jane    Town    E3  33975

The data that I am needing to remove is duplicate rows based on a certain criteria once the results are brought in from the original query. Each person can only be listed once and they must have a single pay source (R1, R2, R3, E1, E2, E3). If there is a R#, than there cannot be a E# listed for that person. If there are no R#'s than an E# must be listed. As shown in my example results, line 2 and 3 have the same person listed, but two pay sources (R3 and E1).

How can I go about making each person have only one row shown using the criteria that I have listed?

EDIT: Modifed the SQL query to show the original variables from the WHERE clauses in order to show further detail on the query. The PatReimbursors and the PatEligibilities tables have similar data, but the criteria is different in order to pull the correct data.

回答1:

Your query does not make sense. I would start by eliminating the implicit cartesian product, generated by the , in the from clause.

My guess is that the from clause should be:

FROM 
    PatReferrals PR LEFT JOIN
    Patient P
    ON PR.PatientID = P.PatientID left outer join
    Rolodex R
    ON P.RolodexID = R.RolodexID left outer join
    PatEligibilities PE
    ON PR.PatientID = PE.PatientID left outer join
    Reimbursors RE
    ON PE.ReimbursorID = RE.ReimbursorID left outer join
    Teams T ON PR.TeamID = T.TeamID

Once you do this, you may not need the union all or the select distinct. You may be able to put both the reimbursors and the eligibilities in the same query.



回答2:

Use a subquery or subqueries.

The overall query should be written using the following pattern:

 Select Distinct [Person Data]
 From PersonTable
     left Join to otherTable1 -- add outer join for each table you need data from
         On [Conditions that ensure join can generate only one row per person,
               ... and specify which of possibly many rows to get...]

Make sure the conditions eliminate any possibility for the join to generate more than one row from the other [outer] table per person row in in the person table,. This may (and often does) require that the join condition be based on a subquery, as, for example...

 Select Distinct [Person Data]
 From PersonTable p
     left Join to employments e -- add outer join for each table you need data from
         On e.PersonId = p.PersonId 
            and e.HireDate = (Select Max(hiredate) from employments
                              where personId = p.PersonId)


回答3:

After working with this for quite some time today, I found a solution to the problem that I was having. Here is the solution that works and pulls the correct information that I was needing:

SELECT DISTINCT 
    TeamNum,
    LastName,
    FirstName,
    City,
    ReimbursorName = CASE
        WHEN max(ReimbursorName) IN ('R1', 'E1')
            THEN '1' 
        WHEN max(ReimbursorName) IN ('R2', 'E2')
            THEN '2'
        ELSE '3'
        END,
    PatientID 
FROM
    (
    SELECT
        TeamNum = CASE 
                WHEN T.TeamName = 'Alpha Team'
                    THEN '1'
                WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
                    THEN '2'
                WHEN T.TeamName = 'Delta Team'
                    THEN '3'
                ELSE '<Undefined>'
                END,
        P.PatientLastName AS LastName,
        P.PatientFirstName AS FirstName,
        R.PrimaryCity AS City,
        ReimbursorName = CASE
                WHEN RE.ReimbursorDescription = 'Medicare'
                    Then 'R1'
                WHEN RE.ReimbursorDescription = 'Medicaid'
                    Then 'R2'
                ELSE 'R3'
                END,
        P.PatientID AS PatientID
    FROM 
        PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
        Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
        PatReferrals PR LEFT OUTER JOIN PatReimbursors PRE ON PR.PatientID = PRE.PatientID,
        PatReimbursors PRE LEFT OUTER JOIN Reimbursors RE ON PRE.ReimbursorID = RE.ReimbursorID,
        PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID
    WHERE 
        PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
        AND PR.Status <> 'R' 
        AND PRE.CoveragePriority = '1'
        AND PRE.ExpirationDate IS NULL 

    UNION ALL

    SELECT
        TeamNum = CASE 
                WHEN T.TeamName = 'Alpha Team'
                    THEN '1'
                WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
                    THEN '2'
                WHEN T.TeamName = 'Delta Team'
                    THEN '3'
                ELSE '<Undefined>'
                END,
        P.PatientLastName AS LastName,
        P.PatientFirstName AS FirstName,
        R.PrimaryCity AS City,
        ReimbursorName = CASE
                WHEN RE.ReimbursorDescription = 'Medicare'
                    Then 'E1'
                WHEN RE.ReimbursorDescription = 'Medicaid'
                    Then 'E2'
                ELSE 'E3'
                END,
        P.PatientID AS PatientID
    FROM 
        PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
        Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
        PatReferrals PR LEFT OUTER JOIN PatEligibilities PE ON PR.PatientID = PE.PatientID,
        PatEligibilities PE LEFT OUTER JOIN Reimbursors RE ON PE.ReimbursorID = RE.ReimbursorID,
        PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID
    WHERE 
        PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
        AND PR.Status <> 'R' 
        AND PE.Status <> 'V'
        AND PE.ApplicationDate BETWEEN DATE(PR.ReferralDate)-5 AND DATE('2100/01/01')
    ) 

AS DUMMYTBL
GROUP BY
    TeamNum,
    LastName,
    FirstName,
    City,
    PatientID
ORDER BY 
    DUMMYTBL.LastName ASC,
    DUMMYTBL.FirstName ASC

Thanks for all the responses that were provided.