Does Inner Join exclude duplicates?

2019-09-21 12:47发布

my query:

  select 
  SUM(payments.paid_amt) as paid_amt, 
  SUM(payments.copay_amt) as copay_amt,
  SUM(payments.CO_INSURANCE_AMT) as co_ins_amt,
  payor_group.PAYOR_GROUP as payor_group,
  DATEPART(MM,accession.XIFIN_LOAD_DATE) as load_month,
  DATEPART(yyyy,accession.XIFIN_LOAD_DATE) as load_year,
  accession.ACCESSION_ID


  from 
  [F_PAYOR_PAYMENTS_monthly] payments

  join D_PAYOR payor
  on payor.PAYOR_ID=payments.PAYMENT_PAYOR_ID

  join D_PAYOR_GROUP payor_group
  on payor.PAYOR_GROUP_KEY=payor_group.PAYOR_GROUP_KEY

  join F_ACCESSION_daily accession
  on accession.ACCESSION_ID=payments.ACCESSION_ID

  where accession.XIFIN_LOAD_DATE between '20120501' and '20120531'

  group by  payor_group.PAYOR_GROUP, DATEPART(MM,accession.XIFIN_LOAD_DATE),
  DATEPART(yyyy,accession.XIFIN_LOAD_DATE),accession.ACCESSION_ID

for some reason, the SUMs are counting the same accession.ACCESSION_ID multiple times.

when i am doing all of these joins, my concern is that i might be SUMming the same value multiple times.

how can i ensure that i am counting the same accession.ACCESSION_ID only once?

thanks so much for your help and guidance.

2条回答
Summer. ? 凉城
2楼-- · 2019-09-21 13:34

The problem is duplicaction of data.

To find which JOIN is causing the problem I'd comment each out in turn and inspect the results. It's usually obvious which is creating the extra data.

Without seeing the data it's tricky but if, for example, the accession table has a GeographyKey in it whereas the Payments table does not and if an ACCESSION_ID has occured across two GegraphyKeys, then you'll get two rows of data when joining these tables together on ACCESSION_ID.

One solution might be doing some initial grouping into #temp tables before hitting the query in your OP another alternative is CTEs like the below.

;with accession (XIFIN_LOAD_DATE, ACCESSION_ID)
    as 
    (
    select 
        XIFIN_LOAD_DATE
        ,ACCESSION_ID
    from F_ACCESSION_daily
    group by
         XIFIN_LOAD_DATE
         ,ACCESSION_ID
    )
    , payments (PAYMENT_PAYOR_ID, ACCESSION_ID, paid_amt, copay_amt, CO_INSURANCE_AMT)
    as 
    (
    select 
        PAYMENT_PAYOR_ID
         ,ACCESSION_ID
        ,sum(paid_amt) paid_amt
        ,sum(copay_amt) copay_amt
        ,sum(CO_INSURANCE_AMT) CO_INSURANCE_AMT
    from F_PAYOR_PAYMENTS_monthly
    group by
         PAYMENT_PAYOR_ID
         ,ACCESSION_ID
    )
select 
  SUM(payments.paid_amt) as paid_amt, 
  SUM(payments.copay_amt) as copay_amt,
  SUM(payments.CO_INSURANCE_AMT) as co_ins_amt,
  payor_group.PAYOR_GROUP as payor_group,
  DATEPART(MM,accession.XIFIN_LOAD_DATE) as load_month,
  DATEPART(yyyy,accession.XIFIN_LOAD_DATE) as load_year,
  accession.ACCESSION_ID

from 
  payments

  join D_PAYOR payor
  on payor.PAYOR_ID=payments.PAYMENT_PAYOR_ID

  join D_PAYOR_GROUP payor_group
  on payor.PAYOR_GROUP_KEY=payor_group.PAYOR_GROUP_KEY

  join accession
  on accession.ACCESSION_ID=payments.ACCESSION_ID

where accession.XIFIN_LOAD_DATE between '20120501' and '20120531'

group by  
  payor_group.PAYOR_GROUP
  , DATEPART(MM,accession.XIFIN_LOAD_DATE),
  DATEPART(yyyy,accession.XIFIN_LOAD_DATE),accession.ACCESSION_ID
查看更多
\"骚年 ilove
3楼-- · 2019-09-21 13:41

It's likely that you have multiple rows with the same ACCESSION_ID on your payments table with different payor groups. Since you are also grouping by payor_group, you are ending up with multipe accession ids.

We would need to see some sample data to be sure, however.

查看更多
登录 后发表回答