comparison query taking ages

2019-08-12 15:20发布

问题:

My query is quite simple:

select  a.ID, a.adres, a.place, a.postalcode  
from COMPANIES a, COMPANIES b  
where a.Postcode = b.Postcode  
and a.Adres = b.Adres  
and (  
select COUNT(COMPANYID)  
from USERS  
where COMPANYID=a.ID  
)>(  
select COUNT(COMPANYID)  
from USERS  
where COMPANYID=b.ID  
)

Database: sql server 2008 r2

What I'm trying to do: The table of COMPANIES contains double entries. I want to know the ones that are connected to the most amount of users. So I only have to change the foreign keys of those with the least. ( I already know the id's of the doubles)

Right now it's taking a lot of time to complete. I was wondering if if could be done faster

回答1:

Try this version. It should be only a little faster. The COUNT is quite slow. I've added a.ID <> b.ID to avoid few cases earlier.

select  a.ID, a.adres, a.place, a.postalcode  
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode  
and a.Adres = b.Adres  
and (  
select COUNT(COMPANYID)  
from USERS  
where COMPANYID=a.ID  
)>(  
select COUNT(COMPANYID)  
from USERS  
where COMPANYID=b.ID  
)

The FROM ... INNER JOIN ... ON ... is a preferred SQL construct to join tables. It may be faster too.



回答2:

One approach would be to pre-calculate the COMPANYID count before doing the join since you'll be repeatedly calculating it in the main query. i.e. something like:

insert into @CompanyCount (ID, IDCount)
select COMPANYID, COUNT(COMPANYID)
from USERS
group by COMPANYID

Then your main query:

select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
  inner join @CompanyCount aCount on aCount.ID = a.ID
  inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
  inner join @CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount

If you want all instances of a even though there is no corresponding b then you'd need to have left outer joins to b and bCount.

However you need to look at the query plan - which indexes are you using - you probably want to have them on the IDs and the Postcode and Adres fields as a minimum since you're joining on them.



回答3:

  1. Build an index on postcode and adres

  2. The database probably executes the subselects for every row. (Just guessing here, veryfy it in the explain plan. If this is the case you can rewrite the query to join with the inline views (note this is how it would look in oracle hop it works in sql server as well):

    select distinct a.ID, a.adres, a.place, a.postalcode  
    from 
        COMPANIES a, 
        COMPANIES b,  
    (
        select COUNT(COMPANYID) cnt, companyid  
        from USERS
        group by companyid) cntA,  
    (
        select COUNT(COMPANYID) cnt, companyid  
        from USERS
        group by companyid) cntb   
    where a.Postcode = b.Postcode  
    and a.Adres = b.Adres  
    and a.ID<>b.ID
    and cnta.cnt>cntb.cnt