My query is quite simple:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a, COMPANIES b
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
Database: sql server 2008 r2
What I'm trying to do:
The table of COMPANIES contains double entries. I want to know the ones that are connected to the most amount of users. So I only have to change the foreign keys of those with the least. ( I already know the id's of the doubles)
Right now it's taking a lot of time to complete. I was wondering if if could be done faster
Try this version. It should be only a little faster. The COUNT
is quite slow. I've added a.ID <> b.ID
to avoid few cases earlier.
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
The FROM ... INNER JOIN ... ON ...
is a preferred SQL construct to join tables. It may be faster too.
One approach would be to pre-calculate the COMPANYID
count before doing the join since you'll be repeatedly calculating it in the main query. i.e. something like:
insert into @CompanyCount (ID, IDCount)
select COMPANYID, COUNT(COMPANYID)
from USERS
group by COMPANYID
Then your main query:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
inner join @CompanyCount aCount on aCount.ID = a.ID
inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
inner join @CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount
If you want all instances of a
even though there is no corresponding b
then you'd need to have left outer join
s to b
and bCount
.
However you need to look at the query plan - which indexes are you using - you probably want to have them on the ID
s and the Postcode
and Adres
fields as a minimum since you're joining on them.