SHouldn't the following query remove duplicates:
SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount AS ACC RIGHT OUTER JOIN
(SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType
From my understanding (and what I want to happen) is, the query in brackets is selecting the property ID and of the ones with a status of current returning the highest tenancy start date (albeit several times). This is then joined to the original table by start date and property id, to get the most recent tenancytype.
Why is it still returning duplicate lines!?
(by the way this is relating to another question I had yesterday, but apparently replies are not supposed to descend into conversation so I thought I'd seperate this off... I hope that is the right thing to do... I have searched but clearly there is something missing in my understanding of something!)
First, you almost never need select distinct
when using group by
.
The problem with your query is the group by
clause in the subquery.
SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
(SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID
) Relevant
ON ACC.PropertyID = Relevant.PropertyID AND
ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;
It should not have TenancyStartDate
. Also, your outer query had ACC.TenancyType
twice in the group by
.
That said, it is easier to write the query using analytic functions:
select a.*
from (select a.*,
max(tenancystartdate) over (partition by propertyid) as max_tsd
from dimaccount a
where accountstatus = 'Current'
) a
where tenancystartdate = max_tsd;
This is not exactly the same as your query, because your query will take non-current records into account. I am guessing that this might be the intention, however.
To answer your question: Yes, you are right, there can be no duplicates. And I am pretty sure there are none. I am also pretty sure that your query does not what you think it does.
This is your derived table:
SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate
As you group by PropertyID and TenancyStartDate, you get one line per PropertyID and TenancyStartDate. For each such line you want the MAX(TenancyStartDate), which is the TenancyStartDate itself of course. There is no other field you aggregate, so you don't aggregate at all, but only make the rows distinct, for which one would use DISTINCT. Then you do use DISTINCT to get unique result records, but your records are already unique, by your obfuscated way of doing it. So you say: select the distinct records of distinct records. Your subquery can be re-written as:
SELECT DISTINCT PropertyID, TenancyStartDate
FROM DimAccount
WHERE AccountStatus = 'Current'
Then you outer-join the DimAccount table. So you would keep your found records, even in case there is no matching DimAccount record. But: You've selected from DimAccount, so of course there is always at least the one record you already found. Your outer join is actually an inner join. Then the only field from the derived query shown is PropertyID which always equals ACC.PropertyID. This means: You are only selecting records from ACC and the derived table is just to make sure a 'Current' record exists for PropertyID and TenancyStartDate. Your query could thus be re-written as:
SELECT DISTINCT
PropertyID, TenancyStartDate, AccountID, TenancyType
FROM DimAccount AS ACC
WHERE EXISTS
(
SELECT *
FROM DimAccount CurrentAccount
WHERE CurrentAccount.AccountStatus = 'Current'
AND CurrentAccount.PropertyID = ACC.PropertyID
AND CurrentAccount.TenancyStartDate = ACC.TenancyStartDate
);
In case PropertyID + TenancyStartDate + AccountID + TenancyType are unique (is AccountID the table's ID?) then you can even remove DISTINCT.
This query gets all 'Current' DimAccount records first and then gives you all records with the same PropertyID and TenancyStartDate. However, from your explanation it seems you want to select the latest 'Current' DimAccount record per PropertyID. This is something entirely else. There are different solutions to such a task depending on the dbms you are using (you haven't specified yours in your tags).