Table:
UserId, Value, Date.
I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date. Is there a way to do this simply in SQL? (Preferably Oracle)
Update: Apologies for any ambiguity: I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.
This should be as simple as:
I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:
I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.
If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.
Under the hood, analytic queries sort the whole dataset, then process it sequentially. As the rows are processed, the dataset is partitioned according to certain criteria, and for each row the query looks at some window (which defaults to the first value in the partition through the current row; that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).
In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row you return the UserId and the first Date seen for that UserId (since dates are sorted DESC, that's the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
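Since the original query text isn't shown here, this is only a rough sketch of a query matching the description above, not necessarily the exact query from the answer. It runs through Python's sqlite3 module instead of Oracle (window functions need SQLite 3.25 or later), and the table name t, the alias names and the sample rows are made up for illustration. It also picks up Value with a second FIRST_VALUE, since that is what the question asks for.

```python
import sqlite3

# In-memory stand-in for the table from the question (name "t" is assumed).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (UserId INTEGER, Value TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-03-01"),
        (2, "c", "2008-02-01"),
        (2, "d", "2008-01-15"),
    ],
)

# Inner query: per UserId, rows are ordered by Date DESC and every row gets
# the first Value/Date of its partition. Outer DISTINCT squashes duplicates.
query = """
SELECT DISTINCT UserId, LatestValue, MaxDate
FROM (
    SELECT UserId,
           FIRST_VALUE(Value) OVER (
               PARTITION BY UserId ORDER BY "Date" DESC
           ) AS LatestValue,
           FIRST_VALUE("Date") OVER (
               PARTITION BY UserId ORDER BY "Date" DESC
           ) AS MaxDate
    FROM t
)
ORDER BY UserId
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Note that if two rows for the same user share the latest date, which Value comes first is not determined by this ordering alone.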
This is not a particularly spectacular example of analytic queries. For a much bigger win, consider taking a table of financial receipts and calculating, for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently, while other solutions are less efficient, which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
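To make the running-total idea concrete, here is a minimal sketch, again using Python's sqlite3 as a stand-in for Oracle (SQLite 3.25 or later). The receipts table, its columns and the amounts are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical receipts table; names and data are assumptions.
conn.execute(
    "CREATE TABLE receipts (UserId INTEGER, ReceiptId INTEGER, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 5), (1, 3, 20), (2, 4, 7), (2, 5, 3)],
)

# SUM(...) OVER with an ORDER BY uses the default window (start of the
# partition through the current row), which yields a per-user running total.
query = """
SELECT UserId, ReceiptId, Amount,
       SUM(Amount) OVER (PARTITION BY UserId ORDER BY ReceiptId) AS RunningTotal
FROM receipts
ORDER BY UserId, ReceiptId
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```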
The answer here is Oracle only. Here's a slightly more sophisticated answer that works in any SQL dialect:
Who has the best overall homework result (maximum sum of homework points)?
And a more difficult example, which needs some explanation, for which I don't have time atm:
Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.
Hope this helps (anyone).. :)
Regards, Guus
I see many people use subqueries or else vendor-specific features to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.
In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.
(I put the identifier "Date" in delimiters because it's an SQL reserved word.)
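A minimal sketch of that idea, assuming a table called mytable with the three columns from the question (the table name and the sample rows are made up). It is run through Python's sqlite3 only so it can be tried without a database server; the SQL itself is a plain self outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table name and sample data, for illustration only.
conn.execute('CREATE TABLE mytable (UserId INTEGER, Value TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "a", "2008-01-01"),
        (1, "b", "2008-03-01"),
        (2, "c", "2008-02-01"),
        (2, "d", "2008-01-15"),
    ],
)

# Join each row (t1) to any row (t2) of the same user with a greater date,
# then keep only the t1 rows for which no such t2 row exists.
query = """
SELECT t1.UserId, t1.Value, t1."Date"
FROM mytable t1
LEFT OUTER JOIN mytable t2
  ON t1.UserId = t2.UserId AND t1."Date" < t2."Date"
WHERE t2.UserId IS NULL
ORDER BY t1.UserId
"""
latest = conn.execute(query).fetchall()
print(latest)
```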
If t1."Date" = t2."Date", duplicate rows appear. Tables usually have an auto_inc(seq) key, e.g. id. To avoid the duplicates, the following can be used:
Re comment from @Farhan:
Here's a more detailed explanation:
An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.
The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.
In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
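To see the NULL placeholders described above, here is a small sketch that first runs the outer join without the WHERE filter, then with it, and finally with a tie-breaking condition. The table name mytable, the id column and the sample rows are assumptions for illustration, and the tie-breaker shown is just one plausible way to use such a key, not necessarily the exact query the answer had in mind. It uses Python's sqlite3 so it can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with an auto-increment style key "id".
conn.execute(
    'CREATE TABLE mytable (id INTEGER PRIMARY KEY, UserId INTEGER, '
    'Value TEXT, "Date" TEXT)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "2008-01-01"),
        (2, 1, "b", "2008-03-01"),
        (3, 2, "c", "2008-02-01"),
        (4, 2, "d", "2008-02-01"),  # same latest date as id 3: a tie
    ],
)

join = """
FROM mytable t1
LEFT OUTER JOIN mytable t2
  ON t1.UserId = t2.UserId AND t1."Date" < t2."Date"
"""

# Unfiltered join: rows with no later t2 row get NULL (None) for t2's columns.
joined = conn.execute(
    "SELECT t1.id, t2.id " + join + " ORDER BY t1.id, t2.id"
).fetchall()

# With the IS NULL filter: both tied rows of user 2 survive (the doubling).
with_ties = [r[0] for r in conn.execute(
    "SELECT t1.id " + join + " WHERE t2.UserId IS NULL ORDER BY t1.id"
)]

# Tie-breaker: on equal dates, treat the row with the larger id as "later".
tie_broken = [r[0] for r in conn.execute("""
    SELECT t1.id
    FROM mytable t1
    LEFT OUTER JOIN mytable t2
      ON t1.UserId = t2.UserId
     AND (t1."Date" < t2."Date"
          OR (t1."Date" = t2."Date" AND t1.id < t2.id))
    WHERE t2.UserId IS NULL
    ORDER BY t1.id
""")]

print(joined, with_ties, tie_broken)
```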