The objective is below the list of tables.
Tables:
Table: Job
- JobID
- CustomerID
- Value
- Year
Table: Customer
- CustomerID
- CustName
Table: Invoice
- SaleAmount
- CustomerID
The Objective
Part 1: (easy) I need to select all invoice records and sort by Customer (To place nice w/ Crystal Reports)
Select * from Invoice as A inner join Customer as B on A.CustomerID = B.CustomerID
Part 2: (hard) Now, we need to add two fields:
- JobID associated with that customer's job that has the Maximum Value (from 2008)
- Value associated with that job
Pseudo Code
Select * from
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join
(select JobID, Value from Jobs where Job:JobID has the highest value out of all of THIS customer's jobs from 2008)
General Thoughts
This is fairly easy to do If I am only dealing with one specific customer:
select max(JobId), max(Value) as MaxJobID from Jobs where Value = (select max(Value) from Jobs where CustomerID = @SpecificCustID and Year = '2008') and CustomerID = SpecificCustID and CustomerID = '2008'
This subquery determines the max Value for this customer in 2008, and then its a matter of choosing a single job (can't have dupes) out of potential multiple jobs from 2008 for that customer that have the same value.
The Difficulty
What happens when we don't have a specific customer ID to compare against? If my goal is to select ALL invoice records and sort by customer, then this subquery needs access to which customer it is currently dealing with. I suppose this can "sort of" be done through the ON clause of the JOIN, but that doesn't really seem to work because the sub-sub query has no access to that.
I'm clearly over my head. Any thoughts?
The row_number() function can give you what you need:
The ROW_NUMBER() function in this query will order by value in descending order and the PARTITION BY clause will do this separately for each different value of CustomerID. This means that the highest Value for each customer will always be 1, so we can join to that value.
Just to be complete with the non
row_number
solution for those < MSSQL 2005. Personanly, I find it easier to follow myslef...but that could be biased considering how much time I spend in MSSQL 2000 vs 2005+.The
over
function is an awesome, but often neglected function. You can use it in a subquery to pull back your valid jobs, like so:Essentially, that first inner query looks like this:
You'll see that this pulls back all of the jobs, but with that additional column which lets you know what the maximum value is for each customer. With the next subquery, we filter out any rows that have
value
andmaxVal
equal. Additionally, it finds the maxJobID
based oncustomerid
andmaxVal
, because we need to pull back one and only oneJobID
(as per the requirements).Now, you have a complete listing of
CustomerID
andJobID
that meet the conditions of having the highestJobID
that contains the maximumValue
for thatCustomerID
in a given year. All that's left is to join it toInvoice
andCustomer
, and you're good to go.How about using a CTE. Obviously, I can't test, but here is the idea. You need to replace col1, col2, ..., coln with the stuff you want to select.
If you don't have a CustomerID, this will return the top value for each customer (that will hurt on performance tho).