Here's my question: how do I maintain record integrity using aggregate functions with a group by?
To explain further, here's an example.
I have a table with the following columns: (Think of it as an "order" table)
Customer_Summary (first 10 char of name + first 10 char of address)
Customer_Name
Customer_Address
Customer_Postal Code
Order_weekday
There is one row per "order", so many rows with the same customer name, address, and summary.
What I want to do is show the customer's name, address, and postal code, as well as the number of orders they've placed on each weekday, grouped by the customer's summary.
So the data should look like:
Summary | Name | Address | PCode | Monday | Tuesday | Wednesday | Thursday | Friday
test custntest addre|test custname|test address|123456 | 1 | 1 | 1 | 1 | 1
I only want to group records of similar customer summary together, but obviously I want one name, address, and postal code to show. I'm using min() at the moment, so my query looks like:
SELECT Customer_Summary, min(customer_name), min(customer_address), min(customer_postal_code)
FROM Order
Group by customer_summary
I've omitted my weekday logic as I didn't think it was necessary.
My issue is this - some of these customers with the same customer summary have different addresses and postal codes.
So I might have two customers, looking like:
test custntest addre|test custname |test address |323456|
test custntest addre|test custname2|test address2|123456|
Using the group by, my query will return the following:
test custntest addre|test custname |test address |123456|
Since I'm using min, it's going to give me the minimum value for all of the fields, but not necessarily from the same record. So I've lost my record integrity here - the address and name returned by the query do not correctly match the postal code.
So how do I maintain data integrity on non-grouped fields when using a group by clause?
Hopefully I explained it clearly enough, and thanks in advance for the help.
EDIT: Solved. Thanks everyone!
I think you need to re-think your structure.
Ideally you would have a
Customer
table with an unique ID. Then you would use that unique ID in theOrder
table. Then you don't need the strange "first 10 characters" method that you are using. Instead, you just group by the unique ID from theCustomer
table.You could even then also have a separate table for addresses, relating each address to the customer, with multiple rows (with fields marking them as home address, delivery address, billing address, etc).
This way you separate the Customer information from the Address information and from the Order information. Such that if the customer changes name (marriage) or address (moving home) you don't break your data - Everything is related by the IDs, not the data itself.
[This relationship is known as a Foreign Key.]
You can always use
ROW_NUMBER
instead ofGROUP BY
Then you are free to order which customer to use in the ORDER BY clause. Currently I am order them by name and then address.
Edit:
My solution does what you asked for. But I surely agree with the others: If you are allowed to change the database structure, this would be a good idea... which you are not (saw your comment). Well, then ROW_NUMBER() is a good way.