SQL Server: How do I maintain data integrity using

2019-08-18 17:07发布

Here's my question: how do I maintain record integrity using aggregate functions with a group by?

To explain further, here's an example.

I have a table with the following columns: (Think of it as an "order" table)

Customer_Summary (first 10 char of name + first 10 char of address)
Customer_Name
Customer_Address
Customer_Postal Code
Order_weekday

There is one row per "order", so many rows with the same customer name, address, and summary.

What I want to do is show the customer's name, address, and postal code, as well as the number of orders they've placed on each weekday, grouped by the customer's summary.

So the data should look like:

Summary             | Name        | Address    | PCode | Monday | Tuesday | Wednesday | Thursday | Friday

test custntest addre|test custname|test address|123456 | 1      | 1       | 1         | 1        | 1

I only want to group records of similar customer summary together, but obviously I want one name, address, and postal code to show. I'm using min() at the moment, so my query looks like:

SELECT Customer_Summary, min(customer_name), min(customer_address), min(customer_postal_code) 
FROM Order
Group by customer_summary

I've omitted my weekday logic as I didn't think it was necessary.

My issue is this - some of these customers with the same customer summary have different addresses and postal codes.

So I might have two customers, looking like:

test custntest addre|test custname |test address |323456|

test custntest addre|test custname2|test address2|123456|

Using the group by, my query will return the following:

test custntest addre|test custname |test address |123456|

Since I'm using min, it's going to give me the minimum value for all of the fields, but not necessarily from the same record. So I've lost my record integrity here - the address and name returned by the query do not correctly match the postal code.

So how do I maintain data integrity on non-grouped fields when using a group by clause?

Hopefully I explained it clearly enough, and thanks in advance for the help.

EDIT: Solved. Thanks everyone!

2条回答
Explosion°爆炸
2楼-- · 2019-08-18 17:45

I think you need to re-think your structure.

Ideally you would have a Customer table with an unique ID. Then you would use that unique ID in the Order table. Then you don't need the strange "first 10 characters" method that you are using. Instead, you just group by the unique ID from the Customer table.

You could even then also have a separate table for addresses, relating each address to the customer, with multiple rows (with fields marking them as home address, delivery address, billing address, etc).

This way you separate the Customer information from the Address information and from the Order information. Such that if the customer changes name (marriage) or address (moving home) you don't break your data - Everything is related by the IDs, not the data itself.

[This relationship is known as a Foreign Key.]

查看更多
仙女界的扛把子
3楼-- · 2019-08-18 17:46

You can always use ROW_NUMBER instead of GROUP BY

WITH A AS (
    SELECT Customer_Summary, customer_name, customer_address, customer_postal_code,
        ROW_NUMBER() OVER (PARTITION BY Customer_Summary ORDER BY customer_name, customer_address) AS rn
    FROM Order
)
SELECT Customer_Summary, customer_name, customer_address, customer_postal_code
FROM A
WHERE rn = 1

Then you are free to order which customer to use in the ORDER BY clause. Currently I am order them by name and then address.

Edit:

My solution does what you asked for. But I surely agree with the others: If you are allowed to change the database structure, this would be a good idea... which you are not (saw your comment). Well, then ROW_NUMBER() is a good way.

查看更多
登录 后发表回答