Should I Design a SQL Server database to rely on U

2019-07-04 06:35发布

问题:

Take the following SQL Query:

SELECT     Account, Amount AS Deposit, 0.00 AS Withdrawal, Date
FROM       Deposits
WHERE      Account = @Account
UNION
SELECT     Account, 0 AS Expr1, Amount, Date
FROM       Withdrawals
WHERE      Account = @Account
ORDER BY   Date Desc

As opposed to:

SELECT    Account, TransactionType, Amount, Date
FROM      Transactions
WHERE     Account = @Account
ORDER BY  Date Desc

In the first query has deposits and withdrawals stored in different tables and whenever they need to be seen together, such as when showing a bank statement, they're UNIONed together. The second query has all transactions, deposits and withdrawals, stored in one table and the transaction type is differentiated via the TransactionType column. The query results prodcued are not exactly the same but let's say the client is happy with either as the final output.

Is there any performance reason why setting up the database using the first schema is better or worse than the second schema?

EDIT: Just for the sake of clarity, I'd like to point out that the reason I was asking this question is to find out whether there's a performance benefit in keeping tables smaller and using a UNION when necessary over having one large table where I would use WHERE clause to find different types of data. I used the above example to better express my question. The advice on which schema is better for other reasons is welcomed, but please try to also answer the question as well.

回答1:

Transactions seems more appropriate, as that's what they are. While it's less obvious, differentiating deposits from withdrawals via tables is no more sensible than differentiating cash transactions from check transactions via tables.



回答2:

To be honest, I don't think performance should be your primary concern - and in the example you give, performance is unlikely to be measurably different.

However, imagine you want to calculate the balance of the account at any point in time. If you have a "transactions" table, you can achieve this in a single query -

select sum(amount) from transactions

If you split into two tables, you need to execute two queries; broadly speaking, I would expect this to take twice as long as querying a single table, even if that table has the same number of records as the two other tables put together.

I think you should concentrate on what represents the business domain best - if you look at an old fashioned ledger, I think you'll find both credits and debits are entered in the same column. Your business stakeholders are more likely to think in terms of "transaction" - positive or negative - than in terms of having separate concepts for deposits and withdrawals.



回答3:

I would go for the Transaction table. This makes it much easier to implement a Transaction base class in your application.

If you later realize that you need custom WITHDRAWAL and DEPOSIT columns then you can create subclass tables containing those columns.



回答4:

From the viewpoint of the accounting department, deposits and withdrawals are both transactions with unsigned amounts. The difference being the direction of the cash flow. That leans towards a single transaction table rather than separate deposit and withdrawal tables.

From a E-R/data modelling perspective, if we start with the above, then the question is: do deposits and withdrawals share the same set of attributes? If they do, then you've got a strong indication that they are in fact, the same entity. If they have differing attributes, then you might be looking at different subtypes: perhaps a parent transaction table holding the common attributes and a subtable for each different type of transaction:

create table dbo.TransactionBase
(
  id     int   not null primary key ,
  type   int   not null , -- 1:Deposit, 2:Withdrawal, 3:???, ...
  amount money not null ,
  -- ... other common attributes ,
)
create table dbo.TransactionDeposit
(
  transaction_id int not null
    primary key
    foreign key references dbo.TransactionBase(id) ,
  -- ... deposit-specific attributes
)
create table dbo.TransactionWithdrawal
(
  transaction_id int not null
    primary key
    foreign key references dbo.TransactionBase(id) ,
  -- ... withdrawal-specific attributes
)

In this model, the base transaction table has a relationship with a cardinality of one-to-zero-or-one between each of its subtype tables. Each transaction exists in the base table, and in only one of the subtype tables.

Now you get the best of both worlds. You can deal with transactions as a whole (for instance, figuring out account balance) simply with a single select.

This is more important if you can see the number of transaction types increasing.



回答5:

I would create a view that has combined data that you need. That way you can having meaningful tables as well as a single source to query for information. You do not want to store two different things in the same table.

Normally you can create an indexed view if you are concerned about performance, but unfortunately since you are using unions you cannot. If you are worried about performance you might consider creating a table that you load with the unioned data.



回答6:

I would probably put these two both together in a single table, simply because they are basically very similar "transaction" entities and you'll probably want them together and indexed together for a lot of operations. You can make views which perform a horizontal section of the data and only show one or the other. If Deposits or Withdrawals have different supporting information, that might also go in the same table (with one or the other having NULLs) or in auxiliary tables.

Also, note that you probably want to use UNION ALL to avoid (hopefully unnecessary) de-duplication which UNION performs.



回答7:

I would put them in the same table, every commercial accounting system I've ever seen has one table for all transactions. There's a good reason for that. It makes sense in terms of what the users expect, it makes sense in terms of what the new developers expect and it is best for reporting where you have to do calculations against both tables to get the correct answer. If you have fields you want to be differnt bewteen the two, then put them in separate tables, but keep the main data (especially the amount and other items you need for reporting in the main table).

I worked with a system (designed by someone else) where different types of expenses were in different tables and that choice cause a lot of bugs over the years as new developers didn't know they were supposed to find the data in multiple tables. I think that separate tables here, too, would be a maintenance problem where people who make wierd choices about how to get the data (including using UNION where UNION ALL is indicated for performance reasons) when it should be simple.



回答8:

I would personally go with the two tables and utilize the UNION ALL if you need to group the results together.

All too often performance isn't considered in advance and being that this is just one of likely many queries I would design the database to fit the queries more often to be used like the lookups just for Withdraws or Deposits. Having all that data together is simpler but you will have to weed through double data which results in longer look-up times.

So in other words, if this union query is executed 100's of times a day but look-ups and updates against the individual tables happens 1000's of times a day then keep them separated.