So, its like our normal bank accounts where we have a lot of transactions which result in inflow or outflow of money. The account balance can always be derived by simply summing up the transaction values. What would be better in this case, storing the updated account balance in the database or re-calculating it whenever needed?
Expected transaction volume per account: <5 daily
Expected retrieval of account balance: Whenever a transaction happens and once a day on an average otherwise.
How would you suggest to take a decision on this? Thanks a lot!
This is fairly subjective. The things I'd suggest taking into account are:
In terms of the merits of the two approaches proposed, summing the transaction values on-demand is likely to be the easier/quicker to implement approach.
However, it won't scale as well as maintaining the current account balance as a field in the database and updating it as you go. And it increases your overall transaction processing time somewhat, as each transaction needs to run a query to compute the current account balance before it can proceed. In practice those may be small concerns unless you have a very large number of accounts/transactions or expect to in the very near future.
The downside of the second approach is that it's probably going to take more development time/effort to set up initially, and may require that you give some thought to how you synchronize transactions within an account to ensure that each one sees and updates the balance accurately at all times.
So it mostly comes down to what the project's needs are, where development time is best spent at the moment, and whether it's worth future-proofing the solution now as opposed to implementing the second approach later on, when performance and scalability become real, rather than theoretical, problems.
Method
The Standard Accounting method in not-so-primitive countries is this. The "best practice", if you will, in others.
This method applies to any system that has similar operations; needs; historic monthly figures vs current-month requirements, such as Inventory Control, etc.
Consideration
First, the considerations.
Never duplicate data. If the current balance can be derived (and here it is simple), do not duplicate it with a summary column. Such a column is a duplication of data. It breaks Normalisation rules. Further, it creates an Update Anomaly, which otherwise does not exist.
If you use a summary column, whenever the Transactions are updated (as in changed, not as in when a new Transaction is inserted), the summary column value becomes obsolete, so it must be updated all the time anyway. That is the consequence of the Update Anomaly. Which eliminates the value of having it.
External publication. If the balance is published, as in a monthly Bank Statement, such documents usually have legal restrictions and implications, thus that published CurrentBalance value must not change after publication.
Any change, after the publication date, in the database, of a figure that is published externally, is evidence of dishonest conduct, fraud, etc.
You wouldn't want your bank, in Apr 2015, to change your Current Balance that they published in their Bank Statement to you of Dec 2014.
That figure has to be viewed as an Audit figure, published and unchangeable.
Any corrections or adjustments that are necessary are made as new Transactions in the current month, even though it applies to some previous month. This is because that applicable-to month is closed and published, because one cannot change history after it has happened and it has been recorded. The only effective month is the current one.
For interest-bearing systems, etc, in not-so-primitive countries, when an error is found, and it has an historic effect (eg. you find out in Apr 2015 that the interest calculated on a security has been incorrect, since Dec 2014), the value of the corrected interest payment/deduction is calculated today, for the number of days that were in error, and the sum is inserted as a Transaction in the current month. Again, the only effective month is the current one.
And of course, the interest rate for the security has to be corrected as well, so that the error does not repeat.
If you find an error in your bank's calculation of the interest on your Savings (interest-bearing) Account, and you have it corrected, you get a single deposit, that constitutes the whole adjustment value, in the current month. That is a Transaction in the current month.
The bank does not: change history; apply interest for each of the historic months; recall the historic Bank Statements; re-publish the historic Bank Statements. No. Except maybe in Idi Amin type countries.
The same principles apply to Inventory control systems. It maintains sanity.
All real accounting systems (ie. those that are accredited by the Audit Authority in the applicable country, as opposed to the Mickey Mouse "packages" that abound) use a Double Entry system for Transactions, precisely because it prevents a raft of errors, the most important of which is, funds do not get "lost". That requires a simple General Ledger.
The major issues that affect performance are outside the scope of this question, they are in the area of whether you implement a genuine Relational Database or not (a Record Filing System in an SQL database container, typified by IDs).
The use of genuine Relational Keys, etc will maintain high performance, regardless of the population of the tables.
Conversely, an RFS will perform badly, they simply cannot perform. "Scale" when used in the context of an RFS, is a fraudulent term: it hides the cause and seeks to address everything but the cause.
Implementation
For each Account, there will be a ClosingBalance, in a AccountStatement table (one row per Account per month), along with StatementDate and other Statement details.
This is not a duplicate because it is demanded for Audit and sanity purposes.
For Inventory, a QuantityOnHand column, in the PartAudit table (one row per Part per month)
It has an additional value, in that it constrains the scope of the Transaction rows required to be queried for the current month
Again, if your table is Relational, the Primary Key for AccountTransaction will be (AccountCode, TransactionDateTime) which will retrieve the Transactions at millisecond speeds.
Whereas for a Record Filing System, the "primary key" will be (TransactionID), and you will be retrieving the current month by TransactionDate, which may or may not be indexed correctly, and the rows required will be spread across the file. In any case at far less than ClusteredIndex speeds, and where the retrieved rows are many, a tablescan.
The Transaction table remains simple (the real world notion of a bank account Transaction is simple). It has a single positive Amount column.
For each Account, the CurrentBalance is:
the Statement.ClosingBalance of the previous month
(for inventory, the PartAudit.QuantityOnHand)
plus the sum of the Transaction.Amounts in the current month, where the TransactionType indicates a deposit
(for inventory, the Transaction.QuantityAffected)
minus the sum of the Transaction.Amounts in the current month, where the TransactionType indicates a withdrawal
In this Method, the Transactions in the current month, only, are in a state of flux, thus they must be derived. All previous months are published and closed, thus the Audit figure must be used.
The older rows in the Transaction table can be purged. Older than ten years for public money, five years otherwise, one year for hobby club systems.
Of course, it is essential that any code relating to accounting systems uses genuine OLTP Standards and genuine SQL ACID Transactions.
This design incorporates all scope-level performance considerations (if this is not obvious, please ask for expansion). Scaling is a non-issue, now scaling issues are honestly outside database.
Corrective Advice
These items need to be stated only because incorrect advice has been provided in SO Answers (and up-voted by the masses, democratically, of course), and the internet is chock-full of incorrect advice (amateurs love to publish their subjective "truths"):
Evidently, some people do not understand that I have given a Method in technical terms. As such, it is not pseudo-code for a specific application in a specific country. The Method is for capable developers, it is not detailed enough for those who need to be lead by the hand.
They also do not understand that the cut-off period of a month is an example: if your cut-off for Tax Office purposes is quarterly, then by all means, use a quarterly cut-off; if the only legal requirement you have is annual, use annual.
Even if your cut-off is quarterly for external or compliance purposes, the company may well choose a monthly cut-off, for internal Audit and sanity purposes (ie. to keep the length of the period of the state of flux to a minimum).
Eg. in Australia, the Tax Office cut-off for businesses is quarterly, but larger companies cut-off their inventory control monthly (this saves having to chase errors over a long period).
Eg. banks have legal compliance requirements monthly, therefore they perform an internal Audit on the figures, and close the books, monthly.
In primitive countries and rogue states, banks keep their state-of-flux period at the maximum, for obvious nefarious purposes. Some of them only make their compliance reports annually. That is one reason why the banks in Australia do not fail.
In the Transaction table, do not use negative/positive in the Amount column. Money always has a positive value, there is no such thing as negative twenty dollars (or that you owe me minus fifty dollars, and then working out that the double negatives mean something else).
The movement direction, or what you are going to do with the funds, is a separate and discrete fact (to the Transaction.Amount). Which requires a separate column (two facts in one datum breaks Normalisation rules, with the consequence that it introduces complexity into the code).
Implement a TransactionType column, which is ( D, W ) for Deposit/Withdrawal as your starting point. As the system grows, simply add ( A, R, w, M ) for Adjustment, Refund, ATM_Withdrawal, Management_Fee, etc.
No code changes required.
In some primitive countries, litigation requirements state that in any report that lists Transactions, a running total must be shown on every line. (Note, this is not an Audit requirement because those are superior [Method above] to the court requirement; Auditors are somewhat less stupid than lawyers; etc.)
Obviously, I would not argue with an court requirement. The problem is that primitive coders translate that into: oh, we must implement a Transaction.CurrentBalance column. They fail to understand that:
the requirement to print a column on a report is not a dictate to store a value in the database
a running total of any kind is a derived value, and it is easily coded (post a question if it isn't easy for you). Just implement the required code in the report.
implementing the running total eg. Transaction.CurrentBalance as a column causes horrendous problems:
introduces a duplicated column, because it is derivable. Breaks Normalisation. Introduces an Update Anomaly.
the Update Anomaly: whenever a Transaction is inserted historically, or a Transaction.Amount is changed, all the Transaction.CurrentBalances from that date to the present have to be re-computed and updated.
in the above case, the report that was filed for court use, is now obsolete (every report of online data is obsolete the moment it is printed). Ie. print; review; change the Transaction; re-print; re-review, until you are happy. It is meaningless in any case.
which is why, in less-primitive countries, the courts do not accept any old printed paper, they accept only published figures, eg. Bank Statements, which are already subject to Audit requirements (refer the Method above), and which cannot be recalled or changed and re-printed.