I'm using Rails and MySQL, and have an efficiency question based on row counting.
I have a Project model that has_many :donations.
I want to count the number of unique donors for a project.
Is having a field in the projects table called num_donors, and incrementing it when a new donor is created, a good idea?
Or is something like @num_donors = Donor.count(:select => 'DISTINCT user_id') going to be similar or the same in terms of efficiency thanks to database optimization? Will this require me to create indexes for user_id and any other fields I want to count?
Does the same answer hold for summing the total amount donated?
To answer the title question: yes, it is redundant, but whether you should do it depends on your situation.
Unless you have known performance problems, calculate the counts and totals on the fly in your application and don't store them. That is, don't store calculated values unless you have no other choice.
In most situations, you won't have to resort to this and shouldn't.
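As a rough illustration of what "on the fly" means here, the sketch below computes both figures from the raw donation rows every time they are needed. It is plain Ruby with a hypothetical Donation struct and made-up sample data rather than a real database. In a reasonably recent Rails version the equivalent queries would likely be project.donations.distinct.count(:user_id) and project.donations.sum(:amount), assuming the donations table has user_id and amount columns.

```ruby
# Hypothetical stand-in for rows in the donations table.
Donation = Struct.new(:project_id, :user_id, :amount)

DONATIONS = [
  Donation.new(1, 10, 25),
  Donation.new(1, 11, 50),
  Donation.new(1, 10, 5),   # user 10 donates to project 1 a second time
  Donation.new(2, 12, 100)
].freeze

# Roughly: SELECT COUNT(DISTINCT user_id) FROM donations WHERE project_id = ?
def unique_donor_count(donations, project_id)
  donations.select { |d| d.project_id == project_id }.map(&:user_id).uniq.size
end

# Roughly: SELECT SUM(amount) FROM donations WHERE project_id = ?
def total_donated(donations, project_id)
  donations.select { |d| d.project_id == project_id }.sum(&:amount)
end
```

Nothing is stored, so there is no second number that can drift away from the donation rows.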
If you must store calculated values, do the following:
- Don't keep it up to date by incrementing it. Recalculate the count/total from all the data each time you update it.
- If you don't have a lot of updates, put the code in an update trigger to keep the count/totals up to date.
- The trouble with redundancy in databases is that when the numbers disagree, you are unsure of which is authoritative. Add a note to the documentation that the source data is authoritative if they disagree, and that the stored value can be overwritten.
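The first point in the list can be sketched as follows. This is a minimal plain-Ruby illustration, not Rails code: the Project and Donation structs, the num_donors and total_donated fields, and the refresh_totals! helper are all hypothetical names. The idea is that the stored values are always rebuilt from the source rows, so a missed or repeated update cannot leave them permanently wrong.

```ruby
# Hypothetical in-memory stand-ins for the projects and donations tables.
Donation = Struct.new(:project_id, :user_id, :amount)
Project  = Struct.new(:id, :num_donors, :total_donated)

# Rebuild the stored values from the source rows instead of incrementing them.
def refresh_totals!(project, donations)
  rows = donations.select { |d| d.project_id == project.id }
  project.num_donors    = rows.map(&:user_id).uniq.size
  project.total_donated = rows.sum(&:amount)
  project
end

project   = Project.new(1, 0, 0)
donations = [Donation.new(1, 10, 25), Donation.new(1, 10, 5)]

refresh_totals!(project, donations)
project.num_donors     # => 1 (the repeat donor is counted once)
project.total_donated  # => 30
```

Because the helper overwrites the stored values rather than adjusting them, running it again after a donation is added or deleted brings them back in line with the source data.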
While it depends on the size of your database, these are the kinds of operations that databases specialize in, so they should be fast. It's probably a case of premature optimization here - you should start by not storing the totals, thus making it simpler - and optimize later if necessary.
Remember the maxim "A man with one watch always knows the time. A man with two watches is never sure." I would only store the derived number if:
Performance issues stop you from getting the derived numbers when you need them (which should not be a problem in this case since the answer is likely to be available from the indexes)
or
You have reason to believe that you are losing records from the main table through programmer error or deliberate or accidental user action. In that case, you can use the derived number to audit the currently calculated number.
Peter's and JohnFx's answers are sound. What you're proposing is the denormalization of your database schema, which can improve read performance at the expense of writes, while also putting the onus on the developer (or additional DBMS cleverness) to prevent inconsistencies within your dataset.
ActiveRecord has some built-in functionality to automatically manage counts on has_many relationships. Check out this Railscast on counter caches.
Do you know that a simple flag does the ActiveRecord magic?
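class ThingOwner < ActiveRecord::Base
  # needs a column like
  #   t.integer :things_count, :default => 0
  has_many :things
end

class Thing < ActiveRecord::Base
  # the counter cache is declared on the belongs_to side
  belongs_to :thing_owner, :counter_cache => true
end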
As for the question: yes, it is redundant. I would add such a counter if and only if things.count takes up too large a share of the time. Otherwise it's premature optimization.