I need to pull several rows from a table and process them in two ways:
- aggregated on a key
- row-by-row, sorted by the same key
The table looks roughly like this:
table (
key,
string_data,
numeric_data
)
So I'm looking at two approaches to the function I'm writing.
The first would pull the aggregate data with one query, and then query again inside a loop for each set of row-by-row data (the following is PHP-like pseudocode):
$rows = query(
"SELECT key,SUM(numeric_data)
FROM table
GROUP BY key"
);
foreach ($rows as $row) {
<process aggregate data in $row>
$key = $row['key'];
$row_by_row_data = handle_individual_rows($key);
}
function handle_individual_rows($key)
{
$rows = query(
"SELECT string_data
FROM table WHERE key=?",
$key
);
<process $rows one row at a time>
return $processed_data;
}
Or, I could do one big query and let the code do all the work:
$rows = query(
"SELECT key, string_data, numeric_data
FROM table"
);
foreach ($rows as $row) {
<process rows individually and calculate aggregates as I go>
}
Performance is not a practical concern in this application; I'm just looking to write sensible and maintainable code.
I like the first option because it's more modular -- and I like the second option because it seems structurally simple. Is one option better than the other or is it really just a matter of style?
The second answer is by far more clear, sensible and maintainable. You're saying the same thing with less code, which is usually better.
And I know you said performance is not a concern, but why fetch data more than you have to?
I can't be certain from the example here, but I'd like to know if there's a chance to do the aggregation and other processing right in the SQL query itself. In this case, you'd have to evaluate "more maintainable" with respect to your relative comfort level expressing that processing in SQL code vs. PHP code.
Is there something about the additional processing you need to do on each row that would prevent you from expressing everything in the SQL query itself?
I don't think you'll find many situations at all where doing a query-per-iteration of a loop is the better choice. In fact, I'd say it's probably a good rule of thumb to never do that.
In other words, the fewer round trips to the database, the better.
Depending on your data and actual tables, you might be able to let SQL do the aggregation work and select all the rows you need with one query.
I think somehow you've answered your own question, because you say you have two different processings : one aggregation and one row by row.
if you want to keep everything readable and maintainable, mixing both in a single query doesn't sound right, the query will answer two different needs so it won't be very readable
even if perf is not an issue, it's faster to do the aggregation on the DB server instead of doing it in code
with only one query, the code that will handle the result will mix two processings, handling rows and computing aggregations in the same time, so in time this code will tend to get confusing and buggy
the same code might evolve over time, for instance the row-by-row can get complex and could create bugs in the aggregation part or the other way around
if in the future you'll need to split these two treatments, it will be harder to disentangle the code that at that moment, somebody else has written ages ago...
Performance considerations aside, in terms of maintainability and readability I'd recommend to use two queries.
But keep in mind that the performance factor might not be an issue at the moment, but it can be in time once the db volume grows or whatever, it's never a negligible factor on long term ...
One
SQL
query, for sure.This will
GROUP BY
methodsSince your aggregates may be performed equally well by the database, it will also be better for mainainability: you have all your resultset logic in one place.
Here is an example of a query that returns every row and calculates a
SUM
:Note that this will most probably use parallel access to calculate
SUM
's for differentkey
's, which is hardly implementable inPHP
.Same query in
MySQL
:one sql query is probably a better idea. It avoids you having to re-write relational operations