I wonder whether using a CASE ... WHEN ... THEN expression in MySQL queries
has a negative effect on performance.
Instead of using a CASE expression (for example, inside an UPDATE query),
you can always write an if/else statement in your program
(in PHP, Python, Perl, Java, ...) to choose which query to send. For example, in pseudocode:
prepareStatement(
"UPDATE t1 SET c1=c1+1, msg=CASE (@v:=?) WHEN '' THEN msg ELSE @v END"
);
setStatementParameter(1, message);
or instead:
if (message == "") {
prepareStatement("UPDATE t1 SET c1=c1+1");
} else {
prepareStatement("UPDATE t1 SET c1=c1+1, msg=?");
setStatementParameter(1, message);
}
(c1 is here only to show that something happens in both cases.)
Which approach performs better?
And how large is the performance penalty?
All per-row functions will have an impact on performance; the only real question is: "Is the impact small enough to not worry about?" This is something you should discover by measuring rather than guessing. Database administration is only a set-and-forget activity if neither your data nor your queries ever change.
Otherwise, you should be periodically monitoring performance to ensure no problems occur.
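One way to measure rather than guess is to time both variants from the question against a throwaway table. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of a MySQL server, it replaces the `@v:=?` user variable with a named parameter, and the row count and the helper names (`make_db`, `update_with_case`, `update_with_branch`, `time_it`) are made up for illustration. The absolute numbers will not transfer to MySQL; only the shape of the measurement does.

```python
import sqlite3
import time


def make_db(rows=10_000):
    """Create an in-memory table t1 with the columns used in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (c1 INTEGER NOT NULL, msg TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO t1 (c1, msg) VALUES (?, ?)", [(0, "old")] * rows
    )
    return conn


def update_with_case(conn, message):
    # Single statement: the CASE expression decides whether msg changes.
    conn.execute(
        "UPDATE t1 SET c1 = c1 + 1, "
        "msg = CASE :m WHEN '' THEN msg ELSE :m END",
        {"m": message},
    )


def update_with_branch(conn, message):
    # Application-side branch: send one of two simpler statements.
    if message == "":
        conn.execute("UPDATE t1 SET c1 = c1 + 1")
    else:
        conn.execute("UPDATE t1 SET c1 = c1 + 1, msg = ?", (message,))


def time_it(fn, conn, message, repeats=20):
    """Return the best wall-clock time over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(conn, message)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    conn = make_db()
    for message in ("", "hello"):
        print(
            repr(message),
            "case:", time_it(update_with_case, conn, message),
            "branch:", time_it(update_with_branch, conn, message),
        )
```

To get numbers that mean anything, run the same comparison against your real server, with your real table sizes and indexes.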
By "small enough" above, I mean that you probably needn't worry about the performance impact of something like:
select * from friends where lower(lastname) = 'smith'
if you only have three friends.
The impact of these things becomes more serious as the table increases in size. For example, if you have one hundred million customers and you want to find all the ones likely to be computer-related, you wouldn't want to try:
select name from customers where lower(name) like '%comp%'
That's likely to bring your DBAs down on you like a ton of bricks.
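If you want to see why such a query hurts, ask the database how it plans to run it. The following is a small sketch, again using Python's sqlite3 module as an assumed stand-in for MySQL (where you would use EXPLAIN instead); the index name and the `plan` helper are hypothetical. The exact wording of the plan varies between SQLite versions, so no output is shown, but a plain equality test on the indexed column should be reported as an index search, while the per-row function with a leading wildcard should be reported as a scan over every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Equality on the indexed column: the index can be searched directly.
indexed = plan("SELECT name FROM customers WHERE name = 'Compaq'")

# Function applied to every row plus a leading wildcard: full scan.
per_row = plan("SELECT name FROM customers WHERE lower(name) LIKE '%comp%'")

print(indexed)
print(per_row)
```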
One way we've fixed this in the past is to introduce redundancy into the data. Using that first example, we would add an extra column called lowerlastname and populate it with the lowercase value of lastname. Then index that for search purposes and your select statements become blindingly fast, as they should be.
And what does that do to our much-loved 3NF, I hear you ask? The answer is nothing, if you know what you're doing :-)
You can set up the database so that this new column is populated by an insert/update trigger, to maintain data consistency. It's perfectly acceptable to break 3NF for performance reasons, provided you understand and mitigate the consequences.
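Here is a minimal sketch of that trigger-maintained column, under some assumptions: it runs on SQLite through Python's sqlite3 module so that it is self-contained, and the table layout, trigger names, and the `find_by_lastname` helper are invented for the example. SQLite cannot assign to NEW in a trigger, so the triggers run an UPDATE after the fact; in MySQL itself you would more likely use BEFORE INSERT and BEFORE UPDATE triggers that set NEW.lowerlastname directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (
        id            INTEGER PRIMARY KEY,
        lastname      TEXT NOT NULL,
        lowerlastname TEXT            -- redundant copy, kept by triggers
    );
    CREATE INDEX idx_friends_lowerlastname ON friends (lowerlastname);

    -- Keep the redundant column in step with lastname on insert...
    CREATE TRIGGER friends_ai AFTER INSERT ON friends
    BEGIN
        UPDATE friends SET lowerlastname = lower(NEW.lastname)
        WHERE id = NEW.id;
    END;

    -- ...and whenever lastname itself is updated.
    CREATE TRIGGER friends_au AFTER UPDATE OF lastname ON friends
    BEGIN
        UPDATE friends SET lowerlastname = lower(NEW.lastname)
        WHERE id = NEW.id;
    END;
""")

conn.executemany(
    "INSERT INTO friends (lastname) VALUES (?)",
    [("Smith",), ("SMITH",), ("Jones",)],
)
conn.execute("UPDATE friends SET lastname = 'Smythe' WHERE lastname = 'Jones'")


def find_by_lastname(name):
    """Case-insensitive lookup that compares against the indexed column."""
    rows = conn.execute(
        "SELECT lastname FROM friends WHERE lowerlastname = ? ORDER BY id",
        (name.lower(),),
    )
    return [r[0] for r in rows]


print(find_by_lastname("smith"))   # ['Smith', 'SMITH']
print(find_by_lastname("SMYTHE"))  # ['Smythe']
```

The application never writes lowerlastname itself, which is what keeps the redundant copy from drifting away from lastname.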
Similarly, the second query could be served by an insert/update trigger that populates a new indexed column, name_contains_comp, marking each inserted or updated entry that contains the relevant text.
Since most databases are read far more often than they're written, this moves the cost of the calculation to the insert/update, effectively amortising it across all select operations. The query would then be:
select name from customers where name_contains_comp = 'Y'
Again, you'll find the query blindingly fast at the minor cost of slightly slower inserts and updates.
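The flag column can be sketched the same way. As before, this is an illustration under assumptions rather than a MySQL recipe: it uses SQLite via Python's sqlite3 module, and the table layout, trigger names, index name, and sample customers are all made up. The expensive pattern match runs once per write inside the trigger, and the select only compares against the indexed flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id                 INTEGER PRIMARY KEY,
        name               TEXT NOT NULL,
        name_contains_comp TEXT NOT NULL DEFAULT 'N'
    );
    CREATE INDEX idx_customers_comp ON customers (name_contains_comp);

    -- Compute the flag once, when the row is inserted...
    CREATE TRIGGER customers_ai AFTER INSERT ON customers
    BEGIN
        UPDATE customers
        SET name_contains_comp =
            CASE WHEN lower(NEW.name) LIKE '%comp%' THEN 'Y' ELSE 'N' END
        WHERE id = NEW.id;
    END;

    -- ...and recompute it whenever the name changes.
    CREATE TRIGGER customers_au AFTER UPDATE OF name ON customers
    BEGIN
        UPDATE customers
        SET name_contains_comp =
            CASE WHEN lower(NEW.name) LIKE '%comp%' THEN 'Y' ELSE 'N' END
        WHERE id = NEW.id;
    END;
""")

conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Acme Computers",), ("Corner Bakery",), ("COMPuServe",), ("Delta Freight",)],
)
conn.execute("UPDATE customers SET name = 'Corner Compilers' WHERE id = 2")

# The read side no longer applies any per-row function.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE name_contains_comp = 'Y' ORDER BY id"
    )
]
print(matches)  # ['Acme Computers', 'Corner Compilers', 'COMPuServe']
```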