UDF Performance in MySQL

Posted 2019-06-24 19:19

Question:

I'm noticing exponential performance degradation in MySQL query execution times when queries include calls to UDFs in the SELECT or WHERE clauses. The UDFs in question query local tables to return scalar values, so they're not just evaluating arithmetic expressions but effectively serving as correlated subqueries. I've fixed the performance problem by removing the UDFs and rewriting the queries with correlated subqueries, more complex joins, etc.

I suppose if I only had experience with MySQL I would simply accept this as a reality, adjust my use of UDFs and move on. But prior to working with MySQL I worked for 5+ years on SQL Server. I built a billing system that processed much larger data sets and relied very heavily on both scalar and table-valued user-defined functions. Those UDFs also performed queries (i.e. not just arithmetic operations). I didn't experience this sort of performance penalty when using user-defined functions on SQL Server.

What I'm wondering is whether there's anyone here who knows SQL Server and MySQL internals well enough to confirm or refute my current theory about the cause of this performance difference between the two systems. My theory is that SQL Server's optimizer evaluates UDFs differently than MySQL's. Perhaps it's because the table engines are decoupled in MySQL? Or maybe UDFs are more widely used on SQL Server and MySQL's optimizer simply hasn't evolved as far yet? What I'm thinking is that the SQL Server optimizer may treat included UDFs as part of the surrounding query (when possible) and optimize them along with the rest of the query. Maybe I'm way off the mark here, but I just never saw this kind of performance hit for using UDFs on SQL Server.
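To make the pattern concrete, here is a minimal sketch of a per-row scalar function call next to the equivalent join. It is only an illustration under loud assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the `customers` and `orders` tables are hypothetical, and the Python function `customer_name` merely stands in for a function that looks something up in another table. It says nothing about MySQL timings; it only shows that the function is invoked once for every outer row, while the join expresses the same result as a single set-based query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# Snapshot of the lookup table; a stand-in for the query a real function would run.
names = dict(conn.execute("SELECT id, name FROM customers"))

calls = {"count": 0}

def customer_name(customer_id):
    """Hypothetical scalar lookup; counts how often the engine invokes it."""
    calls["count"] += 1
    return names.get(customer_id)

# Register the Python function so SQL statements can call it with one argument.
conn.create_function("customer_name", 1, customer_name)

# Row-by-row style: the function is evaluated for each row of orders.
per_row = conn.execute(
    "SELECT o.id, customer_name(o.customer_id) FROM orders o ORDER BY o.id"
).fetchall()

# Set-based style: the same result expressed as a join.
joined = conn.execute(
    "SELECT o.id, c.name FROM orders o "
    "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()

print(per_row == joined)   # both queries return the same rows
print(calls["count"])      # one function call per row of orders
```

With only three rows the difference is invisible, but the call counter grows linearly with the outer table, and each call in a real system would carry the cost of its own lookup.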

Any light others can shed on this issue will be appreciated.

Answer 1:

UDFs have known limitations and problems. Please see: Are UDFs Harmful to SQL Server Performance?

There are many articles on this topic. Hopefully this one is accessible without a subscription: Beware Row-by-Row Operations in UDF Clothing



Answer 2:

I know this is an old question, but it comes up first in the Google search for "MySQL UDF performance" and does not yet have an adequate answer - one link in the accepted answer is broken, the other does not appear to talk about the specifics of MySQL UDFs.

First, let us make sure we are talking about actual MySQL UDFs. In MySQL, there is a distinction between a "stored function" and a UDF. A stored function is written in SQL and run by the internal stored function/procedure interpreter. A UDF is written in C or C++ and compiled into a shared library, which the MySQL server loads into memory; when called, it runs as machine code on the CPU. Thus, UDF performance is frequently orders of magnitude better than that of stored functions.

So first of all, make sure that what you are calling a UDF really is a UDF and not a stored function.

Second, MySQL UDF performance depends on the nature of the algorithm it executes and the quality of its implementation. For example, if your UDF tests all possible triplets of characters in a string that is 1000 bytes long, it will examine on the order of 1 billion combinations (1000^3) and will take around several seconds per row. So if removing the UDFs makes your code run significantly faster, the next step is to debug the UDF itself and make sure it is written optimally. It is also possible that the question the UDF is trying to answer simply cannot be answered quickly.
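The triplet figure above is easy to check with a few lines of arithmetic. The sketch below assumes two possible readings of "all possible triplets": ordered triplets of positions with repeats allowed (n^3), and unordered triplets of distinct positions (n choose 3). The helper name `count_triplets` is made up for this illustration.

```python
from math import comb

def count_triplets(n):
    """Return (ordered, unordered) triplet counts for a string of length n."""
    ordered = n ** 3        # every (i, j, k), positions may repeat
    unordered = comb(n, 3)  # distinct positions with i < j < k
    return ordered, unordered

ordered, unordered = count_triplets(1000)
print(ordered)    # 1000000000
print(unordered)  # 166167000
```

Either way the work is cubic in the string length, so doubling the length multiplies the cost by about eight. That growth is what dominates the per-row time, regardless of how fast each individual comparison is.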

That said, a well-written UDF that answers a relatively simple question is usually lightning-fast compared to the I/O needed to feed it the data to analyze.