Best way to use a Scalar UDF that needs to appear

2019-06-01 02:57发布

问题:

I have an expensive scalar UDF that I need to include in a select statement and use that value to narrow the results in the where clause. The UDF takes parameters from the current row so I can't just store it in a var and select from that.

Running the UDF twice per row just feels wrong:

Select someField, 
       someOtherField, 
       dbo.MyExpensiveScalarUDF(someField, someOtherField)
from someTable
where dbo.MyExpensiveScalarUDF(someField, someOtherField) in (aHandfulOfValues)

How do I clean this up so that the function only runs once per row?

回答1:

Just because you happen to mention the function twice doesn't mean it will be computed twice per row. With luck, the query optimizer will computed it only once per row. Whether it does or not may depend in part on whether the UDF appears to be deterministic or nondeterministic.

Take a look at the estimated execution plan. Maybe you'll find out you're worrying about nothing.

If it's computed twice, you could try this and see if it changes the plan, but it's still no guarantee:

WITH T(someField,someOtherField,expensiveResult) as (
  select someField, someOtherField, dbo.MyExpensiveScalarUDF(someField, someOtherField)
  from someTable
)
  select * from T
  where expensiveResult in (thisVal,thatVal,theotherVal);


回答2:

Steve is correct - the query plan will probably not re-evaluate identical expressions if the UDF is deterministic.

However, repeating yourself is a potential maintenance problem:

WITH temp AS (
Select someField, 
       someOtherField, 
       dbo.MyExpensiveScalarUDF(someField, someOtherField) AS scalar
from someTable
)
SELECT *
FROM temp
where scalar in (aHandfulOfValues)

You can avoid it with a CTE or nested query.

Scalar UDFs are best to be avoided if at all possible for rowsets of any significant size (say a half million evaluations). If you expand it inline here (and with a CTE you won't have to repeat yourself), you'll probably find a huge performance boost. Scalar UDFs should be a last resort. In my experience you're far better off using a persisted computed column, or inline or just about any other technique before relying on a scalar UDF.



回答3:

I'd need a lot more detail before I could address this specific question, but two general ideas hit me right off:

(1) Can you make it a table-based function, join it in the FROM clause, and work from there?

(2) Look into the OUTER APPLY and CROSS APPLY join clauses. Essentially, they allow joins on table-based functions, where the parameters passed to the function are based on the row being joined (as opposed to a single call). Good examples of this are in BOL.