Using MSSQL 2005
I was playing around today with a Scalar UDF in a where statement to see some of the costs associated with making the call and io differences etc.
I'm starting with 2 basic tables. Customer which has 1 million rows. and Purchases which has 100,000. Both have an auto identity column as Primary key. No other indexes defined.
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SELECT * FROM Customer C
INNER JOIN Purchases P on C.[IDENTITY] = P.CustomerID
WHERE P.Amount > 1000
SET STATISTICS IO OFF
This returns IO statistics of
Table 'Customer'. Scan count 0, logical reads 3295, physical reads 1, read-ahead reads 32, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Purchases'. Scan count 1, logical reads 373, physical reads 1, read-ahead reads 370, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So just to see the impact of a scalar UDF I then just moved the P.Amount > 1000 to a UDF. Function is as follows:
CREATE FUNCTION [dbo].[HighValuePurchase]
(
@value int
)
RETURNS bit
AS
BEGIN
DECLARE @highValue bit
SET @highValue = '0'
IF @value > 1000
BEGIN
SET @highValue = '1'
END
RETURN @highValue
END
So I then ran the following query:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SELECT * FROM Customer C
INNER JOIN Purchases P on C.[IDENTITY] = P.CustomerID
WHERE dbo.HighValuePurchase(P.Amount) = '1'
SET STATISTICS IO OFF
I was expecting this to run worse. This query returned the following IO statistics:
Table 'Purchases'. Scan count 1, logical reads 373, physical reads 1, read-ahead reads 370, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 35, physical reads 3, read-ahead reads 472, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
This also returned faster than the > 1000 query. While the same rows were returned the ordering of the one calling the UDF was automatically sorted by C.[IDENTITY] where the other query appeared unsorted. This is likely due to the way the combines were done in the execution plans. Outline of the plans is below.
Execution plan for the non UDF shows a Clustered Index scan for Purchases and a Clustered Index seek for Customers combined at a nested join.
Execution plan for the UDF version shows a clustered index scan for purchases, then a filter, then a sort. There's a clustered Index scan on Customer. Then the results are combined in a Merge Join.
I'm sure this has to do with lack of indexes etc, but I'm unsure why these results are the way they are. I've experienced UDF's running painfully slow and everyone says using them is usually a bad idea, which is why I threw this test together. I can't explain currently why the UDF version seems to be so much better.