Why do SQL Server Scalar-valued functions get slow

2019-01-10 21:15发布

站内文章 / SQL-Server

57 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Why do Scalar-valued functions seem to cause queries to run cumulatively slower the more times in succession that they are used?

I have this table that was built with data purchased from a 3rd party.

I've trimmed out some stuff to make this post shorter... but just so you get the idea of how things are setup.

CREATE TABLE [dbo].[GIS_Location](
        [ID] [int] IDENTITY(1,1) NOT NULL, --PK
        [Lat] [int] NOT NULL,
        [Lon] [int] NOT NULL,
        [Postal_Code] [varchar](7) NOT NULL,
        [State] [char](2) NOT NULL,
        [City] [varchar](30) NOT NULL,
        [Country] [char](3) NOT NULL,

CREATE TABLE [dbo].[Address_Location](
    [ID] [int] IDENTITY(1,1) NOT NULL, --PK
    [Address_Type_ID] [int] NULL,
    [Location] [varchar](100) NOT NULL,
    [State] [char](2) NOT NULL,
    [City] [varchar](30) NOT NULL,
    [Postal_Code] [varchar](10) NOT NULL,
    [Postal_Extension] [varchar](10) NULL,
    [Country_Code] [varchar](10) NULL,

Then I have two functions that look up LAT and LON.

CREATE FUNCTION [dbo].[usf_GIS_GET_LAT]
(
    @City VARCHAR(30),
    @State CHAR(2)
)
RETURNS INT 
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @LAT INT

    SET @LAT = (SELECT TOP 1 LAT FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)

RETURN @LAT
END


CREATE FUNCTION [dbo].[usf_GIS_GET_LON]
(
    @City VARCHAR(30),
    @State CHAR(2)
)
RETURNS INT 
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @LON INT

    SET @LON = (SELECT TOP 1 LON FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)

RETURN @LON
END

When I run the following...

SET STATISTICS TIME ON

SELECT
    dbo.usf_GIS_GET_LAT(City,[State]) AS Lat,
    dbo.usf_GIS_GET_LON(City,[State]) AS Lon
FROM
    Address_Location WITH(NOLOCK)
WHERE
    ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)

SET STATISTICS TIME OFF

100 ~= 8 ms, 200 ~= 32 ms, 400 ~= 876 ms

--Edit Sorry I should have been more clear. I'm not looking to tune the query listed above. This is just a sample to show the execution time getting slower the more records it crunches through. In the real world application the functions are used as part of a where clause to build a radius around a city and state to include all records with in that region.

回答1:

In most cases, it's best to avoid scalar valued functions that reference tables because (as others said) they are basically black boxes that need to be ran once for every row, and cannot be optimized by the query plan engine. Therefore, they tend to scale linearly even if the associated tables have indexes.

You may want to consider using an inline-table-valued function, since they are evaluated inline with the query, and can be optimized. You get the encapsulation you want, but the performance of pasting the expressions right in the select statement.

As a side effect of being inlined, they can't contain any procedural code (no declare @variable; set @variable = ..; return). However, they can return several rows and columns.

You could re-write your functions something like this:

create function usf_GIS_GET_LAT(
    @City varchar (30),
    @State char (2)
)
returns table
as return (
  select top 1 lat
  from GIS_Location with (nolock) 
  where [State] = @State
    and [City] = @City
);

GO

create function usf_GIS_GET_LON (
    @City varchar (30),
    @State char (2)
)
returns table
as return (
  select top 1 LON
  from GIS_Location with (nolock)
  where [State] = @State
    and [City] = @City
);

The syntax to use them is also a little different:

select
    Lat.Lat,
    Lon.Lon
from
    Address_Location with (nolock)
    cross apply dbo.usf_GIS_GET_LAT(City,[State]) AS Lat
    cross apply dbo.usf_GIS_GET_LON(City,[State]) AS Lon
WHERE
    ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)

回答2:

They do not.

There is no bug in scalar functions that causes its performance to degrade exponentially depending on the number of rows in the scalar function is executed against. Try your tests again and have a look at SQL profiler, looking at the CPU and READS and DURATION columns. Increase you test size to include tests that take longer than a second, two seconds, five seconds.

CREATE FUNCTION dbo.slow
(
    @ignore int
)
RETURNS INT 
AS
BEGIN
    DECLARE @slow INT
    SET @slow = (select count(*) from sysobjects a 
        cross join sysobjects b 
        cross join sysobjects c 
        cross join sysobjects d 
        cross join sysobjects e 
        cross join sysobjects f
    where a.id = @ignore) 

    RETURN @slow
END
go
SET STATISTICS TIME ON

select top 1 dbo.slow(id)
from sysobjects
go
select top 5 dbo.slow(id)
from sysobjects
go
select top 10 dbo.slow(id)
from sysobjects
go
select top 20 dbo.slow(id)
from sysobjects
go
select top 40 dbo.slow(id)
from sysobjects

SET STATISTICS TIME OFF

Output

SQL Server Execution Times:
   CPU time = 203 ms,  elapsed time = 202 ms.


SQL Server Execution Times:
   CPU time = 889 ms,  elapsed time = 939 ms.

SQL Server Execution Times:
   CPU time = 1748 ms,  elapsed time = 1855 ms.

SQL Server Execution Times:
   CPU time = 3541 ms,  elapsed time = 3696 ms.


SQL Server Execution Times:
   CPU time = 7207 ms,  elapsed time = 7392 ms.

Keep in mind that if you are running a scalar function against rows in the result set, the scalar function will be executed per-row with no global optimisation.

回答3:

You can wrap your functionality in an inline TVF, that will be much faster:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/23/reuse-your-code-with-cross-apply.aspx

回答4:

you call the function two times (two select hits to the DB) for every row in the result set.

to make your query faster join right to GIS_Location and skip the functions:

SELECT
    g.Lat,
    g.Lon
FROM
    Address_Location        l WITH(NOLOCK)
    INNER JOIN GIS_Location g WITH(NOLOCK) WHERE l.State = g.State AND l.City = g.City
WHERE
    ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)

I'm not sure why the NOLOCK, or the crazy where clause, I just copied from the question...

回答5:

Simply put, because SQL expressions with user defined functions are less efficient than SQL expressions without them. The execution logic can't be optimized; and the function overhead (including calling protocols) must be incurred for every row.

KMike's advice is good. WHERE .. IN (SELECT something) is not likely to be an efficient pattern, and in this case can be easily replaced with a JOIN.

回答6:

See if this works better... Or maybe a distinct inner join?

select a.*,
(select top 1 g.Lat from GIS_Location g where g.City = a.City and g.State = a.State) as Lat,
(select top 1 g.Lon from GIS_Location g where g.City = a.City and g.State = a.State) as Lon
from Address_Location a
where a.ID in (select top 100 ID from Address_Location order by ID desc)

As for the scalar function performance, I'm not sure.

回答7:

Typically scalar functions are much slower than inline TVF counterparts. Fortunately for many scenarios it will change.

SQL Server 2019 will introduce Scalar UDF Inlining:

A feature under the intelligent query processing suite of features. This feature improves the performance of queries that invoke scalar UDFs in SQL Server (starting with SQL Server 2019 preview)

T-SQL Scalar User-Defined Functions

User-Defined Functions that are implemented in Transact-SQL and return a single data value are referred to as T-SQL Scalar User-Defined Functions. T-SQL UDFs are an elegant way to achieve code reuse and modularity across SQL queries. Some computations (such as complex business rules) are easier to express in imperative UDF form. UDFs help in building up complex logic without requiring expertise in writing complex SQL queries.

Scalar UDFs typically end up performing poorly due to the following reasons.

Iterative invocation

Lack of costing

Interpreted execution

Serial execution

Automatic Inlining of Scalar UDFs

The goal of the Scalar UDF inlining feature is to improve performance of queries that invoke T-SQL scalar UDFs, where UDF execution is the main bottleneck.

With this new feature, scalar UDFs are automatically transformed into scalar expressions or scalar subqueries that are substituted in the calling query in place of the UDF operator. These expressions and subqueries are then optimized. As a result, the query plan will no longer have a user-defined function operator, but its effects will be observed in the plan, like views or inline TVFs.

Inlineable Scalar UDFs requirements

A scalar T-SQL UDF can be inline if all of the following conditions are true:

The UDF is written using the following constructs:

DECLARE, SET: Variable declaration and assignments.

SELECT: SQL query with single/multiple variable assignments1.

IF/ELSE: Branching with arbitrary levels of nesting.

RETURN: Single or multiple return statements.

UDF: Nested/recursive function calls2.

Others: Relational operations such as EXISTS, ISNULL.

The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects3 (such as NEWSEQUENTIALID()).

The UDF uses the EXECUTE AS CALLER clause (the default behavior if the EXECUTE AS clause is not specified).

The UDF does not reference table variables or table-valued parameters.

The query invoking a scalar UDF does not reference a scalar UDF call in its GROUP BY clause.

The UDF is not natively compiled (interop is supported).

The UDF is not used in a computed column or a check constraint definition.

The UDF does not reference user-defined types.

There are no signatures added to the UDF.

The UDF is not a partition function.

Checking if function is inlinable:

SELECT OBJECT_NAME([object_id]) AS name, is_inlineable
FROM sys.sql_modules
WHERE [object_id] = OBJECT_ID('schema.function_name')

Enabling/disabling feature on database level:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON/OFF;

标签： sql sql-server tsql sql-server-2005 sql-function

叛逆

女 | 书童

私信

收藏的人(0)

Ta的文章更多文章

0条评论

还没有人评论过~

Why do SQL Server Scalar-valued functions get slow

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

T-SQL Scalar User-Defined Functions

Automatic Inlining of Scalar UDFs

Inlineable Scalar UDFs requirements

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮