THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Not all UDFs are bad for performance.

There is a popular misconception that UDFs have adverse effect on performance. As a blanket statement, this is simply not true. In fact, inline table-valued UDFs are actually macros – the optimizer is very well capable rewriting queries involving them as well as optimizing them. However, scalar UDFs are usually  very slow.  I will provide a short example.

 

Prerequisites

 

Here is the script to create and populate the tables:

CREATE TABLE States(Code CHAR(2), [Name] VARCHAR(40), CONSTRAINT PK_States PRIMARY KEY(Code))

GO

INSERT States(Code, [Name]) VALUES('IL', 'Illinois')

INSERT States(Code, [Name]) VALUES('WI', 'Wisconsin')

INSERT States(Code, [Name]) VALUES('IA', 'Iowa')

INSERT States(Code, [Name]) VALUES('IN', 'Indiana')

INSERT States(Code, [Name]) VALUES('MI', 'Michigan')

GO

CREATE TABLE Observations(ID INT NOT NULL, StateCode CHAR(2), CONSTRAINT PK_Observations PRIMARY KEY(ID))

GO

SET NOCOUNT ON

DECLARE @i INT

SET @i=0

WHILE @i<100000 BEGIN

  SET @i = @i + 1

  INSERT Observations(ID, StateCode)

  SELECT @i, CASE WHEN @i % 5 = 0 THEN 'IL'

    WHEN @i % 5 = 1 THEN 'IA'

    WHEN @i % 5 = 2 THEN 'WI'

    WHEN @i % 5 = 3 THEN 'IA'

    WHEN @i % 5 = 4 THEN 'MI'

    END

END

GO

 

 

When a query involving a UDF is rewritten as an outer join.

 

Consider the following query:

SELECT o.ID, s.[name] AS StateName

  INTO dbo.ObservationsWithStateNames_Join

  FROM dbo.Observations o LEFT OUTER JOIN dbo.States s ON o.StateCode = s.Code

/*

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Observations'. Scan count 1, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'States'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

SQL Server Execution Times:

   CPU time = 187 ms,  elapsed time = 188 ms.

*/

And compare it to a query involving an inline table valued UDF:

CREATE FUNCTION dbo.GetStateName_Inline(@StateCode CHAR(2))

RETURNS TABLE

AS

RETURN(SELECT [Name] FROM dbo.States WHERE Code = @StateCode);

GO

SELECT ID, (SELECT [name] FROM dbo.GetStateName_Inline(StateCode)) AS StateName

  INTO dbo.ObservationsWithStateNames_Inline

  FROM dbo.Observations

Both its execution plan and its execution costs are the same – the optimizer has rewritten it as an outer join. Don’t underestimate the power of the optimizer!

 

A query involving a scalar UDF is much slower.

 

Here is a scalar UDF:

CREATE FUNCTION dbo.GetStateName(@StateCode CHAR(2))

RETURNS VARCHAR(40)

AS

BEGIN

  DECLARE @ret VARCHAR(40)

  SET @ret = (SELECT [Name] FROM dbo.States WHERE Code = @StateCode)

  RETURN @ret

END

GO

Clearly the query using this UDF provides the same results but it has a different execution plan and it is dramatically slower:

/*

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 3 ms.

Table 'Worktable'. Scan count 1, logical reads 202930, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Observations'. Scan count 1, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

SQL Server Execution Times:

   CPU time = 11890 ms,  elapsed time = 38585 ms.

*/

As you have seen, the optimizer can rewrite and optimize queries involving inline table valued UDFs. On the other hand, queries involving scalar UDFs are not rewritten by the optimizer – the execution of the last query includes one function call per row, which is very slow.

Also thank you Peter and Adam for setting me up as a blogger on this wonderful site!

 

 

Published Monday, April 21, 2008 6:43 PM by Alexander Kuznetsov
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

James Luetkehoelter said:

Good post, and good to point out the difference between scalar UDFs and table or multi-statement table functions. I've found the latter to be quite useful (in fact, so does MS - the dynamic management view sys.dm_db_index_physical_stats is actually a table (or multi-statement table) function (or it at least it behaves that way).

April 21, 2008 6:57 PM
 

TroyK said:

Hi Alexander;

Good information. However, you may wish to qualify your last assertion about scalar UDFs -- they are slower when there is a table read operation (the SELECT...).

If, on the other hand, a particular scalar UDF doesn't perform a table read operation, it will behave the same as if the computation were expressed in the query itself.

TroyK

April 29, 2008 1:14 PM
 

AmolP said:

Does this hold water for multi statement table valued UDFs?

From what I have seen, if you use a multi statement table valued UDF in a join, performance goes down a lot. This is because the optimizer does not know before hand the result sets from the UDF and cannot optimize and estimate the rows coming back from it.

Also, results from the UDF would be non indexed and could slow down the join. In such cases, it may be advisable to store the results of the UDF in a temp table, create an index on it and then use that temp table in the join.

I am certainly not saying this would be a "works for all situations" solution, just sharing some of my observations.

October 14, 2009 2:58 PM
 

Alexander Kuznetsov said:

Usually multistatement table valued UDFs are slowish. There are exceptions when we explicitly want to split some complex query into simpler ones.

October 14, 2009 3:13 PM
 

Dave McRae said:

Wonderful article, thanks.

Extending this for the Multi-statement case

CREATE FUNCTION dbo.GetStateName_MultiStatement(@StateCode CHAR(2))

RETURNS @Result TABLE (Name Varchar(40))

AS

BEGIN

Insert Into @Result Select Name FROM dbo.States WHERE Code = @StateCode

RETURN

END

GO

SELECT ID, (SELECT [name] FROM dbo.GetStateName_MultiStatement(StateCode)) AS StateName

 FROM dbo.Observations

and sadly, it took the same time as the scalar UDF

October 8, 2013 6:56 PM
 

AlexK said:

Dave,

Your example makes sense to me. Thanks!

October 8, 2013 7:57 PM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement