THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Multi-statement TVFs are essentially slowish nested loops.

Whenever we are using multi-statement TVFs, we are essentially forcing nested loops logic on the database engine. Although multi-statement TVFs are smart enough and do not always execute once per row, when they do so, they may be much slower than nested loops.

As usual, inline UDFs shine as compared to multi-statement ones, at least in all the benchmarks in this post - let us run some tests and see for ourselves.

 

 Setting up data

 

First of all, here is a table, rather large, 512K rows, and rather wide, less than 20 rows per page, so that reading from it via nested loops is a natural choice in many cases:


 

CREATE TABLE dbo.Parent
    
(
      
ID INT NOT NULL
             
PRIMARY KEY ,
      
ParentNumber INT NOT NULL,
      
SpaceFiller CHAR(500) NOT NULL
    ) ;
GO
DECLARE @adder INT ;
SET @adder ;
INSERT  INTO dbo.Parent
        
IDParentNumberSpaceFiller )
VALUES  11'#' ) ;

WHILE @adder 500000 
    
BEGIN ;
        
INSERT  INTO dbo.Parent
                
ID ,
                  
ParentNumber ,
                  
SpaceFiller
                
)
                
SELECT  ID @adder ,
                        
ParentNumber ,
                        
SpaceFiller
                
FROM    dbo.Parent ;
        
SET @adder @adder ;
    
END ;
GO
  

The Child table is much smaller and narrow:

CREATE TABLE dbo.Child
    
(
      
ID INT NOT NULL
             
PRIMARY KEY ,
      
ParentID INT NOT NULL ,
      
ChildNumber INT NOT NULL UNIQUE
    
) ;   
GO

INSERT INTO dbo.Child
        
IDParentIDChildNumber )
SELECT ID-- ID - int
          
ID-- ParentID - int
          
ID  -- ChildNumber - int
FROM dbo.Parent WHERE ID BETWEEN AND 1000 

 

 Comparing nested loops vs. multi-statement UDF calls

 

Here are the two UDFs which we are  going to benchmark:

 CREATE FUNCTION dbo.GetParentNumber_Multiline @ParentID INT )
RETURNS @parentInfo TABLE ParentNumber INT )
AS 
    BEGIN 
;
        
INSERT  @parentInfo
                
ParentNumber
                
)
                
SELECT  ParentNumber
                
FROM    dbo.Parent
                
WHERE   ID @ParentID ;
        
RETURN ;                
    
END ;
GO

CREATE FUNCTION GetParentNumber_Inline @ParentID INT )
RETURNS TABLE
AS
RETURN
    
SELECT    ParentNumber
      
FROM      dbo.Parent
      
WHERE     ID @ParentID
    
) ;
 

Let us fire up the Profiler, make sure it records individual statements, and run the following script:

SELECT IDParentIDp.ParentNumber
INTO #t1
FROM dbo.Child
CROSS APPLY dbo.GetParentNumber_Multiline ParentID AS ;

DROP TABLE #t;

SELECT IDParentIDp.ParentNumber
INTO #t1
FROM dbo.Child
CROSS APPLY dbo.GetParentNumber_Inline ParentID AS p

DROP TABLE #t

In the Profiler, we shall see that the multi-statement UDF has been invoked 1000 times, which is slow:

Multiline stats:

   CPU time = 312 ms,  elapsed time = 696 ms.

Inline stats:

   CPU time = 0 ms,  elapsed time = 7 ms.
 

Clearly the difference in performance is dramatic, while the plan is essentially the same - nested loops. We can highlight the query, compare the execution plans, and see for ourselves.

 

Multi-statement UDFs do not always run once per row

 

While we are at it, let us bust a myth: multi-statement UDFs do not have to run once per row. Let us have all child rows refer to only one parent:

UPDATE dbo.Child SET ParentID 

Let us rerun our benchmarks. In the profiler we shall see that the multi-statement UDF has been invoked only once. Execution costs are also quite comparable this time for both scripts:

   CPU time = 15 ms,  elapsed time = 21 ms.

As we have seen, the database engine is smart enough and may sometimes execute multi-statement UDFs once per distinct set of parameters, not once per row.

 

Multi-statement UDFs essentially force nested loops on the engine

 

Let us set up test data differently, so that the second benchmark (the one using inline UDF) executes as a hash join:

 

UPDATE dbo.Child SET ParentID ID 

TRUNCATE TABLE dbo.Parent ;
GO
INSERT INTO dbo.Parent
        
IDParentNumberSpaceFiller )
SELECT IDID'*'        
FROM dbo.Child ;
  

The first benchmark still executes as a nested loop, and is much slower

Multi-statement UDF, nested loops:
   CPU time = 266 ms,  elapsed time = 674 ms.
   
Inline UDF, hash join:
   CPU time = 0 ms,  elapsed time = 5 ms.

 

Conclusion

 

As we have seen, if we want to force nested loops, we can use multi-statement UDFs.Also we have seen that those multi-statement UDFs can be very slow.

However, let us be cautious: when we want to benchmark real life solutions, we do not want to do it with the Profiler running, like I did in this post. Also, as Dave Ballantine has correctly pointed out, in some cases STATISTICS TIME may skew the results just as well.

 

Published Tuesday, February 02, 2010 3:53 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

I hope you meant multi-statement instead of multi-line?

February 2, 2010 5:15 PM
 

Alexander Kuznetsov said:

Adam,

I replaced multiline with multi-statement in the the title and the body of this post. Thanks!

February 2, 2010 5:27 PM
 

Uri Dimant said:

Sasha, usually we use multi-statement TVF when we need to to check parameter's values or whatever within a UDF..,For sure it is possible we try to avoid them. There is no necessity to write multi statement UDF without any logic inside..

February 3, 2010 12:34 AM
 

Greg Linwood said:

Nice post Alex. Uri - it looks like you're using a multi-statement TVF for posting to SQLBlog.com!

February 3, 2010 2:23 AM
 

daveballantyne said:

Not that im saying that non-inlined UDF's are fine , but in 2005 (and to a lesser degree 2008) using 'SET STATATISTICS TIME ON' contributes to the overhead.

http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/11/24/the-observer-effect-in-action.aspx

February 3, 2010 5:45 AM
 

Uri Dimant said:

:-)) I pressed only once on the button...Adam, can you check it please and remove two of posts? Thanks

February 3, 2010 7:50 AM
 

Adam Machanic said:

Dave, I just tried your script and it doesn't repro on my end. As a matter of fact, I'm consistently getting the OPPOSITE effect: The batch with STATISTICS TIME turned on is taking ~127ms, and the batch with it turned off is taking ~165ms. This of course makes no sense--I would expect STATISTICS TIME to have some overhead--but it probably comes down to how SQL Trace is collecting the timings. I certainly would not expect a 10x increase in time like you observed, and in years of using it I'm quite certain that I never have.

February 3, 2010 10:09 AM
 

Alexander Kuznetsov said:

Uri,

In order to fairly compare the performance of nested loops and TVFs, I stripped the TVFs from all additional bells and whistles. This is why its body is so trivial. Can you post an example of how you typically use a multistatement TVF?

Dave, I do not see the difference you have mentioned. Maybe it depends on the version or something else I am missing? Anyway, I believe that I have posted the full repro script and everyone can rerun in their environments and see how it compares for their particular version and edition.

Of course, besides SET STATISTICS, the Profiler also can contribute to the difference in performance.

February 3, 2010 1:31 PM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement