THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Cost of Scalar Functions

While Adam was living it up at VSLive in New York last week, I was consulting in Atlanta, trying to find way to make an application 'run faster'. I won't go into the gory details of everything I looked at, but one there is one aspect of the system I'll tell you about.

When the company's DBA mentioned that one of the big reports they were running had to mask the Social Security Number in all the data rows as they were returned, I heard some warning bells go off in my head. We then started a server-side trace and ran the job, which took several hours and filled 4 50MB trace files. While I was working on a different problem, the DBA took a look at the trace files. He then told me that the first of the four files was the only one I needed to look at because the other 3 were filled with nothing by calls to the function that was masking the numbers. All the other work done in the 4 hour report was contained in first 10 percent of the first file.

I remembered tracing functions at one time long ago, maybe when I was writing the Inside SQL Server 2000 book. But I thought it was time to do it again. I took the pubs database and created a copy of the some of the data in the authors table, which includes a Social Security number called au_id, and then copied the table into itself enough times to get about 100,000 rows.  I then created a simplified version of the masking function that my client had been using.

To test the cost of the function, I first selected from the table applying the underlying function code to the SSN column. I then selected from the table, actually calling the function in each row.

When I traced only the event TSQL:BatchCompleted, I was amazed to see that when using the function the CPU cost and the duration were more than TWICE as high as the values when just using the underlying code. The reads were about the same, as expected, since both statements were accessing the same data.

I expected the cost to be higher, but not twice as high. But I was still to be truly astounded. When I then changed the trace definition to trace every statement completed in a procedure (SP:StmtCompleted) the cost of the query with the function call was vastly higher. Every call to the function was treated as a call to a procedure, and traced as a statement in a procedure. The CPU was 5 times higher and the duration increased by a factor of 30!

So, if one warning I can give you is to be careful when using scalar functions on a large result set, the more important warning here seems to be that you should be very careful what you trace. The act of tracing can impose a performance penalty that can sometimes mask the problem you're actually trying to track down.

I am attaching a script file that includes the code for building the new big table, and the code for the masking function so you can try it and see the performance on your own SQL Server. (The blog software won't let me upload a .sql file, so I have just given it a .txt extension.)

-- Kalen


Published Sunday, September 17, 2006 2:51 PM by Kalen Delaney

Attachment(s): test function calls.txt

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Peter W. DeBetta said:

I thought I'd try another approach to see if the results were any different. Instead of a scalar-valued function, I created an inline table-valued function (TVF) that would return a single column of the masked SSN. I then used the CROSS APPLY operator to join the [bigauthors] table to the TVF.

Much to my delight, using CROSS APPLY with the TVF actually performed better time-wise (on my machine) than the SELECT with no function calls. Using Profiler, the CPU for the TVF solution was about 9% better and the Duration was about 1% better.

--The code

CREATE FUNCTION [dbo].[MaskSSNTable] (@SSN varchar(15))
     WHEN PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%', @SSN) > 0
           'XXX-XX-' + SUBSTRING(@SSN, PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%', @SSN) + 7, 4)
     ELSE '-----------'
   END AS Mask)

SELECT m.Mask, ba.SSN, ba.First, ba.Last
FROM [bigauthors] AS ba
CROSS APPLY [dbo].[MaskSSNTable](ba.ssn) AS m

September 19, 2006 10:28 AM

Kalen said:

Peter, thanks for doing this. This is way cool. I might even be able to pull a SQL Server Magazine article out of this, and if I write one, I will definitely mention your name!

September 19, 2006 11:48 AM

Adam Machanic said:

Hmm, I think I can claim prior art for this one ;-)

September 19, 2006 3:12 PM

Peter W. DeBetta said:

Oh sure, but you didn't try it for this scenario, did you Adam? ;-)

Seriously, it was this post along with Adam's that inspired my wanting to know which of the three worked best.

So Kalen, you will need to mention both of us! :-)
September 19, 2006 11:48 PM

Kalen said:

Keeps those comments coming... the article is getting better and better! :-)

Great post, Adam, thanks for pointing me to it. I think I might have read it when you first posted it, and it was one of the things that was nagging at my mind when I was thinking about scalar UDFs.

yes, of course I'll mention both of you... this is a great opportunnity to mention this blog site in my SQL Mag column.

-- Kalen
September 20, 2006 10:07 AM

Lara said:

Hi Kalen!  I just noticed you are blogging!  

Thanks for documenting this warning.  I ran into this issue just last week.  A customer was applying a function to convert a column from a static table to proper case within a stored procedure that executed 30-100+ times per minute.  Imagine where Foo has about 200 rows:  
SELECT dbo.fn_Proper(Column)
FROM dbo.Foo

Certainly not anything difficult to follow.  Users began complaining of poor performance while I was there, though.  I noticed a lot of blocking and this particular statement was executing in 30-60 seconds per execution (this drove a dropdown on an external customer facing website - not a good thing).  The interesting fact was that this table was static, and the data was stored in proper case.  Against my general guidelines (no buckaroos in the database tier), I removed the function from the stored procedure in production so the business could continue.  The database has been much happier since the update.  
October 3, 2006 8:27 PM

Paul Davallou said:

Thanks for summarizing the issue Kalen and being specific about the trace events you used. I too, found a similar situation in a critical component of our product six months ago. When testing the original code, the processing time was about 9 minutes. After moving the function in-line, the processing time reduced to 12 seconds (very complex business logic code). That made for a great week! I'll always look for similar situations.
November 10, 2006 12:06 PM

Leave a Comment


This Blog


Favorite Non-technical Sites or Blogs

Privacy Statement