
SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

  • Fake statistics, and how to get rid of them

    There are two ways to test how your queries behave on huge amounts of data. The simple option is to actually use them on huge amounts of data – but where do you get that if you have no access to the production database, and how do you store it if you happen not to have a multi-terabyte storage array sitting in your basement? So here’s the second best option: you cheat.

    Luckily, SQL Server has a feature that allows you to do just that. I must warn you that this feature is undocumented and unsupported. You should never under any circumstances use this feature on a production server, unless explicitly directed to by qualified Microsoft staff. Using it, sparingly, on a test box is okay. But as an undocumented feature, there is no guarantee that it will always work, or that it will continue to work in future versions.

    With that disclaimer out of the way, let’s take a look at the “cheat” option. As you probably know, the Query Optimizer relies on statistics for its decisions. If those statistics are wrong, the decisions will probably be wrong. Here is where we find the leverage to cheat: if we can force SQL Server to use statistics that are representative of the production database rather than the test database, we will get the execution plan it would generate on production. The actual performance will probably still be lots faster, but at least we can look at the plan and use our understanding of execution plans to check that we get scans where we want scans, seeks where we want seeks, and all the right types of joins.

    Fake statistics

    The undocumented feature that we can use here actually has a very simple and easy-to-remember syntax. You just run an UPDATE STATISTICS command, adding the options “WITH ROWCOUNT=xxxx, PAGECOUNT=xxxx” to force SQL Server to store the numbers you mention as the rowcount and pagecount. Clear the plan cache, then test your queries and check how they behave with the simulated number of rows and pages. I could give a much longer description and examples of the usage, but others have already done so – see for instance this post from the horse’s mouth (aka the Query Optimization Team), this post from Benjamin Nevarez, or if you are able to read German this excellent post from Uwe Ricken.
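
    For reference, here is a minimal sketch of what such a statement looks like – the table name and the numbers are just examples, pick whatever matches your production system:

    UPDATE STATISTICS Sales.SalesOrderDetail
    WITH ROWCOUNT = 50000000, PAGECOUNT = 1000000;

    DBCC FREEPROCCACHE;   -- then clear the plan cache before re-testing (never on a production server!)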

    … and how to get rid of them

    But what all those blog posts fail to mention is what happens later. Of course, if you just create a small test table, slap in a few sample rows, fake big statistics, check the execution plan and then drop the table, that is not an issue. But what if you have a test database that contains a lot of carefully crafted test data, and that is used for other tests as well? You may not want to run all the tests with those artificially bumped rowcounts!

    In my case, I ran into this in the AdventureWorks sample database, my loyal companion for presentations and demos. I will present a pre-conference seminar on Execution Plans in Exeter (March 21), in Copenhagen (March 28), and in Zagreb (June 2). For my preparations, I wanted to force a hash spill warning, so I artificially lowered the rowcount for the SalesOrderDetail table to just 200. That worked just fine – this nifty feature can be used to mimic every rowcount, not just the big numbers.
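
    For those who want to reproduce that: lowering the rowcount uses exactly the same undocumented syntax, just with a small number (and you can leave out PAGECOUNT if you only care about the rows):

    UPDATE STATISTICS Sales.SalesOrderDetail
    WITH ROWCOUNT = 200;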

    Because I use this sample database for other purposes as well, I wanted to reset the statistics to their normal values immediately afterwards. I figured rebuilding all the statistics on the table with the FULLSCAN option would do the trick. Right? Wrong! The UPDATE STATISTICS statement did run, and it took a while to scan all rows in the table (as requested) – but it still retained the rowcount and pagecount values that I had forced earlier! And there is no visible indication at all – executing the UPDATE STATISTICS … WITH FULLSCAN statement simply runs for a while, then reports success.

    It was actually mere coincidence that I later did some other tests on the same table and happened to notice that the estimated rowcount for this table was still 200. It then took me at least an hour of searching the internet and trying many different options (including using zero or a blank value for the rowcount and pagecount parameters, combining different other options of the UPDATE STATISTICS command, clearing or not clearing the plan cache, and a few other things I don’t even remember). None of them worked. I could of course count the actual number of rows and pages and use those values, and that would have worked because my AdventureWorks database never changes – but for a test database where the data can actually change over time, this option would not work.

    In the end, I finally found a method that works. But it is really a sledgehammer approach, and I prefer not having to do this on large tables on a regular basis: ALTER TABLE Sales.SalesOrderDetail REBUILD. After running that statement, I found that the statistics on the SalesOrderDetail table had finally reverted to their normal behavior. Until the next time I need to fake a different number.
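
    To summarize both attempts on the same table: the first statement below runs without complaints but silently keeps the fake counts; only the second one brought the real numbers back for me.

    -- Does NOT reset the faked rowcount and pagecount, even though it rescans everything:
    UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

    -- The sledgehammer that finally worked:
    ALTER TABLE Sales.SalesOrderDetail REBUILD;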

    Conclusion

    On a development server, using the WITH ROWCOUNT and WITH PAGECOUNT options of the UPDATE STATISTICS command is a fine method to simulate large numbers of rows, or to simulate the effect of wrong statistics. But unfortunately, the numbers given stick for far longer than I like; this is not documented anywhere, and not easy to undo. Rebuilding the table and indexes appears to be the only solution.

    If you have found other ways to return from fake row- and pagecounts to the normal statistics behavior, post a comment and share your knowledge!

  • Database Design training – for free?

    When I started this blog, my plan was to focus on two main subject areas. One of them is SQL Server, T-SQL, and performance. The other is database design.

    Looking back over my post history, I honestly cannot say I delivered on the second area. Not because I have nothing to say about database design, but because I found it doesn’t lend itself to a blog. In the SQL Server / T-SQL area, there are subjects that can be isolated and described in a single post, or in a short series. In database design, I do not see such subjects. Everything is related, and the only good way to cover this is end to end – starting at requirements gathering, through modeling, normalization, more normalization, and ending at the final database design.

    That cannot be covered in a blog post. So I found a different place: Pluralsight. Pluralsight is a company that provides online training on IT-related subjects. Excellent training, delivered by the best specialists available, for a fraction of the price of classroom training. And with the additional benefit of being able to pause and rewind as often as you want.

    Relational Database Design

    I was delighted when Pluralsight offered me the opportunity to record a course on relational database design for them. I must admit that creating the course took more time than I had ever anticipated – but it is now finished and available in the Pluralsight course library. It is a very comprehensive course – to see the entire course, end to end, you will need to set aside almost a full work day: 7 hours and 33 minutes. But luckily, you can consume the course in small portions, watching a few clips when you have time and stopping to return later when you’ve had enough.

    The first module is the introduction, in which I give a historic overview of how storage hardware and methods have evolved to where they are now. If you are pressed for time, you might consider skipping this module. Knowing the past can help understand the present better, but this module is not required to understand the rest of the course.

    In module 2, I explain the IDEF1X notation for ER diagrams, that I use throughout the course. Module 3 presents you with techniques you can use to gather the information you need from the domain experts, and module 4 shows how to use that information to create a first draft of the data model. Module 5 then explains how to convert between the ER diagram and the relational database representation of the data model – I personally prefer to postpone this conversion until after the normalization, but not everyone likes to work that way; hence my decision to cover the conversion first.

    Normalization is a subject that is far more complex than many people realize – yet much easier to do than many people fear. I use two (long!) modules to cover basic normalization, up to third normal form; and then a third long module to go over all the so-called “higher” normal forms – often considered to be unnecessary, too hard to do, and too complex to understand. I hope you’ll disagree with those qualifications as much as I do after watching this module.

    IDEF1X may be popular, but it is far from the only ER diagramming method. In the last module, I cover several other methods, showing you how much they are alike and where you’ll find the important differences. After seeing it, you should be able to work in any ER diagramming method you may encounter.

    I personally think that this is a very good course on relational database design. But I might be just a bit biased. So I suggest that you go ahead, check it out, and be your own judge!

    Cheap …

    My relational database design course is not only (in my opinion) very good – it is also extremely affordable. Have you ever looked at the cost of a classroom course on database design, then decided that you simply cannot afford it? The good news is – Pluralsight is cheaper! Prices for individual subscriptions start at just $29 per month, or $299 per year for a basic subscription – which already gives unlimited access to the full course library. Or you can pay $49 per month / $499 per year for a “plus” subscription, that adds access to exercise files, assessments, certificates, and offline viewing.

    And for that low price, you do not get access to just my course on relational database design – the full course library of Pluralsight already contains over a thousand titles, with new titles being added almost every day. Readers of this blog should be excited to see over fifty courses in the SQL Server subject area, by renowned experts such as Paul Randal, Jonathan Kehayias, Leonard Lobel, Glenn Berry, and many others. And if you want to look beyond the borders of SQL Server, Pluralsight has courses on almost every other subject (as long as it is related to dev and IT).

    … or free?

    But the title of my blog promises “free”, not “cheap”. Thirty bucks per month may be a steal, but it is not free. Was I lying? Of course not! You can check my relational database design course (and more) for free. How? Simple!

    Method 1: available to all, but a bit limited. Go to the Pluralsight website, and sign up for a free trial. This costs you absolutely nothing, and it gives you access to all the Pluralsight courses for a whole week. There is a catch, though – your viewing time is limited to 200 minutes. So you can use the trial to watch one of the shorter courses completely, to get a very extensive taster of a longer course, or to browse around and watch snippets of many courses – that is up to you!

    Using this method is an easy way to watch almost half of the relational database design course for free. But what about the rest? Enter method 2!

    Method 2: sufficient to watch the entire course, but not available to everyone. Pluralsight has allowed me to give out a few one-week training passes. Like the free trial, these give access to all courses in the course library, for a full week. Unlike the free trial, they are not time limited. If you have the stamina, you can get a whole 168 hours of training out of them. (I do not advise this. I am not responsible for the effects on mental and physical health if anyone thinks they should even try this. Please don’t). But with a bit of planning, a week should be enough to consume the entire relational database design course, and still have time left to check out some of the other courses!

    So how do you get such a free pass? The easiest method is to post a comment to this blog, explaining why I should pick you. On February 10, I will choose the five best, most convincing, or maybe just the funniest comments, and reward them with a training pass!

    So go to the Pluralsight site, sign up for that free trial, check out my course, then return here and post a comment. Convince me!

  • Parameterization and filtered indexes (part 2)

    In my previous post, I demonstrated how the presence of a filtered index can get in the way of successful parameterization, possibly resulting in too many query compilations and procedure cache bloat. I suggest reading that first, because here I will pick up right where I left off.

     

    Use the Force, Luke

     

    If you read the previous post very carefully, you will have noticed that I wrote that if a plan is unstable, it is “unsafe for simple parameterization” – and that suggests that SQL Server will not have such inhibitions when using forced parameterization. Let’s have a look. Assuming you still have the indexes I created in the previous blog post, just execute the script below: it changes the parameterization setting for AdventureWorks2012 to forced, clears out the plan cache (remember not to do this on a production system!), executes the query that would not successfully parameterize when the database was set to simple parameterization, and then inspects the plan cache. You will see that this time, the query was indeed parameterized. If you also add the code to watch the “Unsafe Auto-Params/sec” counter, you will see no changes to its value.

     

    USE AdventureWorks2012;

    go

    ALTER DATABASE AdventureWorks2012

    SET PARAMETERIZATION FORCED;

    go

    DBCC FREEPROCCACHE;

    go

    SELECT SUM(OrderQty)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    go

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT      cp.objtype AS ObjectType,

                cp.usecounts AS UseCount,

                st.[text] AS QueryText,

                cp.plan_handle AS PlanHandle,

                stmt.value('(@ParameterizedPlanHandle)',

                           'varchar(64)') AS ParameterizedPlanHandle,

                qp.query_plan AS [Execution Plan]

    FROM        sys.dm_exec_cached_plans AS cp

    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp

    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)

    WHERE       qp.dbid = DB_ID()

    AND         st.[text] NOT LIKE '%ExcludeMe%';

    go

     

    At first sight, this may appear to be a very simple and elegant solution for the problem. Just set all databases to forced parameterization and then sit back and relax, waiting for the inevitable flood of happy emails from end users who wish to thank you for the tremendous performance boost you just gave them.

     

    But wait. When has solving a problem in SQL Server ever been this simple? There must be a catch, right? But where?

     

    A new use case

     

    In order to see the problem, I will use a different database – so let’s first clean up the mess we made in AdventureWorks2012 and restore it to its original state, so that other bloggers can use it for their demos without my stuff getting in the way:

     

    ALTER DATABASE AdventureWorks2012

    SET PARAMETERIZATION SIMPLE;

     

    DROP INDEX Sales.SalesOrderDetail.ix_ProductID_Over_1000;

    DROP INDEX Sales.SalesOrderDetail.ix_ProductID_Incl_OrderQty;

     

    A very common use case of filtered indexes is in combination with “soft deletes” – data that is no longer valid is not physically removed from the table, but kept (often to maintain a history of events). A special column, usually called “is_deleted”, is used to track that this information is historic only. Since the majority of queries are only interested in current data, you will find the predicate “is_deleted = 0” in almost every query – so that makes this an ideal candidate for a filtered index. And since we are now under the impression that filtered indexes really require forced parameterization, we will change that setting right away. Here is the script to create this scenario in a sample database (it may take some time to run this!):

     

    USE tempdb;

    go

    IF EXISTS (SELECT * FROM sys.databases WHERE name = 'FilterDemo')

    BEGIN;

        ALTER DATABASE FilterDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

        DROP DATABASE FilterDemo;

    END;

    CREATE DATABASE FilterDemo;

    ALTER DATABASE FilterDemo SET PARAMETERIZATION FORCED;

    go

    USE FilterDemo;

    go

    CREATE TABLE dbo.AllThings

       (ThingID int NOT NULL,

        ThingName varchar(20) NOT NULL,

        is_deleted bit NOT NULL DEFAULT (0),

        LotsOfStuff char(2000) DEFAULT ('Placeholder'),

        CONSTRAINT PK_AllThings

            PRIMARY KEY (ThingID)

       );

    DECLARE @i int = 0;

    WHILE @i < 5000

    BEGIN;

        SET @i += 1;

        INSERT dbo.AllThings

               (ThingID,

                ThingName,

                is_deleted)

        VALUES (@i,

                CHAR(65 + RAND() * 26) + CHAR(65 + RAND() * 26) + CHAR(65 + RAND() * 26), -- Random name

                CASE WHEN RAND() < 0.99 THEN 1 ELSE 0 END)  -- Most products are deleted

    END;

    CREATE INDEX ix_Name_NotDeleted

    ON dbo.AllThings (ThingName)

    INCLUDE (is_deleted)    -- Always include the columns you filter on!

    WHERE is_deleted = 0;

    go

     

    Once the script is done, let’s get a quick count of the number of things that have not been deleted. I used a formula that results in 99% of the things being deleted, but there is a random factor involved. The actual number of non-deleted things should be close to 50. It was 58 on my system, which I found by running this query:

     

    SET STATISTICS IO ON;

    go

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = 0;

     

    But if you switch to the messages tab, you will see a very disturbing number. I expect SQL Server to execute this query by simply scanning the filtered index, as this index exactly contains all the rows it needs to count. But the output from STATISTICS IO shows that a total of 1256 logical reads have been made. Over twelve hundred reads to count just 58 rows? How is that possible? Let’s take a look at the execution plan to find out:

    [image: execution plan – a scan of the clustered index on dbo.AllThings]

    As you see, the filtered index is not used at all; the optimizer chose to scan the clustered index instead, wading through all 5,000 “things” in my table to find just those 58 that were not deleted.

     

    The reason for this is simple. I have enabled forced parameterization. So I told SQL Server that, no matter the consequences, it should always replace constant values with parameters. So the plan that the optimizer was forced to compile was not for the query I typed, but for this query instead:

     

    DECLARE @1 int = 0;

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = @1;

     

    And the optimizer has to produce a plan that will always return the correct results, for any possible value of the parameter. Of course, when executing this query with parameter value 1, scanning the clustered index is the only possible way to return correct results, so the plan choice that was made was indeed the only possible choice. By enabling forced parameterization, we have effectively crippled the optimizer in using any filtered index at all (except through views).
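
    To illustrate the “except through views” escape hatch, here is a minimal sketch (assuming the FilterDemo objects created above; the view name is mine). Forced parameterization does not touch literals that are part of a view definition, so a query through such a view can still match the filtered index:

    CREATE VIEW dbo.CurrentThings
    AS
    SELECT ThingID, ThingName
    FROM   dbo.AllThings
    WHERE  is_deleted = 0;
    go
    SELECT COUNT(*)
    FROM   dbo.CurrentThings;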

     

    More force?

     

    In a case such as this, where we know that we will always use is_deleted = 0 in the query predicate, it can become very tempting to find ways to convince the optimizer to choose the query plan we want without giving up on the forced parameterization plan. Let’s first see what we can achieve by parameterizing this query ourselves and applying the OPTIMIZE FOR hint:

     

    DECLARE @1 int = 0;

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = @1

    OPTION (OPTIMIZE FOR (@1 = 0));

     

    Okay, I admit, it was a rather desperate attempt and I didn’t really expect much of it. The OPTIMIZE FOR hint tells the optimizer that I want a plan that gives the best performance for that particular value, but the results still have to be correct for other values. So the optimizer will still consider what would happen if I supply other values, and will reject the filtered index because of that.

     

    So, back to the original query, and now use more force. I know that this query will always perform better when using the filtered index – so if the optimizer fails to see that, I will just force it. Applying an index hint does just that. It may be called a hint, but it is a directive; if you hint an index, it WILL be used no matter what. So this should help, right?

     

    SELECT COUNT(*)

    FROM   dbo.AllThings WITH (INDEX = ix_Name_NotDeleted)

    WHERE  is_deleted = 0;

     

    No, it does not help. The result is an error message. We have now given the optimizer a totally impossible task. We told it to parameterize the query, no matter what, so it did. It now has to produce an execution plan for the parameterized query, and that query will be reused with different values. You and I know that the value will really never be different, but SQL Server does not, and it still has to guarantee correct results. But then we also told SQL Server that it really has to use an index that does not include some of the rows that, for some values of the parameter, may have to be returned. Like I said – an impossible task, and SQL Server responds just like you and I would do, by throwing up its hands in despair and giving up.

     

    The simple solution

     

    Luckily, there is a solution to this problem. It’s simple – simple parameterization, to be precise. Just set the parameterization option back to its default setting of simple, and you will get a much better behavior.

     

    ALTER DATABASE FilterDemo

    SET PARAMETERIZATION SIMPLE;

    SET STATISTICS IO ON;

    go

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = 0;

     

    Now the query takes just two logical reads. And the execution plan looks as expected: a scan of the filtered index, that’s all.

     

    Conclusion

     

    When you look only at the plan cache, forced parameterization may look like manna from heaven. But when you look further, you will see that setting the parameterization option to forced is probably not a good idea at all. It may appear to solve some issues, but you get bigger issues in return – filtered indexes that might boost performance tremendously are ignored, and if they are hinted it can even cause errors. Do you know all the code that is running on your system? Are you sure that none of your application developers has ever added an index hint? Do you want to find out the hard way?

     

    Simple parameterization in combination with filtered indexes may not always play well with the plan cache. But I would think very long and hard, and do a close inspection of all objects in the database and all queries used before even considering switching to forced parameterization.

  • Parameterization and filtered indexes (part 1)

    Parameterization is a process where SQL Server slightly modifies a query in the parse phase: it replaces constant values by parameters. So if you submit

    -- Query 1

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    the parser will replace this by (and fool the optimizer into thinking you submitted):

    DECLARE @1 smallint = 706;

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = @1;

    You can verify this very easily. If you go to the actual execution plan and hover over the left-most icon (the SELECT), a popup window will open that shows (among other things) the query that was fed into the optimizer – but beware, this can be misleading, as will be demonstrated later in this post. I will show a more reliable method shortly. Don’t forget to disable the “Include Actual Execution Plan” option now, as it results in some extra work being done that will influence the output of the queries below.

     

    The benefit of this is that a second execution of the same query with a different value will reuse the same plan. This saves compilation time, and less procedure cache space is consumed for these queries. However, you do run the risk of getting bad plans because of parameter sniffing with a bad value. Because that risk increases with query complexity, SQL Server’s default behavior is to only parameterize very simple queries (“simple parameterization”), but you can opt to use “forced parameterization” instead, by setting the database option PARAMETERIZATION FORCED.
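
    For reference, the database-level switch involved looks like this (shown here for AdventureWorks2012; we will stay with the default of simple for now, and only switch to forced in part 2):

    ALTER DATABASE AdventureWorks2012 SET PARAMETERIZATION FORCED;   -- opt in to forced parameterization
    ALTER DATABASE AdventureWorks2012 SET PARAMETERIZATION SIMPLE;   -- back to the default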

     

    Parameterization in action

     

    Before adding filtered indexes to the mix, let’s start with some experiments. First, verify that you are on your development server and not accidentally logged in to a production box. Then run the query below. The first four lines disable the “Optimize for ad hoc workloads” option, a server-wide setting that should probably be enabled on every production box, but is not as relevant for development – and would complicate this blog post. (The same principles apply, it is just a bit harder to demonstrate.) The fifth line instantly clears out the plan cache, a very bad thing to do in production, but great for experiments such as this.

    -- Query 2

    EXEC sp_configure 'show advanced options', 1;

    RECONFIGURE;

    EXEC sp_configure 'optimize for ad hoc workloads', 0;

    RECONFIGURE;

    DBCC FREEPROCCACHE;

     

    Let’s take a look at the contents of the plan cache – it should be empty, so ideally this query will return an empty result set.

    -- Query 3

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT      cp.objtype AS ObjectType,

                cp.usecounts AS UseCount,

                st.[text] AS QueryText,

                cp.plan_handle AS PlanHandle,

                stmt.value('(@ParameterizedPlanHandle)',

                           'varchar(64)') AS ParameterizedPlanHandle,

                qp.query_plan AS [Execution Plan]

    FROM        sys.dm_exec_cached_plans AS cp

    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp

    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple')

                                                 AS batch(stmt)

    WHERE       qp.dbid = DB_ID()

    AND         st.[text] NOT LIKE '%ExcludeMe%';

    The filter on “ExcludeMe” is a trick to ensure that the entry for the query itself does not show up. Other unrelated queries can also pop up, even when you are on a development system that nobody else is using – internal management tasks and installed tools (such as intellisense or third party database management tools) may fire queries that will all show up. Most of these should be excluded because of the filter on DB_ID (but do note that as a consequence, you MUST run this query in the same database where you are running your experiments – which for this blog post will be AdventureWorks). If the remaining unrelated queries bother you, find a part of the query text that identifies those queries and that never occurs in your own queries and add an extra filter on st.[text] to exclude those queries as well.
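
    For example (the extra filter text here is purely hypothetical – use whatever identifies the noisy tool on your system), the WHERE clause of query 3 would then become:

    WHERE       qp.dbid = DB_ID()
    AND         st.[text] NOT LIKE '%ExcludeMe%'
    AND         st.[text] NOT LIKE '%TextFromTheNoisyTool%';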

     

    Now rerun query 1 above, then inspect the plan cache again (using query 3), and you should see a result similar to this:

    [image: plan cache contents – an Adhoc entry linked to a Prepared plan]

    The “Adhoc” plan is not a real execution plan. It is a placeholder that links the original text of the query before parameterization to the actual plan, called a “Prepared” plan. Check the values in the PlanHandle and ParameterizedPlanHandle columns to see this. Or click the content of the Execution Plan column to see a graphical representation of each plan. If you submit Query 1 a few more times, the UseCount of the Adhoc plan goes up, because in case of an exact match between the text of a submitted query and the text of a query in the plan cache, the same plan will be reused. The UseCount of the Prepared plan does not go up, even though this is the plan that is actually executed – a weird way of counting, but documented (though not entirely correct).

     

    To see parameterization in action, let’s count orders for a different product:

    -- Query 4

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 710;

    Go back to query 3 and run it to see what this has done to the plan cache. Here are the results I got:

    [image: plan cache contents – two Adhoc entries linked to the same Prepared plan]

    The raw text of the query is different, so a new “Adhoc” entry has been made. But after parameterizing the query text, SQL Server found that it already had a plan, so it could skip the compilation process and start executing right away. This saves lots of CPU cycles for the (relatively expensive) compilation process. It also reduces plan cache bloat, because the fake plan for the unparameterized query takes up a lot less space than a full execution plan (add the column cp.size_in_bytes to the SELECT list in query 3 if you want to see how much it saves). As you can see, the UseCount of the parameterized plan does go up this time, which conflicts with the remark in Books Online that it is “not incremented when parameterized queries find a plan in the cache”. I guess it actually represents the number of distinct unparameterized plans that have been linked to this parameterized plan.
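
    If you just want a quick look at those plan sizes without editing query 3, this stripped-down variant (same DMVs, same ExcludeMe trick) does the job:

    SELECT      cp.objtype AS ObjectType,
                cp.usecounts AS UseCount,
                cp.size_in_bytes AS PlanSizeBytes,
                st.[text] AS QueryText
    FROM        sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE       st.[text] NOT LIKE '%ExcludeMe%';   -- note: unlike query 3, not filtered to the current database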

     

    Assuming you use the default of simple parameterization, a more complicated query should not use this mechanism. We can verify this using the same method. After clearing the plan cache (using the last line of query 2 above), I added some needless complexity to the previous query:

    -- Query 5

    SELECT     COUNT(*)

    FROM       Sales.SalesOrderDetail AS sd

    INNER JOIN Sales.SalesOrderHeader AS sh

          ON   sh.SalesOrderID = sd.SalesOrderID

    WHERE      sd.ProductID = 710;

    The join to SalesOrderHeader does nothing, and will even be eliminated by the query optimizer; the actual execution plan is still the same as before. But if you now run query 3 to inspect the plan cache contents, you will get just a single row in the result:

    [image: plan cache contents – a single Adhoc plan, no Prepared plan]

    In this case, there is just an “Adhoc” plan that contains the original query text, including the hard-coded product ID value. It does not reference a different, parameterized plan; and when you click the Execution Plan column, you will indeed see a real execution plan. Change the product ID in query 5 back to 706 and execute it, and a new row will be added to the plan cache – again containing a full plan. So in this case, the optimizer had to do all the work to produce a plan. Imagine hundreds of queries like this on a busy production system, and you can see how all those compilations could impact overall performance, and bloat the plan cache. Using forced parameterization changes this, but at the risk of potentially introducing parameter sniffing issues – so pick your poison.

     

    As you can see, parameterization works as designed, and it can be quite useful. Depending on your workload, you might want to consider enabling forced parameterization. But that is not the subject of this blog post!

     

    Filtered indexes

     

    The feature “filtered indexes” was first introduced in SQL Server 2008. This feature can be very useful, despite its known limitations (hint: if you filter on any columns not included in the index, always INCLUDE them – that solves 95% of the problems). But it has an effect on parameterization and plan reuse that many people are not aware of. To illustrate this, let’s just create a filtered index that should not affect my demo queries at all – an index that includes only products with a much higher product number.

    -- Query 6

    CREATE INDEX ix_ProductID_Over_1000

    ON Sales.SalesOrderDetail (ProductID)

    WHERE ProductID > 1000;

    Run this query to create the index, then clear the plan cache, activate the option to include the actual execution plan and then run query 1 again. Looking at the plan, you might think that nothing changed – just as before, the value 706 has been replaced by the placeholder @1 in the “Statement” property of the “SELECT” iterator. But as I said, this is not always a reliable source of information – let’s get a second opinion from the plan cache by using query 3 again. Here is the result:

    [image: plan cache contents – a single Adhoc plan, no ParameterizedPlanHandle]

    The two STATISTICS XML queries are the result of including the actual execution plan (if you don’t believe me, clear the procedure cache, then rerun query 1 and you will see only that single line). Just a single Adhoc plan, no ParameterizedPlanHandle, no Prepared plan – the picture is clear: the plan was not parameterized. The information in the execution plan is misleading.

     

    So how can a filtered index that would never have been used for this query have this effect on parameterization? The explanation lies of course in the existence of a filtered index that has the ProductID column in its predicate – the same column that is also used in the predicate of the query. Because of the different values, the filtered index may look irrelevant to this query. But if the parameterization were successful, the same plan would ALSO be used for this query:

    -- Query 7

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 1706;

    After parameterization, this query looks the same as query 1. If you look at the actual execution plan of this query, you will see that this query not only CAN, but also WILL use the filtered index we created earlier. If SQL Server had allowed parameterization to take place, this option would not have been available. To understand the misleading information in the actual execution plan, we have to understand the actual order of events, and looking at the right performance counter can help here (thanks to Alejandro Mesa, who helped me understand this). Let’s clear out that procedure cache again, and now run the batch below – it is query 1 again, but with a query that reads the relevant performance counter before and after it.

    -- Query 8

    SELECT cntr_value

    FROM   sys.dm_os_performance_counters

    WHERE  [object_name] LIKE '%:SQL Statistics%'

    AND    counter_name = 'Unsafe Auto-Params/sec';

    go

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    go

    SELECT cntr_value

    FROM   sys.dm_os_performance_counters

    WHERE  [object_name] LIKE '%:SQL Statistics%'

    AND    counter_name = 'Unsafe Auto-Params/sec';

    go

    If you follow along, you will see that the value of this performance counter goes up by one. This is an artifact of a chain of events that starts with one component (I assume the parser – I guess someone like Paul White would probably know, or immediately attach a debugger to find out) deciding to parameterize the query – this is the version of the plan you see in the graphical execution plan. The next component (probably the optimizer) notices that the plan is not stable – meaning that different values of the constant can result in different plans being optimal. That makes it unsafe for simple parameterization, so the parameterized version is rejected, and the original, non-parameterized version is compiled, stored in the plan cache, and executed instead.

     

    Unfortunately, this mechanism of rejecting unsafe parameterizations may get a bit too enthusiastic. Let’s first create one more index – this time not filtered, but with an included column:

    -- Query 9

    CREATE INDEX ix_ProductID_Incl_OrderQty

    ON Sales.SalesOrderDetail (ProductID)

    INCLUDE (OrderQty);

    If I now want to find the total ordered quantity instead of just the number of rows for a product, I would modify my original query 1 as follows:

    -- Query 10

    SELECT SUM(OrderQty)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    This query will never use the filtered index. Change the value 706 to something above 1000 and check the execution plan if you don’t believe me. Then use a hint to force the filtered index and compare the new execution plan with the old one to see the reason. The new index includes the OrderQty column, so it covers this query. The filtered index does not include this column, so a lookup into the clustered index has to be added to the plan, making it much more expensive (about twice the cost, according to the optimizer’s estimates). For this particular query, the filtered index is totally irrelevant – so now the plan is stable again, and there is no need to reject the parameterization attempt, right? Well, you can use any of the methods outlined above (clearing the cache, running the query, then looking at the cache contents; or watching the “Unsafe Auto-Params/sec” counter when submitting the query) and you will see that the parameterization is still rejected. The decision to reject or accept a parameterized query is not made after a full optimization, but after just a quick glance at the start of the optimization process. By the time the optimizer has explored enough options to realize that there is a single plan that is optimal for all values, even those included in the filtered index, the decision has already been made and it cannot be undone.
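
    If you want to see the plan difference for yourself, this is the kind of hinted variant I mean (a sketch based on the indexes created earlier; expect an extra lookup into the clustered index to show up in the plan):

    SELECT SUM(OrderQty)
    FROM   Sales.SalesOrderDetail WITH (INDEX = ix_ProductID_Over_1000)
    WHERE  ProductID = 1706;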

     

    Conclusion


    If you have issues with a bloated plan cache or too much CPU time going to compiling queries, check for filtered indexes. Their habit of getting in the way of successfully parameterizing queries may be a root cause of your problems.

     

    But that’s not the whole story! In the next part, I will show you how to convince SQL Server to change the default behavior of not parameterizing queries if a filtered index is present – but I will also demonstrate how this can result in even worse performance, or even runtime errors!

  • Understanding Execution Plans

    To me, it feels as if 2014 is a long time away. But it isn’t. Sinterklaas has already dropped off his presents and is probably already back in his castle in Spain. Christmas is almost upon us. And before we know it, we’ll be washing oliebollen down with Champagne.

    That also means that I need to get cracking on preparing my precon seminar on execution plans. What precon seminar you say? Oh right – I didn’t tell you yet. The good folks organizing SQL Saturday #269 in Exeter (UK) (on March 22) have decided to extend the event with a full training day on Friday March 21. For that purpose, they invited eight of the best SQL Server speakers in the world, and me, to spend a whole day pouring our knowledge into other people’s heads.

    The full program for this excellent day can be found here – and believe me, if I was not speaking that day, I would be trying to clone myself and visit at least four of them!

    My session on that day, as you can see, focuses on execution plans. So what can you expect if you sign up? Here is a short overview.

    I will start with the bare basics. What is an execution plan, what is its role in a SQL Server database, and most of all – how can you get to see it, how do you start reading it (Left to right? Right to left?), and how do you get to those extremely interesting details that Microsoft has carefully hidden from view? Why is an estimated plan not estimated, why is an actual plan actually mostly an estimated plan? What is an iterator, what does it do, and how does it interact with other operators?

    After that, we’ll dive into the details. What is a seek, what is a scan? How do you notice when a seek secretly scans, and why should you care? Why do people say scans are bad, and why are they wrong? (Or, maybe even more important, WHEN are they wrong, and when not?)

    Next up are the joins. Inner, outer, loop, merge, hash – but also semi, and anti semi. Yes, those are all join types, and we will explore them all. How do they work, what are the benefits and what are the costs of all of them? How can you influence the type of join chosen, and why would you want to?

    All that, and much more, will be included in my seminar. And that’s just what I have planned for before lunch. At the end of the day, you can expect to have a good working knowledge of almost every iterator that you can encounter in an execution plan. I will also explain what has changed in execution plans in SQL Server 2012 and SQL Server 2014. And to ensure that nobody falls asleep from my constant talking, I will give you all some exercises in between, challenging you to immediately apply what you just learned. These exercises may not be the kind of exercises you expect - but trust me, you will find them fun and refreshing, and they will also teach you how to apply your knowledge to all kinds of problems.

    Does this sound interesting or useful at all to you? I hope so – why else are you on a SQL Server blog site? So get cracking – click this link and sign up for my precon right now. (Or for one of the other precons – they are all great!) If you do so soon enough, you can still apply for the special “Early Bird” rate of only £150 – which is valid until December 15. But don’t despair if you see this post later – from December 16 until February 28, you still only pay £185 – still a steal for a whole day of training! (Even the £200 last minute rate that applies from March 1st is great value – but honestly, why wait?)

    I am looking forward to visiting Exeter this March. I hope to see all of you there. Either in my precon – or if you choose to attend one of the other precons, then maybe in one of the many sessions that will take place the next day.

  • SQLRally and SQLRally - Session material

    I had a great week last week. First at SQLRally Nordic, in Stockholm, where I presented a session on how improvements to the OVER clause can help you simplify queries in SQL Server 2012 enormously. And then I continued straight on into SQLRally Amsterdam, where I delivered a session on the performance implications of using user-defined functions in T-SQL.

    I understand that both events will make my slides and demo code downloadable from their website, but this may take a while. So those who do not want to wait can download the material from this blog post.

    Both SQLRally events have recorded all their sessions. It will obviously take a while to edit and publish all those recordings – but those who missed my session and want to check it out with my explanations know that if they wait a while, they can watch the recording online on the SQLRally websites.

    I once more would like to thank all volunteers who organized these events, all the sponsors who helped fund them … and most of all, all attendees who made my time in Stockholm and Amsterdam amazing. You were a great crowd, both during my session and in the many chats I had during the breaks.

  • Decks and demos – Session material for Silicon Valley Code Camp

    This weekend, I will be presenting two sessions at Silicon Valley Code Camp, in Los Altos Hills, CA. On Saturday, I will have an early start – in the first time slot of the day, at 9:45 AM, I will present on how T-SQL user-defined functions can easily wreck your performance – and how you can prevent that.

    On Sunday afternoon (1:15 PM), I will then present a session on the OVER clause, focusing on how both the SQL Server 2005 version and the enhanced SQL Server 2012 syntax of this feature can help you solve common problems without having to resort to ill-performing and unmaintainable monster queries.

    Both sessions are quite demo-heavy, so I hope a lot of attendees will download the demo code and play with it for themselves. I have therefore attached the demo code for these sessions to this post. Oh, and the slide deck is included as well.

  • Book review: SQL Server Transaction Log Management

    It was an offer I could not resist. I was promised a free copy of one of the newest books from Red Gate Books, SQL Server Transaction Log Management (by Tony Davis and Gail Shaw), with the caveat that I should write a review after reading it. Mind you, not a commercial, “make sure we sell more copies” kind of review, but a review of my actual thoughts. Yes, I got explicit permission to be my usual brutally honest self.

    A total win/win for me! First, I get a free book – and free is always good, right? And second, I knew that I would get my pleasure out of it – either the pleasure of having a great read if it’s good, or the somewhat more perverse pleasure of bashing the book after struggling through it if it’s not!

    For all the tl;dr types out there: this review will be mostly positive. I found no reason to give the book a bashing. And for those with an attention span beyond three paragraphs, here’s why.

    Things I like about the book

    The authors explicitly state that they intended the book to be lightweight and easily accessible. They clearly intended the book to be a good primer for the starting DBA who wants to know a bit more about how to manage the transaction log, but is not interested in parsing the output of sys.fn_dblog.

    And they deliver as promised. The book is really an easy read. It starts with a good, basic explanation of the transaction log for DBAs who are completely new to this subject; more experienced readers can probably decide to leaf through this part without reading it all. After that, more information is stacked upon that foundation. The subjects are well organized in their own chapters. One nice touch is that the authors were not afraid to have extremely short chapters if there is not much to tell about a subject – it keeps the book well organized and information easy to find.

    Aiming to keep a book simple can be dangerous when the subject isn’t. My initial concern about the discussion of LSNs and how Virtual Log Files are (or are not) reused luckily turned out to be unfounded – Tony and Gail apparently realized that many log management scenarios really depend on an understanding of this level, so they made the effort to explain this subject in an easy way, without simplifying the actual mechanisms. And they succeed very well at that!

    The book contains lots of code samples to illustrate the concepts. Those who like to read books while sitting at a computer can easily follow along, either by copying the queries, or by using the downloadable code samples. And for those who usually read in places where they have no computer available (I personally love reading in bed!), all relevant output of the code samples is included as well, so you don’t miss out by not running the code while reading.

    Finally, for those who, after reading the book, would like to have an even deeper understanding about some of the subjects, the book contains loads of references to websites where subjects are explained in more depth.

    Things I do not like about the book

    It’s not all perfect, though. I did run across a few things that bothered me. Not enough to change my overall opinion of the book, but still sufficient that I feel I need to include a few words about them in my review.

    First – the illustrations. The (otherwise really excellent) explanation of log internals, and how LSN and VLFs impact reuse of the log file, uses illustrations to make it easier to understand this complex subject matter. Unfortunately, these illustrations are very hard to read in the printed version of the book. Probably due to the illustrations being made in color, but the book being printed in black and white / grayscale. Pages 30 and 31 are prime examples of how not to use shading – the black text is impossible to read in the dark shaded areas. Another illustration, in chapter 6, does not suffer from this “dark on dark” problem, but it is weird to read references to “green” and “yellow” bars in a book that uses no color. Both the sloppy references to non-existent colors and (especially!) the undecipherable text in the illustrations of chapter 2 should really have been found and corrected during the pre-production process!

    The desire to keep the book simple has some drawbacks as well. I am not sure if all readers would agree on this, but I think some more background would have been useful for a few specific subjects. Two questions that I am left with after this book are: “so I now know that the log is used when recovering a database – but how exactly does this undo and redo process work?”, and “okay, you made it clear that too many VLFs can slow down performance – but why, what is going on under the hood, where does this delay come from?” I do happen to know the answer to the first question myself, thanks to reading some of Kalen Delaney’s excellent books – but I feel that including these backgrounds would have helped to give the reader a better understanding of how the transaction log is used.

    I also found some bits to be a bit repetitive. Maybe this was a conscious decision, to really drive home a point? Or to facilitate the readers who only read specific bits of the book instead of reading it cover to cover as I did? Or maybe it was just an oversight, or an attempt to fill the book after running out of new stuff to discuss? Anyway, on me it had the effect of making me go “Yeah, no need to repeat that yet another time, I get it now” a few times while reading.

    I already said that I liked all the links to further reading. But URL shortening services like bit.ly or tinyurl exist for more reasons than just the twitter 140-character limit. I cannot click on a link in a hardcopy book, and I do not like having to manually copy a link that spans two lines in print. A missed opportunity! Also, while most links were completely optional to follow only if I wanted broader or deeper insights, there were (I think) one or two occasions where I felt that following the link was required for better understanding of the book itself – which given my habit of reading in bed was a problem. In those cases, the information should really have been included in the book.

    Finally, the last chapter was a slight disappointment to me. A lot of text was spent explaining how to use and set up the various monitoring tools, which I think is a bit too far off-topic for the book. And then, after showing the reader how to set up monitoring, the authors failed to answer the question that I am sure haunts many incidental DBAs: “In these monitoring results, what should I look for?” Experts have very good reasons for not wanting to set specific values and thresholds in their advice, but they forget that beginners have equally good reasons for needing those values. Some more specific guidelines, with the caveat that they are not universal and not guaranteed to be still valid in future versions of SQL Server, would have made this chapter much more useful.

    The verdict

    If you are already aware of how the transaction log is allocated and used, how recovery models, backups, restores, and log growth interact, and how to monitor the log; if you are looking for an in-depth discussion of transaction log internals – this is NOT the book for you. That is not a statement about the quality of the book; you are simply not the intended audience.

    But for the beginning or occasional DBAs, as well as the more experienced DBAs who feel a need to deepen their understanding of the transaction log beyond the knowledge that “the bloody thing needs to be backed up or it’ll eat up my entire drive, ‘coz those Redmond guys say so”, this is an excellent read. And even if you have passed that level a long time ago, if you feel comfortable scheduling log backups, switching between recovery models, and doing a point-in-time restore – you’ll probably still learn some new stuff from reading this book.

    All in all, I consider this book a recommended read. You can order it through amazon.com or through amazon.co.uk – or if you like free as much as I do, you can even download a free PDF copy (and optionally get a free evaluation copy of Red Gate’s SQL Backup Pro thrown in!)

  • Jetzt geht’s los - speaking in Germany!

    It feels just like yesterday that I went to Munich for the very first German edition of SQL Saturday – and it was a great event. An agenda that was packed with three tracks of great sessions, and lots of fun with the organization, attendees, and other speakers.

    That was such a great time that I didn’t have to hesitate long before deciding that I wanted to repeat this event this year. Especially when I heard that it will be held in Rheinland, on July 13 – that is a distance I can travel by car! The only potential problem was the timing (school holiday for my kids), but as soon as I had that sorted out, I submitted my abstracts.

    I am delighted to say that I have been selected to speak again. Do take coffee if you plan to go to my session – it’s the first of the day, and I’ll dive right in with a deep, 500-level session. In this session, I will give my audience a peek under the hood of columnstore indexes and batch mode processing – two fantastic new features in SQL Server 2012 that can give you enormous performance benefits, but about which very little is documented regarding how exactly they work. I did a lot of digging to find the bits that are documented, pieced everything together, and used common sense and logical thinking to fill in the blanks. The result is a session that discloses a lot of hard-to-find and undocumented internals of columnstore indexes and batch mode processing. If you’re a geek, and you think you can handle a 500-level session at 9:15 in the morning – come visit my session!

    But wait (to quote late night teleshopping TV), there is more! The volunteers who work incredibly hard to organize this event have decided that they will also host a precon this year. So if you can manage to be in Rheinland by Friday July 12, you can get in a full day of learning for the low fee of only € 179,= – a fraction of what you pay for similar training days at most other conferences! And you can even pick from a selection of three great choices.

    If you’re into BI, you can learn about Data Analytics and BigData from Ruben Pertusa Lopez and Paco Gonzalez. If you are more a DBA person, consider subscribing to the training day on Extended Events, presented by Andreas Wolter and my good friend Mladen Prajdic. But if you ever are in a situation where knowing a bit about database design would help (and let’s be honest – who in this profession is never in such a situation?), you should really consider subscribing to my seminar on effective database design.

    If you choose to attend my precon, you will learn a database design method that doesn’t focus on functional dependencies and normal forms. Instead, I will teach you how to discuss design questions with your interview partners in a language they understand – even if you don’t! Sounds impossible? Trust me, it isn’t. I’ll show you the steps to follow to make sure you ask the right questions and draw the correct conclusions from the answers. In the end, you will have a database design that is completely correct for the business, and perfectly normalized to boot.

    Interested? Great! You can sign up here. I’m looking forward to meeting you in Rheinland!

  • Why does SQL Server not compress data on LOB pages?

    Enabling compression on your database can save you a lot of space – but when you have a lot of varchar(max) or nvarchar(max) data, you may find the savings to be limited. This is because only data stored on the data and index pages is compressed, and data for the (max) data types is generally stored on other, special-purpose pages – either text/image pages, or row overflow data pages. (See Understanding Pages and Extents in Books Online. That description is from the SQL Server 2008 R2 Books Online, but it is still valid in SQL Server 2012 – although the page has apparently been removed from newer Books Online editions.)
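
    As a quick refresher, this is roughly how compression is estimated and enabled in the first place (a sketch with a placeholder table name; both statements are documented in Books Online):

    -- Estimate the savings first (placeholder schema and table name)
    EXEC sys.sp_estimate_data_compression_savings
         @schema_name = 'dbo', @object_name = 'MyBigTable',
         @index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE';

    -- Then enable row or page compression by rebuilding the table
    ALTER TABLE dbo.MyBigTable REBUILD WITH (DATA_COMPRESSION = PAGE);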

    So why does SQL Server not compress the data that, perhaps, would benefit most from compression? Here’s the answer.

    SQL Server currently supports two compression methods for data in the database (backup compression is out of scope for this post).

    * Row compression: This is a simple algorithm to save storage space for individual rows. It has two elements. The first is a more efficient way to store the per-row metadata, saving a few bytes per row regardless of layout and content. The second element is storing almost all data types, even those that have a fixed length, as variable length. This mainly has benefits for the larger numeric types (e.g. a bigint with a value of 1,000 is stored in two bytes instead of eight – only values that actually need all eight bytes do not gain from this, and will instead take up more space because the actual length has to be stored somewhere) and for fixed-length string types with lots of trailing spaces. For Unicode data, the SCSU algorithm is used, which saves 15% to 50% depending on the actual content of the column. (According to Wikipedia, the SCSU standard has gained very little adoption because it is not as effective as other compression schemes.)

    See Row Compression Implementation and Unicode Compression Implementation in Books Online.

    * Page compression: When enabled, page compression is done *after* row compression. As the name implies, it’s done on a per-page basis. It consists of two steps (a sketch that shows how to estimate the effect of both compression types on your own tables follows after this list):

    1. Prefix compression. Within each column, the longest common prefix is used to build the "anchor record". All columns then only indicate how many characters of the anchor value they use as prefix. So for example, if we have a first name column with the values Roger / Hugo / Hugh, the anchor value could be Hugh, and the data values would be stored as {0}Roger / {3}o / {4}. (Here, {3} is stored as a single byte, and {3}o means: first three characters of Hugh, followed by an o.)

    2. Dictionary compression. Across the entire page, columns that are now stored with the same bit pattern are replaced with a single value that points to a dictionary entry. Let's assume that the same page I used above also has a Lastname column, with the values Plowman / Kornelis / Ploo. Here, Plowman would be the anchor value, and the data after prefix compression would be {7} / {0}Kornelis / {3}o. The dictionary encoding would then see that there is a {3}o in the population of the Firstname column and a {3}o in the population of the Lastname column. It would place {3}o as the first entry in the dictionary and replace both {3}o values with the reference [1].

    See Page Compression Implementation in Books Online.
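
    If you want to see what either method would save on one of your own tables before changing anything, you can ask SQL Server for an estimate. Below is a minimal sketch (the table name dbo.Demo is made up – substitute your own); sp_estimate_data_compression_savings copies a sample of the data into tempdb, compresses it there, and reports the current and estimated sizes.

    -- Estimate the effect of row and of page compression on a (hypothetical) table dbo.Demo
    EXEC sys.sp_estimate_data_compression_savings
         @schema_name      = N'dbo',
         @object_name      = N'Demo',
         @index_id         = NULL,    -- all indexes
         @partition_number = NULL,    -- all partitions
         @data_compression = N'ROW';

    EXEC sys.sp_estimate_data_compression_savings
         @schema_name      = N'dbo',
         @object_name      = N'Demo',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = N'PAGE';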

    All elements of page compression save space by eliminating repeated data between different column values, so they will only work when multiple values are stored on a page. For all LOB pages, the reverse is the case: a single value spans multiple pages. So by definition, page compression can never yield any benefits.

    For row compression, the more efficient storage of per-row metadata naturally only affects pages that have per-row metadata stored – data and index pages, but not LOB pages. And the conversion of fixed length to variable length data types also doesn’t affect LOB pages, since these can only be used for varying length data.
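
    You can verify this yourself with the quick check sketched below (again using the hypothetical table name dbo.Demo, assumed to contain an nvarchar(max) column): rebuild the table with page compression and then look at its allocation units. You should see compressed pages reported for the IN_ROW_DATA allocation unit only, while the LOB_DATA (and ROW_OVERFLOW_DATA) pages stay uncompressed.

    -- Rebuild a (hypothetical) table with page compression ...
    ALTER TABLE dbo.Demo REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- ... and inspect the allocation units
    SELECT alloc_unit_type_desc,     -- IN_ROW_DATA, ROW_OVERFLOW_DATA, or LOB_DATA
           page_count,
           compressed_page_count     -- remains 0 for the LOB_DATA allocation unit
    FROM   sys.dm_db_index_physical_stats
               (DB_ID(), OBJECT_ID(N'dbo.Demo'), NULL, NULL, 'DETAILED');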

    Based on the above, it is obvious why SQL Server does not compress varchar and varbinary data stored on LOB pages – there would be zero benefit from any of the implemented compression methods. But how about Unicode compression for nvarchar(max) and overflowing nvarchar(nnn) data? Wouldn’t that save some space?

    To answer that, I now have to go into speculation mode. And I see two possible theories:

    1. Because the SCSU standard saves less space than other algorithms, the SQL Server team deliberately made this choice in order to encourage people to compress these large values in the client before sending them to the server, thereby reducing not only storage space (by more than SCSU would have yielded), but also network traffic. The downside of this is that cool features such as Full-Text Search and Semantic Search don’t understand data that was compressed at the client – at least not without a lot of extra effort.

    2. Since all compression algorithms work on a per-page basis, they had a choice between either first breaking the LOB data into pages and then compressing (which makes no sense, as the rest of the page would remain empty and the amount of space actually used remains the same) or creating a separate algorithm for LOB data to first compress it and then split it over multiple pages. That would of course have cost a lot of extra engineering hours, and if my understanding of SCSU is correct, it would also have a big adverse side effect on operations that affect only a part of an nvarchar(max) value (like SUBSTRING or the .WRITE method of the UPDATE statement). That is because SCSU works by traversing the entire string from left to right and can’t handle operating on only a subset of the string.

    Bottom line: When you have to store large values and you want to save on storage size, your best course of action is probably to compress and decompress the values on the client side. But do beware the consequences this has for Full Text Search and Semantic Search!

    Final note: I didn’t spend as much time on this blog post as I normally do. That’s because it actually started as a reply to a question on an internet forum, but while writing that reply I realized it was long enough to be promoted to a blog post.

  • Bleeding Edge 2012 – session material

    As promised, here are the slide deck and demo code I used for my presentation at the Bleeding Edge 2012 conference in Laško, Slovenia. Okay, I promised to have them up by Tuesday or Wednesday at worst, and it is now Saturday – my apologies for the delay.

    Thanks again to all the attendees of my session. I hope you enjoyed it, and if you have any questions, please don’t hesitate to get in touch with me.

    I had a great time in Slovenia, both during the event and in the after hours. Even if everything the tour guide said during the tour of the Laško brewery was lost on me (in his defense, he did offer to translate the Slovenian explanations to Russian), I still liked it – especially the part where we got to sample some of the produce!

    I truly hope that there will be another Bleeding Edge conference next year. And if there is, I definitely want to speak there again!

  • SQLRally Nordic 2012 – session material

    As some of you might know, I have been to SQLRally Nordic 2012 in Copenhagen earlier this week. I was able to attend many interesting sessions, I had a great time catching up with old friends and meeting new people, and I was allowed to present a session myself.

    I understand that the PowerPoint slides and demo code I used in my session will be made available through the SQLRally website – but I don’t know how long it will take the probably very busy volunteers to do so. And I promised my attendees to make them available through my blog as well.

    So, here they are, for everyone to download, see, and play with. Though, in all honesty, I expect people who were not present at my session to get very little out of this material.

    (I’m not sure, but I *think* that the SQLRally organization will also make a recording of my session available. If that is true, then once it’s up the slides and demo code should make a lot more sense to anyone who is able to view the recording).

    I know I’ve said it before, but I can’t repeat this often enough – thanks to everyone involved with the organization of SQLRally for creating a truly wonderful event, and thanks to all the attendees who took the time and energy to come to my session and listen to me.

  • T-SQL User-Defined Functions: the good, the bad, and the ugly (part 4)

    Scalar user-defined functions are bad for performance. I have already shown this for T-SQL scalar user-defined functions without and with data access, and for most CLR scalar user-defined functions without data access; in this blog post I will show that CLR scalar user-defined functions with data access fit the same picture.

     

    First attempt

     

    Sticking to my simplistic example of finding the triple of an integer value by reading it from a pre-populated lookup table, and following the standard recommendations and templates that Microsoft publishes for CLR scalar user-defined functions with data access, my first attempt at this function looked like this:

     

    [Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.Read,
        IsDeterministic=true,SystemDataAccess=SystemDataAccessKind.None,IsPrecise=true)]
    public static SqlInt32 CLRTripFromTbl1(SqlInt32 Value)
    {
        SqlInt32 Result;
        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            conn.Open();

            SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
            cmd.Parameters.Add(new SqlParameter("Value", Value));

            Result = (SqlInt32)((int)cmd.ExecuteScalar());
        }

        return Result;
    }

    Below is the code I used to test the performance of this version in comparison to the T-SQL scalar user-defined function and the version without UDF I used earlier (in part 2 – where I also explain why the query without UDF is more complex than needed). Note that I added a WHERE clause that limits the test to just 10% of the table (one million rows instead of all ten million) to limit the testing time.

     

    SET STATISTICS TIME ON;

    -- T-SQL UDF
    SELECT MAX(dbo.TripFromTbl(DataVal)) AS MaxTriple
    FROM   dbo.LargeTable
    WHERE  KeyVal <= 1000000;

    -- CLR UDF
    SELECT MAX(dbo.CLRTripFromTbl1(DataVal)) AS MaxTriple
    FROM   dbo.LargeTable
    WHERE  KeyVal <= 1000000;

    -- No UDF
    WITH  TheTrips
    AS   (SELECT (SELECT t.Triple
                  FROM   dbo.Triples AS t
                  WHERE  t.Value = l.DataVal) AS TheTrip
           FROM   dbo.LargeTable AS l
           WHERE  l.KeyVal <= 1000000)
    SELECT MAX(TheTrip) AS MaxTriple
    FROM   TheTrips;

    SET STATISTICS TIME OFF;

    If you run this and check the CPU and elapsed time, you’ll find a huge (and very nasty) surprise. I expected the CLR version to be about as bad as the T-SQL user-defined function with data access, but I was wrong – it is more than ten times slower! For processing just these one million rows (with the data already in cache), the version without UDF took about 0.2 seconds elapsed and 0.8 seconds CPU (the CPU time being higher than the elapsed time is due to parallelism); the version with the T-SQL UDF took about 26 seconds (elapsed and CPU – no parallelism in this execution plan) – and the CLR version took over 350 seconds! This is also the reason why I added the WHERE clause; without it, the CLR version would probably have taken more than half an hour to finish.

     

    For completeness’ sake, I also checked the execution plan and the SET STATISTICS IO output of the CLR versions. All the issues I saw with the T-SQL scalar user-defined function with data access plague the CLR versions as well – no indication at all of the cost of the function in the execution plan, and no report of the amount of I/O in the SET STATISTICS IO output (though it is still visible in a profiler trace). Also, like the T-SQL scalar function with or without data access (but unlike the CLR scalar function without data access), you will never get a parallel plan for a query that invokes a CLR scalar user-defined function with data access. And, unlike a T-SQL UDF with data access, even the estimated execution plan will not show any hint of the data access performed in the function.
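
    As an aside: if you prefer Extended Events over Profiler, a session like the sketch below (session name and target file name are just examples I made up) will also expose the hidden data access – every lookup the function performs shows up as a completed statement, including its reads.

    -- Sketch: an Extended Events session that captures the statements executed inside the UDF
    CREATE EVENT SESSION UdfDataAccess ON SERVER
    ADD EVENT sqlserver.sp_statement_completed
        (ACTION (sqlserver.sql_text)),
    ADD EVENT sqlserver.sql_statement_completed
        (ACTION (sqlserver.sql_text))
    ADD TARGET package0.event_file (SET filename = N'UdfDataAccess');

    ALTER EVENT SESSION UdfDataAccess ON SERVER STATE = START;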

     

    The parallel loopback misunderstanding

     

    When I discussed these results with some people I consider more knowledgeable on CLR than myself, someone told me that by using a loopback connection instead of the context connection, it is possible to forgo the DataAccessKind.Read switch, which impedes parallelism.

     

    Well – if I could give a prize for the worst advice ever (okay, it later turned out to be the most misunderstood advice ever – more about that later), this would definitely qualify. I created and tested this UDF:

     

    [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read,
    IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
    public static SqlInt32 CLRTripFromTbl2(SqlInt32 Value)
    {
        SqlInt32 Result;
        using (SqlConnection conn = new SqlConnection(@"Data Source=perFact\SQL2012; Initial Catalog=udf; Integrated Security=True;"))
        {
            conn.Open();

            SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
            cmd.Parameters.Add(new SqlParameter("Value", Value));

            Result = (SqlInt32)((int)cmd.ExecuteScalar());
        }

        return Result;
    }

    You may note that the DataAccessKind.Read switch is still present. That’s because, when testing this with DataAccessKind.None, I got a run-time error. After restoring the DataAccessKind.Read switch, I at least got this code to run – but not very fast! When testing this UDF, I had to reduce the number of rows to process even further – processing just 1% of the table (a hundred thousand rows) took 168 seconds elapsed (though “only” 54 seconds CPU), so the one-million-row test I used earlier would have taken almost half an hour, and if I had used the full test set, the test would have run for almost 5 hours! That makes this test with the loopback connection almost five times slower than the one with the context connection – which was already about ten times slower than the T-SQL UDF (which, in turn, was well over a hundred times slower than the version without UDF).

     

    It later turned out that I had misunderstood the advice. For the record, this advice was given by Adam Machanic, who is a true expert at everything CLR. It was not his advice that was bad; it was my ability to understand what he meant. He later clarified that his suggestion was to move the data access to a separate thread. He even pointed me to the library he created for this – see this link.

    I have to be honest. I looked at the code. And looked at it again. Then I looked at the sample code. And again. And yet another time. And then I admitted defeat. My understanding of CLR is simply too limited to enable me to adapt my “triples” example to use this technique, so I will not be able to include it in the performance comparisons. If anyone is willing to give it a try, then by all means do – and please let me know the results!

     

    Cache as cache can

     

    Another tip, also from Adam (and this time not misunderstood by me!), is to avoid multiple lookups of the same value by implementing a cache in the CLR code. This is very easy to do for anyone with a fair bit of CLR skills – and I think that I have proven that you can even replace those CLR skills with a judicious amount of Google (or Bing). Here is how my code (after several failed attempts) eventually looked:

     

    // This is the initial size of the cache; it will grow when needed.
    private const int InitialSize = 40;
    public static TripleCache tc = new TripleCache(InitialSize);

    [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.Read,
    IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
    public static SqlInt32 CLRTripFromTbl3(SqlInt32 Value)
    {
        SqlInt32 Result = tc[Value].Value;
        return Result;
    }

    public class TripleCache
    {
        // Dictionary to contain the cache.
        static Dictionary<SqlInt32, WeakReference> _cache;

        public TripleCache(int count)
        {
            // Create the dictionary with the requested initial capacity.
            _cache = new Dictionary<SqlInt32, WeakReference>(count);
        }

        // Access a data object from the cache.
        public Triple this[SqlInt32 index]
        {
            get
            {
                Triple tr;
                // Try to read the triple from the cache.
                try
                {
                    tr = _cache[index].Target as Triple;
                }
                catch
                {
                    // Triple not yet in cache; read from table and add to cache
                    tr = new Triple(index);
                    _cache.Add(index, new WeakReference(tr));
                }
                if (tr == null)
                {
                    // Triple has been in cache but was evicted - read from table again and renew cache
                    tr = new Triple(index);
                    _cache[index] = new WeakReference(tr);
                }
                return tr;
            }
        }
    }

    // This class reads the triple value from the table
    public class Triple
    {
        private SqlInt32 _triple;

        public Triple(SqlInt32 single)
        {
            using (SqlConnection conn = new SqlConnection("context connection=true"))
            {
                conn.Open();

                SqlCommand cmd = new SqlCommand("SELECT Triple FROM dbo.Triples WHERE Value = @Value;", conn);
                cmd.Parameters.Add(new SqlParameter("Value", single));

                _triple = (SqlInt32)((int)cmd.ExecuteScalar());
            }
        }

        // Simple property.
        public SqlInt32 Value
        {
            get
            {
                return _triple;
            }
        }
    }

    With the test data used so far, this method really performs very well. The test with one million rows finished in 3.5 seconds (again, both elapsed and CPU – we still do not get a parallel plan for this query). That is a lot faster than all other UDF versions tested so far – though still not even near the performance we can get by not using a UDF at all.

     

    To be fair, the test data is *very* nice for this specific method. The lookup table used holds two hundred thousand rows, but the sample data is skewed to use only ten distinct values. So this last CLR version reads those ten values into its cache and then keeps serving them from cache over and over again. While using only a minimal amount of memory for its cache, it reduces the amount of actual I/O to only ten queries. In short, my test case was the ideal use case for the caching technique.
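
    If you want to check that skew for yourself, a simple aggregation over the test table from this series does the trick:

    -- Show the distinct lookup values in the test column and how often each occurs
    SELECT   DataVal,
             COUNT(*) AS NumberOfRows
    FROM     dbo.LargeTable
    GROUP BY DataVal
    ORDER BY DataVal;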

     

    I wanted to know how the cached version holds up when the circumstances are less ideal, so I also executed this query:

     

    SELECT MAX(dbo.CLRTripFromTbl3(KeyVal % 90000)) AS MaxTriple
    FROM   dbo.LargeTable
    WHERE  KeyVal <= 1000000;

    By using “KeyVal % 90000” instead of “DataVal”, I now ensure that 90,000 different rows from the lookup table are accessed, each 11 or 12 times. This means that, even with the cache, a lot of database queries are still needed – though still fewer than without the cache. The results were devastating! The above query ran for almost seven MINUTES (415 seconds, to be exact, using 405 seconds CPU time). At this point, I started to think that maybe the cache didn’t work properly with this number of entries. The WeakReference class I used allows cached objects to be reclaimed by the garbage collector in order to free up some memory; maybe that was happening so aggressively that the cached data was evicted every time before it could be reused? To test this, I ran the same query another time; now the elapsed time was down to “only” 326 seconds, almost 100 seconds less. So it looks like the cache is still working, saving the need to fetch the same data multiple times – but the overhead for creating and using the cache costs more than the saved database calls; the simple CLR UDF that just calls the database every time is a lot faster!

    (To be fair, all methods see some performance loss when tested with the equivalent of the above query, just not as much. The T-SQL UDF takes 27 seconds, the CLR UDF takes 364 seconds, and the version with inlined logic instead of a UDF now takes almost 0.5 seconds – slower, but still comfortably running rings around all the competitors!)

     

    Now obviously, the two tests I used are both extreme cases. In many cases where you might consider implementing a CLR UDF with data access, the number of distinct rows read will fall somewhere between the ten and the ninety thousand of my two extremes – so you really should run your own tests, on your own system and with your own typical data distribution.

     

    There are also a few other caveats you should consider before deciding to implement this caching technique for your scalar CLR user-defined functions:

    ·         The cache persists across query executions. So if I execute the same query again, I will get even better performance, because all the data is still available in the CLR cache. If you run a profiler trace while executing this query a second time, you will see no data being read at all. For performance, this can be a good thing – but beware! It also means that you will have to find a way to deal with stale data. For triples, this is not really an issue. But there are also many lookup tables where the data may change. Like, for instance, a table with exchange rates for foreign currencies. How would your bank do if, even after entering the new exchange rates for the failing European currencies, lookup queries still returned the old values?

    ·         The assembly that implements the cache has to be loaded with PERMISSION_SET = UNSAFE (a sketch of the DDL involved follows below). Your DBA might not like that. And for good reasons. If you consider implementing techniques such as this cache, or Adam’s parallelizer, be aware that these techniques trade safety for speed. You may crash your AppDomain, which may cause the loss of all data in the cache or even kill executing queries. You may have to implement code to detect and repair this – if performance is really important, and you really have to use a UDF with data access, you could consider accepting the risks and downsides. But I won’t. Even after discussing this with Adam, I still don’t feel that I really understand the risks – and if I don’t understand the risks, then there is no way I will allow this code to run on my machine (except in test environments).
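
    To give you an idea of what you would be asking your DBA to sign off on, here is a sketch of the DDL involved (the assembly name and file path are made up; the cleaner alternative of signing the assembly with an asymmetric key or certificate is not shown):

    -- Sketch only: loading an assembly with PERMISSION_SET = UNSAFE
    ALTER DATABASE udf SET TRUSTWORTHY ON;   -- or, preferably, sign the assembly instead

    CREATE ASSEMBLY TripleCacheAssembly
    FROM 'C:\CLR\TripleCache.dll'            -- made-up path to the compiled assembly
    WITH PERMISSION_SET = UNSAFE;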

     

    Bottom line

     

    Out of the box, CLR user-defined functions will not perform well when data access is involved. They can be up to ten times slower than their T-SQL counterparts, and those were already pretty bad.

     

    Previously, we have seen that CLR user-defined functions can offer a good performance improvement when heavy computations are involved. So, what to do if your required functionality includes a combination of data retrieval and heavy computations? My recommendation would be to try to separate the two parts. Retrieve all the data in the T-SQL query, without using any user-defined function, then pass the required arguments to a CLR user-defined function without data access to do the computation. Or do the computation inline – even with heavy computations, inlining the logic sometimes is still faster!
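
    For the simple example used in this series, that separation could look like the sketch below: the join retrieves the Triple values as part of the set-based query, and any heavy computation would then be done by a computation-only CLR function (the function name shown is made up) or by inline logic on the retrieved value.

    -- Sketch: retrieve the data with a join; keep the UDF (if any) free of data access
    SELECT     MAX(t.Triple) AS MaxTriple   -- or MAX(dbo.SomeComputationOnlyUDF(t.Triple))
    FROM       dbo.LargeTable AS l
    INNER JOIN dbo.Triples AS t
            ON t.Value = l.DataVal
    WHERE      l.KeyVal <= 1000000;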

     

    One way to speed up a CLR user-defined function with data access is to implement a cache, to prevent unnecessary roundtrips to the server. This does require you to think very carefully about the risk of serving stale data, and you may also find it very hard to convince your DBA to allow you to load an unsafe assembly – but when implemented successfully, it can result in a tremendous boost in performance, especially if the typical use of the function involves repeatedly reading a small set of data. But beware – when the number of distinct rows that are used (and will hence be read in cache) increases, and the number of cache hits decreases, caching can easily become a bane rather than a boon, because of the overhead involved.

     

    This concludes my discussion of scalar user-defined functions. In the next episodes of this series, we will look at table-valued user-defined functions. And we will find that those come in different flavors, with VERY different performance characteristics. Stay tuned!

  • Upcoming speaking engagements – want to meet me?

    I have a very busy time ahead of me, with lots of travel, lots of speaking engagements, and hence lots of opportunity to meet and catch up with what has become known as the SQL Family. (An excellent term, by the way – it describes exactly how it has always felt to me!)

    So, for everyone who wants to know when and where they can meet me (as well as for everyone who wants to make sure to stay as far away from me as possible), here is my schedule for the rest of the year, in chronological order:

    · September 8, SQL Saturday #162, Cambridge, England. I will be presenting on Columnstore Indexes. I have given that presentation before, among others at SQL Bits, in a 75-minute time slot – and I had enough material to fill two of those slots. In Cambridge, I have only 50 minutes for my presentation, so that will be a challenge!

    · September 15, SQL Saturday #170, Munich, Germany. Here I will deliver a presentation on the MERGE statement. Also a session that I have presented before (in Portugal and in the Netherlands), but I plan to invest some time to improve and perfect the session.

    · October 1-3, SQL Rally Nordic, Copenhagen, Denmark. In Copenhagen, I will present a session on the dangers of user-defined functions. This session is of course inspired by my current blog-series-in-progress. For the title of the session, I had two ideas. The final title is “The evils of user-defined functions”, but the other contender was “User-defined functions, or how to kill performance in one easy step”. I hope that gives you an idea of what to expect!

    · November 3, SQL Saturday #172, Portland, OR, United States of America. The schedule has not yet been announced, so I have no idea if one of my sessions will be chosen. But even if I am not presenting, I will be there, meeting old and new friends, attending other people’s sessions, and making a nuisance out of myself by asking just the right question at just the wrong time. Or, if one of my sessions is picked, I will be presenting myself, and you all can make a nuisance out of yourself by asking the wrong question at just the right time.

    · November 5-9, PASS Summit 2012, Seattle, WA, United States of America. The biggest of the bunch. After a few years where I was either absent or a regular attendee, PASS has decided to try their luck with me as a speaker again this year. They, too, have chosen to let me speak about the MERGE statement. Obviously, I will improve my slides and demo code even further after delivering this session in Munich, so I hope to really rock the stage!

    · November 10, SQL Saturday #166, Olympia, WA, United States of America. Since I am in the neighborhood anyway, I decided to stick around one more day and go to this fine SQL Saturday event. I submitted the same sessions I also submitted for the Portland event, and they have given me the same feedback (i.e., none yet). In other words, I don’t know yet if I’ll be speaking or what I’ll be speaking about, but I will be there, even if I am not selected as a speaker.

    For those of you who desperately want to see me in real life (honestly? why??) but are not able to make it to any of the above events, there is still hope! Because, long as it already is, the list above is still incomplete. There are two more events that will take place in 2012 and where I might be speaking. Yes, “might be” – in one case, it is almost certain, but I’ll still keep quiet until the schedule is officially announced. In the other case, I first have to work out some issues in the schedule of myself and my family before I can commit to going there – and once that is worked out, I will still keep quiet about that one too until the speakers for the event are officially announced.

    So with at least six and possibly eight (or more? my mail is open to anyone who is organizing events!) speaking engagements for the rest of the year, there is every chance for you to run into me. Are you going to one or more of the listed events? In that case, I hope you’ll come to my sessions. But whether you do, or pick a competing session, I hope even more that you will take the time to step up to me, introduce yourself, and chat.

    I am looking forward to meeting you all!

  • T-SQL User-Defined Functions: the good, the bad, and the ugly (part 3)

    I showed why T-SQL scalar user-defined functions are bad for performance in two previous posts. In this post, I will show that CLR scalar user-defined functions are bad as well (though not always quite as bad as T-SQL scalar user-defined functions).

    I will admit that I had not really planned to cover CLR in this series. But shortly after publishing the first part, I received an email from Adam Machanic, which basically said that I should make clear that the information in that post does not apply to CLR functions. So I dutifully added a comment to that post, and included a similar disclaimer in the second part – but I also planned to run some tests, for it is my strong belief that you should never take anything for granted, no matter how knowledgeable the person telling you it is. And if I am running those tests anyway, why not share my findings with you?

    No data access

    The first part of this series focused on scalar user-defined functions (UDFs) that don’t include any data access; only computations, string handling, etc. I used a very unrepresentative example there: multiplying an integer value by 3. Here is a CLR equivalent (using C#) of that function:

    [Microsoft.SqlServer.Server.SqlFunction(DataAccess=DataAccessKind.None,
        IsDeterministic=true,SystemDataAccess=SystemDataAccessKind.None,IsPrecise=true)]
    public static SqlInt32 CLRTriple(SqlInt32 Value)
    {
        return Value * 3;
    }

    In order to test the performance of this version of the function and compare it to the alternatives (T-SQL function and inline logic), I ran this batch against the LargeTable table, which is still populated with ten million rows.

    SET STATISTICS TIME ON;

    SELECT MAX(dbo.Triple(DataVal)) AS MaxTriple
    FROM   dbo.LargeTable;

    SELECT MAX(dbo.CLRTriple(DataVal)) AS MaxTriple
    FROM   dbo.LargeTable;

    SELECT MAX(3 * DataVal) AS MaxTriple
    FROM   dbo.LargeTable;

    SET STATISTICS TIME OFF;

    The execution plans immediately show a huge advantage of CLR UDFs over their T-SQL counterparts: they don’t inhibit parallelism!

    [Screenshot: execution plan, showing a parallel plan for the query with the CLR UDF.]

    But other than that, this execution plan is doing pretty much the same as all other execution plans in this series of blog posts: lying until it is blue in the face. Because that is what execution plans do when UDFs are involved, and CLR functions are no exception. For the real comparison, I ran the same batch without outputting the execution plan (to eliminate its overhead) and checked the amount of CPU time used and the time elapsed. For the T-SQL function, CPU time was 35,709 ms, and elapsed time was 38,621 ms. The CLR version used less CPU time, only 13,072 ms – a huge saving. And thanks to the parallelism, the saving in elapsed time was even bigger: only 2,071 ms.

    However, the version that avoids any UDF and instead places the logic inline still wins, with a CPU time of only 5,741 ms, and an elapsed time of 768 ms – almost three times as fast as the CLR version, and over 45 times as fast as the T-SQL version.

    Given the total lack of complexity in the UDF, the huge performance difference between the CLR and T-SQL versions of the function cannot be explained by CLR code being faster than T-SQL code (though that is definitely the case – see below). The only explanation I can come up with is that invoking a T-SQL function involves a lot more overhead than invoking a CLR function. However, the CLR function is still not entirely free of overhead, otherwise it should have performed about the same as the version with the logic inline.

    Complex calculations

    Up until now, the conclusion is that, even though scalar CLR functions definitely perform much better than scalar T-SQL functions, the best performance is still achieved by placing the logic inline. But there are exceptions to this rule. As already mentioned above, CLR code executes a lot faster than T-SQL code – so if you have a function that performs extremely complicated logic, it may well be worth your time to do a thorough comparison between inlined logic and a CLR function.

    Because I wanted to use a realistic example, I decided to dig up the code I wrote many years ago, when I needed to compute the distance between points on the earth on SQL Server 2005 (before spatial data types were introduced). The formula used to calculate the distance between two points, given their latitude and longitude, is as follows:

    distance = 2 * arcsin( sqrt( sin²((lat1 - lat2) / 2) + cos(lat1) * cos(lat2) * sin²((lon1 - lon2) / 2) ) )

    This formula assumes that the input and result are all measured in radians. The data set I used for testing has locations with latitude and longitude measured in degrees, and I prefer to see the distance reported in kilometers, so we’ll have to add some unit conversions to get the correct results.

    I first implemented this calculation as a T-SQL scalar user-defined function:

    CREATE FUNCTION dbo.Distance
                   (@Lat1 FLOAT,
                    @Lon1 FLOAT,
                    @Lat2 FLOAT,
                    @Lon2 FLOAT)
    RETURNS FLOAT
    AS
    BEGIN;
    DECLARE @Lat1R FLOAT,
            @Lon1R FLOAT,
            @Lat2R FLOAT,
            @Lon2R FLOAT,
            @DistR FLOAT,
            @Dist FLOAT;

    -- Convert from degrees to radians
    SET @Lat1R = RADIANS(@Lat1);
    SET @Lon1R = RADIANS(@Lon1);
    SET @Lat2R = RADIANS(@Lat2);
    SET @Lon2R = RADIANS(@Lon2);

    -- Calculate the distance (in radians)
    SET @DistR = 2 * ASIN(SQRT(POWER(SIN((@Lat1R - @Lat2R) / 2), 2)
                              + (COS(@Lat1R) * COS(@Lat2R) * POWER(SIN((@Lon1R - @Lon2R) / 2), 2))));

    -- Convert distance from radians to kilometers
    -- Explanation: Distance in radians = distance in nautical miles * (pi / (180 * 60)), so
    --                 distance in nautical miles = distance in radians * 180 * 60 / pi
    --              One nautical mile is 1.852 kilometers, so
    --                 distance in km = (distance in radians * 180 * 60 / pi) * 1.852
    --              And since 180 * 60 * 1.852 = 20001.6, this can be simplified to
    --                 distance in km = distance in radians * 20001.6 / pi
    SET @Dist = @DistR * 20001.6 / PI();
    RETURN @Dist;
    END;
    GO

    To test its performance, I used a table that Microsoft included as part of the product samples with SQL Server 2005 (I believe these samples are still available on CodePlex). This table includes geographic data for 22,993 places in the United States of America. For my test, I chose to calculate the distance between each pair of places in the state Texas; since there are 1,276 places in this state, the function will be invoked 1,276 * 1,276 = 1,628,176 times. To make sure that my test is not influenced by the time needed to send results to the client or render them on my screen, I included the MAX function, so that only a single number is sent to the client.

    SET STATISTICS TIME ON;

    SELECT     MAX(dbo.Distance(a.Lat, a.Lon, b.Lat, b.Lon))
    FROM       dbo.Place AS a
    CROSS JOIN dbo.Place AS b
    WHERE      a.State    = 'TX'
    AND        b.State    = 'TX';

    SET STATISTICS TIME OFF;

    This query took over half a minute to finish: 31,449 ms CPU time, and 34,392 ms elapsed time. The time per execution of the user-defined function is decent enough (only about 21 microseconds), but because of the sheer number of executions, the total running time is still pretty long.

    Before moving to the CLR version of the distance calculation, let’s first try what happens if I avoid the user-defined function and instead place the calculation inline. After all, this proved to be a useful technique in the first part of this series. For this query, the downside of inlining the logic is that it results in a pretty much unmaintainable query – but it does indeed result in an impressive performance boost!

    SET STATISTICS TIME ON;

    SELECT     MAX(2 * ASIN(SQRT(POWER(SIN((RADIANS(a.Lat) - RADIANS(b.Lat)) / 2), 2)
                              + (COS(RADIANS(a.Lat)) * COS(RADIANS(b.Lat))
                                * POWER(SIN((RADIANS(a.Lon) - RADIANS(b.Lon)) / 2), 2)
                                ))) * 20001.6 / PI())
    FROM       dbo.Place AS a
    CROSS JOIN dbo.Place AS b
    WHERE      a.State    = 'TX'
    AND        b.State    = 'TX';

    SET STATISTICS TIME OFF;

    The CPU time is way down, from over 31 seconds to less than 8: 7,956 ms. Elapsed time is down even more, because the optimizer can parallelize this plan – only 1,557 ms! That is a very nice reduction, but still no reason to stop. As the amount of data increases, even this version can run into performance problems. For instance, when I simply omitted the WHERE clause to make SQL Server find the maximum distance between any of the almost 23,000 places in the database (involving over 500 million distance computations), the query ran for almost five minutes, with an elapsed time of 282,580 ms, and a CPU time of 2,155,995 ms! That’s a good reason to try to optimize this further.

    And that brings us back to the subject of this post: CLR scalar user-defined functions. I have implemented the same distance calculation as a C# function:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true)]
        public static SqlDouble DistCLR
                     (SqlDouble Lat1, SqlDouble Lon1,
                      SqlDouble Lat2, SqlDouble Lon2)
        {
            // This is just a wrapper for the T-SQL interface.
            // Call the function that does the real work.
            double Dist = SpatialDist(Lat1.Value, Lon1.Value,
                                      Lat2.Value, Lon2.Value);
            return new SqlDouble(Dist);
        }

        public static double SpatialDist
                     (double Lat1, double Lon1,
                      double Lat2, double Lon2)
        {
            // Convert degrees to radians
            double Lat1R = Lat1 * Math.PI / 180;
            double Lon1R = Lon1 * Math.PI / 180;
            double Lat2R = Lat2 * Math.PI / 180;
            double Lon2R = Lon2 * Math.PI / 180;

            // Calculate distance in radians
            double DistR =
                2 * Math.Asin(Math.Sqrt(Math.Pow(Math.Sin((Lat1R - Lat2R) / 2), 2)
                 + (Math.Cos(Lat1R) * Math.Cos(Lat2R)
                  * Math.Pow(Math.Sin((Lon1R - Lon2R) / 2), 2))));

            // Convert from radians to kilometers
            double Dist = DistR * 20001.6 / Math.PI;

            return Dist;
        }
    }

    The SQL for testing this is almost identical to the SQL for testing the T-SQL scalar user-defined function – only the function name is changed.

    SET STATISTICS TIME ON;

    SELECT     MAX(dbo.DistCLR(a.Lat, a.Lon, b.Lat, b.Lon))
    FROM       dbo.Place AS a
    CROSS JOIN dbo.Place AS b
    WHERE      a.State    = 'TX'
    AND        b.State    = 'TX';

    SET STATISTICS TIME OFF;

    The results show that this version is even faster than the T-SQL query with the computation inline, although only marginally. The CPU time is 7,798 ms, and the elapsed time is 1,459 ms. This means that the maximum-distance calculation across all the places in the United States should also become slightly faster, and indeed it does – the elapsed time is 257,081 ms, and the CPU time is 2,010,774 ms. Still very slow, so if I had to tackle this problem for a real client I would start looking for other optimizations – but in the context of this blog post, I only wanted to show that using CLR scalar user-defined functions for complex computations can sometimes perform better than trying to do those computations in T-SQL only.

    Bottom line

    In the first two parts of this series, I have shown that T-SQL scalar user-defined functions, both with and without data access, are terrible for performance. In this part, I have looked at CLR scalar user-defined functions without data access. As a rule of thumb, those are bad for performance too – but unlike their T-SQL counterparts, when complex calculations are involved, the performance hit of having to invoke the function for each individual row processed by a query can sometimes be offset by the performance gained from moving the complex calculation to the faster CLR environment. If you consider embedding a complex calculation in a CLR scalar user-defined function, I recommend extensive performance testing of both the UDF and the equivalent query with the same logic inlined in T-SQL.

    Of course, performance is not always the only deciding factor. Maybe you already have the CLR function because it’s also called from your .Net code; in that case, you can save a lot of development and maintenance effort by reusing that component in your T-SQL code. Maybe you need to do something that is only possible from CLR code (like calling a web service – as long as you are aware that this will have a REALLY detrimental effect on performance!), or something that is simply so much easier in CLR code (like using standard methods for regular expression pattern matching, which would be extremely complicated to code in T-SQL). Or maybe your UDF will only be called five times per day, and never in a query that processes more than a single row. At the end of the day, you will have to make your decision, based on all the factors that are relevant in your specific scenario.

    The next part of this series will look at CLR scalar user-defined functions with data access; after that, we can finally start to cover the various kinds of table-valued user-defined functions available in SQL Server (the good and the ugly).
