
Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

  • sp_whoisactive and Azure SQL Database

    I've received a number of requests over the past couple of years for an Azure-specific version of sp_whoisactive. Given the combination of a lack of interesting DMVs available in Azure and my general lack of interest in all things Azure related, creating a modified version of the proc has not exactly been a top priority.

    Luckily, now I don't have to. Microsoft made a number of enhancements to the newest version of Azure SQL Database--v12--including adding all of the DMVs. Even the deprecated ones!

    I just tested sp_whoisactive in a new instance, and it appears to work perfectly!
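
    For a quick smoke test of your own, it's just an ordinary call to the proc against a v12 database (the optional @get_plans parameter shown here is one of the proc's standard options):

    EXEC dbo.sp_WhoIsActive;

    -- or, to include query plans in the output
    EXEC dbo.sp_WhoIsActive @get_plans = 1;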

    The only caveat: The proc has a reference to a couple of views in MSDB, which are not available in Azure. (Makes sense, since you don't have access to Agent and other msdb-type features in Azure.) To get it to compile I had to comment out those lines: All the code between lines 4601 and 4635 of the sp_whoisactive v11.11 script.

    I've attached a version of the script to this blog post that has the lines commented out. I'm not sure if there are other issues hiding somewhere, but so far (very thorough testing for all of 5 minutes in an almost completely empty database) everything seems to be fine.

    In other words, there are probably some issues--if you test and see anything awry, please post a comment here and I'll work on it.

    Enjoy!

  • Re-Inventing the Recursive CTE

    Working with hierarchies in SQL Server never fails to be absolutely fascinating.

    On the face of things they’re super-easy to model and seem to fit perfectly well: parent_key, child_key, self-referencing FK, and you’re done. But then you go to query the thing and there is a complete breakdown.

    Prior to SQL Server 2005 the product included zero language support for hierarchical navigation. That release gave us the recursive CTE, a feature which perhaps could have solved all of our issues. But alas, its implementation is notoriously heavy and sluggish.

    As a result of the language gap, various smart and creative people have come up with lots of interesting and alternative ways of modeling hierarchies. These include materialized (enumerated) path models, nested sets, nested intervals, and of course SQL Server’s own HierarchyId data type. Each of these techniques involves deep understanding, often some interesting math, and lots of work to get right. Just the kind of thing that usually keeps me fully engaged.

    But this post isn’t about any of these alternative models. This post is about standard, run-of-the-mill adjacency lists. I’m a big fan of the adjacency list, if only for the fact that I can put one together without breaking out the espresso machine. If only the performance problems could be fixed, perhaps we could leverage them in a wider variety of scenarios.

    Important notes: 

    • First of all, you might want to review my PASS Summit 2014 presentation on parallelism techniques prior to continuing with this post. The content below builds heavily on many of the ideas I introduced in that session.
    • Second, you might (rightfully) decide that my reasoning is flawed, and that’s fine: I’m advocating for adjacency lists because of simplicity, and some of the queries below are somewhat less than simple. We all need to pick our own favorite poisons.
    • Finally, please be aware that these techniques have been heavily tested, but not rigorously tested. This is more of a research topic for me than something I would recommend as a “best practice,” and there very well may be bugs and/or caveats that I have yet to uncover.

     

    Why are we here?

    Several months ago on a mailing list I subscribe to, there was a long thread on hierarchies and hierarchical modeling. The thread was initially about HierarchyId but eventually morphed into a discussion on various other hierarchy techniques. Much of the thread was devoted to how annoying HierarchyId is to work with, and how annoying it is that adjacency lists can’t perform well.

    This thread sparked a set of questions in my mind: What’s wrong with a simple adjacency list? Why are recursive CTEs not better at doing their job? And is there any way to improve the situation?

    After giving the issue some thought I decided to try to apply some of the techniques I’ve been developing for optimizing parallel query plans. The result? Well, read on…

     

    Hierarchical Modeling, the Adjacency List, and Recursive CTEs

    “Every employee has a manager. So every employee’s row references his manager’s row. Except the CEO; that row has a NULL manager since the CEO doesn’t report to any of our employees.”

    Say that sentence to any technology person, any business person, any pointy-haired boss type, and right away they’ll get exactly what you mean. They’ll understand the model. And if your company hasn’t adopted some weird matrix management approach, it will fit. It’s a simple and absolutely perfect way to model simple reality. It’s the adjacency list.
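
    In table form it's about as minimal as data modeling gets. Here is a sketch along the lines of the test tables used later in this post (the column names and types are my shorthand based on the queries below; the exact definitions are in the attached scripts):

    CREATE TABLE dbo.EmployeeHierarchyWide
    (
        EmployeeID INT NOT NULL
            CONSTRAINT pk_EmployeeHierarchyWide PRIMARY KEY,
        ManagerID INT NULL
            CONSTRAINT fk_EmployeeHierarchyWide_ManagerID
                REFERENCES dbo.EmployeeHierarchyWide (EmployeeID),
        EmployeeData VARCHAR(300) NOT NULL    -- filler payload, 75-299 bytes per row
    );
    GO

    -- Supports seeking from a manager to that manager's direct reports
    CREATE INDEX ix_EmployeeHierarchyWide_ManagerID
        ON dbo.EmployeeHierarchyWide (ManagerID);
    GO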

    And not only is modeling the data easy, data management is just as simple. Need to insert a new middle manager somewhere along the line? Just insert a row and update a few other rows. Need to remove someone? Reverse that process.
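
    For example, splicing a hypothetical new manager (EmployeeID 9999) in beneath employee 42 takes nothing more than an INSERT followed by an UPDATE (the ID values here are made up purely for illustration):

    -- Add the new middle manager, reporting to employee 42
    INSERT INTO dbo.EmployeeHierarchyWide (EmployeeID, ManagerID, EmployeeData)
    VALUES (9999, 42, 'new middle manager');

    -- Re-point employee 42's former direct reports to the new manager
    UPDATE dbo.EmployeeHierarchyWide
    SET ManagerID = 9999
    WHERE
        ManagerID = 42
        AND EmployeeID <> 9999;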

    So why, then, do we play games and attempt to use other models? Fire up your favorite hierarchy and ask for a report, with full materialized paths, of everyone who reports up to the CEO. (In the code attached to this post you’ll find two test hierarchies, one wide and one deep. The wide hierarchy has 1,111,111 nodes in only 7 levels. The deep hierarchy has slightly fewer nodes—1,048,575—but it’s 20 levels deep.)

    More than likely—if you’re current with your T-SQL skills—you’ll write a query like this one:

    WITH
    paths AS
    (
        SELECT
            EmployeeID,
            CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
        FROM EmployeeHierarchyWide
        WHERE
            ManagerID IS NULL

        UNION ALL

        SELECT
            ehw.EmployeeID,
            CONVERT(VARCHAR(900), CONCAT(p.FullPath, ehw.EmployeeID, '.'))
        FROM paths AS p
        INNER JOIN EmployeeHierarchyWide AS ehw ON
            ehw.ManagerID = p.EmployeeID
    )
    SELECT
        *
    FROM paths

    This query materializes the path, separating node (employee) IDs using periods, by leveraging a recursive CTE. It returns the desired results, but at a cost: This version, which operates on the wide test hierarchy, takes just under 10 seconds on this end, run in Management Studio with the Discard Results After Execution option set.

    Depending on your typical database style—transaction processing vs. analytical—10 seconds is either a lifetime or doesn’t sound too bad. (I once interviewed a career OLTP developer who told me that no query, in any database, ever, should run for longer than 40ms. I told the company to not hire her for the data warehouse developer position she was interviewing for. I believe her head would have quite literally exploded, right in the middle of her second heart attack, about an hour before lunch on her first day.)

    Once you reset your outlook on query times to something a bit more realistic, you might notice that this isn’t a gigantic amount of data. A million rows is nothing these days, and although the rows are artificially widened—the table includes a string column called “employeedata” that contains between 75 and 299 bytes per row—only 8 bytes per row are brought into the query processor on behalf of this query. 10 seconds, while quite brief for a big analytical query, should be sufficient time to answer much more complex questions than that which I’ve posed here. So based purely on the metric of Adam’s Instinct and Gut Feel, I hereby proclaim that this query feels significantly too slow.

    Just like other queries, understanding the performance characteristics of this one requires a peek at the execution plan:

    [Image: execution plan for the recursive CTE query]

    In this plan, the anchor part of the CTE is evaluated on the upper subtree under the Concatenation operator, and the recursive part on the lower subtree. The “magic” that makes recursive CTEs work is contained within the Index Spool seen at the upper left part of the image. This spool is, in fact, a special version that allows rows to be dropped in and re-read in a different part of the plan (the Table Spool operator which feeds the Nested Loop in the recursive subtree). This fact is revealed with a glance at the Properties pane:

    [Image: Index Spool properties pane]

    The spool in question operates as a stack—a last in, first out data structure—which explains the somewhat peculiar output ordering we see when navigating a hierarchy using a recursive CTE (and not leveraging an ORDER BY clause):

    [Image: query output showing the stack-driven row ordering]

    The anchor part returns EmployeeID 1, and the row for that employee is pushed (i.e. written) into the spool. Next, on the recursive side, the row is popped (i.e. read) from the spool, and that employee’s subordinates—EmployeeIDs 2 through 11—are read from the EmployeeHierarchyWide table. Due to the index on the table, these are read in order. And because of the stack behavior, the next EmployeeID that’s processed on the recursive side is 11, the last one that was pushed.

    While these internal details are somewhat interesting, there are a few key facts that explain the performance (or lack thereof) and point toward a possible alternative implementation:

    • Like most spools in SQL Server, this one is a hidden table in tempdb. This one is not getting spilled to disk when I run it on my laptop, but it’s still a heavy data structure. Every row in the query is effectively read from one table and then re-written into another table. That can’t possibly be a good thing from a speed perspective.
    • Recursive CTEs cannot be processed in parallel. (A plan containing a recursive CTE and other elements may be able to use parallelism for the other elements—but never for the CTE itself.) Even applying trace flag 8649 or using my make_parallel() function will fail to yield any kind of parallelism for this query; see the sketch just after this list for how those are normally applied. This greatly limits the ability of this plan to scale.
    • The parallelism limitation is especially interesting because it is not necessary. Why, for example, can’t one thread process the subordinates of EmployeeID 11, while a second thread processes the subordinates of EmployeeID 10?
    • The stack behavior, likewise, doesn’t really matter. Do you care whether EmployeeID 11 is processed prior to EmployeeID 10? Would this change the output in any meaningful way? Of course not. The stack was no doubt chosen because it’s a simple data structure for solving the problem at hand, but from an implementation standpoint a queue would have been equally effective.
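
    For reference, here is roughly how I would normally try to coerce a parallel plan for the query above. Keep in mind that trace flag 8649 is undocumented and unsupported, so treat this strictly as a test-environment sketch; and in any case neither it nor make_parallel() helps here, because the recursive branch itself can never be parallelized:

    WITH
    paths AS
    (
        SELECT
            EmployeeID,
            CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
        FROM EmployeeHierarchyWide
        WHERE
            ManagerID IS NULL

        UNION ALL

        SELECT
            ehw.EmployeeID,
            CONVERT(VARCHAR(900), CONCAT(p.FullPath, ehw.EmployeeID, '.'))
        FROM paths AS p
        INNER JOIN EmployeeHierarchyWide AS ehw ON
            ehw.ManagerID = p.EmployeeID
    )
    SELECT
        *
    FROM paths
    OPTION (QUERYTRACEON 8649);    -- request a parallel plan; the CTE branch stays serial regardless

    -- Wrapping the same query with dbo.make_parallel() (shown in full near the end of
    -- this post) has the same non-effect on the recursive CTE itself.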

    I like the idea of adjacency lists, and I like the idea of recursive CTEs. They’re easy to understand and easy to maintain. To go faster we must both eliminate tempdb from the equation and remove the limitations on parallel query plans. And using the built-in functionality, that’s just not going to happen. Solution? Roll our own.

     

    hierarchy_navigator: The SQLCLR Recursive CTE

    After consideration of the four facts outlined in the previous section, I realized that it would be fairly simple to create a “recursive CTE” library in C#. By doing so I would be able to eliminate many of the issues with T-SQL recursive CTEs and take full control over processing.

    My core thought was that by creating my own data structures I could eliminate much of the tempdb overhead inherent with T-SQL recursive CTEs. There would naturally be transfer and allocation overhead as part of moving the data into the CLR space, but I felt that I could optimize things to the point where that cost would still be far less than what the query processor has to do in order to maintain a spool. The tradeoff is of course memory, and that’s something I feel is worth sacrificing for better performance. As always, you should make your own decisions on those kinds of issues based on your applications and workloads.

    The data structure I decided to work with was a lock-free queue. Why a queue instead of a stack? Because I had already written the queue for something else. As mentioned above, this is merely an implementation detail. It should not matter to you in what order the data is processed, nor in what order it is output, in the absence of an ORDER BY clause.

    The main benefit of a lock-free queue? Since it’s naturally thread safe it inherently supports parallelism. And since it’s lock-free my CLR assembly can be cataloged as SAFE, a nice benefit especially now that SAFE CLR assemblies are supported in Azure SQL Database.
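
    For readers who haven't wired up SQLCLR objects before, the T-SQL side of cataloging looks roughly like the following. The assembly name, class name, and column types here are assumptions for illustration only; the attached archive contains the actual binaries and registration script. (In Azure SQL Database the assembly has to be created from a binary literal rather than from a file path.)

    CREATE ASSEMBLY hierarchy_navigator
    FROM 'C:\assemblies\hierarchy_navigator.dll'    -- or FROM 0x... on Azure SQL Database
    WITH PERMISSION_SET = SAFE;
    GO

    -- One of the table-valued functions, bound to a method in the assembly
    CREATE FUNCTION dbo.hierarchy_inner
    (
        @session_id SMALLINT
    )
    RETURNS TABLE
    (
        EmployeeID INT,
        FullPath NVARCHAR(900)
    )
    AS EXTERNAL NAME hierarchy_navigator.[HierarchyNavigator].hierarchy_inner;
    GO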

    Writing the queue in C# is one thing; getting the query processor to use it is a slightly more arduous task. My initial thought was to create a system that would support queries of a form similar to the following:

    SELECT 
        i.EmployeeID, 
        i.FullPath
    FROM
    (
        SELECT
            EmployeeID,
            CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
        FROM EmployeeHierarchyWide
        WHERE
            ManagerID IS NULL

        UNION ALL

        SELECT
            ehw.EmployeeID,
            CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.'))
        FROM dbo.hierarchy_inner(@@SPID) AS hi
        INNER JOIN EmployeeHierarchyWide AS ehw ON 
            ehw.ManagerID = hi.EmployeeID
    ) AS i
    CROSS APPLY dbo.hierarchy_outer(@@SPID, i.EmployeeID, i.FullPath) AS ho

    The idea here was to make the query feel more or less “recursive CTE-esque.” Each row from the derived table [i] (which is really just a modified recursive CTE) would push values into the hierarchy_outer() TVF. This TVF would then internally enqueue the EmployeeID and path and subsequently output a row. (Any row would be fine—the presence of a row is what would matter, not its content, so the idea was for the function to output an integer column called “x” that always had a value of 0.) On the “recursive” part of the query, the hierarchy_inner() function could dequeue an EmployeeID and path, driving a seek into EmployeeHierarchyWide, which would then feed the hierarchy_outer() function, and so on and so forth.

    Each of the functions in this design takes a SQL Server session_id as an argument, in order to appropriately scope the memory allocation required to handle passing of values around on the lock-free queue.

    Part of this design required setting up some internal synchronization logic in the functions. It is to be expected that at various points the internal queue may not have enough of a backlog to keep all of the worker threads busy, and we wouldn’t want the threads without enough work to do to shut down until the entire process is actually complete. In order to facilitate this, I implemented logic as explained by the following flowchart:

    [Image: flowchart of the worker thread synchronization logic]

    In this scheme each thread sets a central flag (actually a counter) upon entering the function. If a thread begins outputting rows from the queue (and, therefore, reading from the recursive side and potentially pushing more rows onto the queue), that flag will stay set, indicating that any other threads should not shut down even if they have no work to do. The threads that are waiting will yield until either some work appears on the queue, or all of the other threads have set the reader flag off, thereby indicating that no work is forthcoming.

    Once coded and deployed, at first blush everything seemed great. The query optimizer produced a perfect serial plan:

    [Image: serial query plan for the CLR-based query]

    This plan had exactly the correct shape, with exactly the correct behaviors. And reducing the tempdb overhead helped tremendously: this plan ran in only 6.5 seconds, roughly 35% faster than the recursive CTE.

    Alas, making this into a parallel query was not nearly as easy as simply applying TF 8649. As soon as the query went parallel myriad problems cropped up. The query optimizer, having no clue what I was up to, or the fact that there was a lock-free data structure in the mix, started trying to “help” in various ways…

    [Image: parallel query plan with a Distribute Streams operator above the hierarchy_inner() function]

    This plan might look perfectly decent to the casual observer. Almost the same shape as before, except for that Distribute Streams iterator, whose job it is to parallelize the rows coming from the hierarchy_inner() function. This would have been perfectly fine if hierarchy_inner() were a normal function that didn’t need to retrieve values from downstream in the plan via an internal queue, but that latter condition creates quite a wrinkle.

    The reason this didn’t work? In this plan the values from hierarchy_inner() must be used to drive a seek on EmployeeHierarchyWide so that more rows can be pushed into the queue and used for later seeks on EmployeeHierarchyWide. But none of that can happen until the first row makes its way down the pipe. This means that there can be no blocking iterators on the critical path. If anything blocks that critical first output row from being used for the seek, or those later rows from driving more seeks, the internal queue will empty and the entire process will shut down. And unfortunately, that’s exactly what happened here. Distribute Streams is a “semi-blocking” iterator, meaning that it only outputs rows once it amasses a collection of them. (That collection, for parallelism iterators, is known as an Exchange Packet.)

    Phrased another way, the semi-blocking behavior created a chicken-and-egg problem: The plan’s worker threads had nothing to do because they couldn’t get any data, and no data could be sent down the pipe until the threads had something to do. I considered modifying the hierarchy_inner() function to output specially marked junk data in these kinds of situations, in order to saturate the Exchange Packets with enough bytes to get things moving, but that seemed like a dicey proposition. I was unable to come up with a simple algorithm that would pump out only enough data to kick off the process, and only fire at appropriate times. (Such a solution would have to kick in for this initial state problem, but should not kick in at the end of processing, when there is truly no more work left to be done.)

    The only solution, I decided, was to eliminate all blocking iterators from the main parts of the flow—and that’s where things got just a bit more interesting.

    The Parallel APPLY pattern that I have been speaking about at conferences for the past few years works well partly because it eliminates all exchange iterators under the driver loop, so it was a natural choice here. Combined with the initializer TVF method that I discussed in my PASS 2014 session, I thought this would make for a relatively easy solution:

    SELECT
        p.*
    FROM
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    ) AS v
    OUTER APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                EmployeeID,
                CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
            FROM EmployeeHierarchyWide
            WHERE
                ManagerID IS NULL

            UNION ALL

            SELECT
                ehw.EmployeeID,
                CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.'))
            FROM dbo.hierarchy_inner(v.x, @@SPID) AS hi
            INNER JOIN EmployeeHierarchyWide AS ehw ON 
                ehw.ManagerID = hi.EmployeeID
        ) AS i
        CROSS APPLY dbo.hierarchy_outer(@@SPID, i.EmployeeID, i.FullPath) AS ho
    ) AS p
    WHERE
        p.EmployeeID IS NOT NULL

    To force the execution order I modified the hierarchy_inner function to take the “x” value from the initializer function (“hierarchy_simple_init”). Just as in the example shown in the PASS session, this version of the function output 256 rows of integers in order to fully saturate a Distribute Streams operator on top of a Nested Loop.

    Once I applied TF 8649, I discovered that the initializer worked quite well—perhaps too well. Upon running this query, rows started streaming back, and kept going, and going, and going…

    [Image: query output returning far more rows than expected]

    The issue? The anchor part must be evaluated only once, even if there are numerous threads involved in the query. In this case I built no protection against it being evaluated multiple times, so the query processor did exactly as I asked and evaluated it 256 times—once per initializer row. As a result, my query produced 256 times more rows than were needed; definitely not the desired output.

    My solution for this was to create yet another function, this one to operate as part of the anchor:

    SELECT
        p.*
    FROM
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    ) AS v
    OUTER APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                EmployeeID,
                CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
            FROM dbo.hierarchy_anchor(v.x, @@SPID) AS ha
            CROSS JOIN EmployeeHierarchyWide
            WHERE
                ManagerID IS NULL

            UNION ALL

            SELECT
                ehw.EmployeeID,
                CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.'))
            FROM dbo.hierarchy_inner(v.x, @@SPID) AS hi
            INNER JOIN EmployeeHierarchyWide AS ehw ON 
                ehw.ManagerID = hi.EmployeeID
        ) AS i
        CROSS APPLY dbo.hierarchy_outer(@@SPID, i.EmployeeID, i.FullPath) AS ho
    ) AS p
    WHERE
        p.EmployeeID IS NOT NULL

    The function hierarchy_anchor() shown in this version of the query was designed to use the exact same signature as the hierarchy_inner() function, but without the need to touch the queue or anything else internal except a counter to ensure that it would return one, and only one, row per session.

    It almost worked!

    [Image: query plan with the hierarchy_anchor() call pushed below the anchor seek]

    The optimizer decided to push the hierarchy_anchor() function call under the anchor EmployeeHierarchyWide seek, which meant that that seek would be evaluated 255 more times than necessary. That could have been considered a flaw, but at this point I was okay with it because each of those 255 extra seeks was comparatively inexpensive. The anchor part still returned only one set of actual output rows, by virtue of the function filtering things out. So far so good.

    Unfortunately, changing the characteristics of the anchor part also had an impact on the recursive part. The optimizer introduced a sort after the call to hierarchy_inner(), which was a real problem.

    The idea to sort the rows before doing the seek is a sound and obvious one: By sorting the rows by the same key that will be used to seek into a table, the random nature of a set of seeks can be made more sequential. In addition, subsequent seeks on the same key will be able to take better advantage of caching. Unfortunately, for this query these assumptions are wrong in two ways. First of all, this optimization should be most effective when the outer keys are nonunique, and in this case that is not true; there should only be one row per EmployeeID. Second, Sort is yet another blocking operator, and we’ve already been down that path.

    Once again the issue was that the optimizer doesn’t know what’s actually going on with this query, and there was no great way to communicate. Getting rid of a sort that has been introduced due to this type of optimization requires either a guarantee of distinctness or a one-row estimate, either of which tells the optimizer that it’s best not to bother. The uniqueness guarantee is impossible with a CLR TVF without a blocking operator (sort/stream aggregate or hash aggregate), so that was out. One way to achieve a single-row estimate is to use the (admittedly ridiculous) pattern I showed in my PASS 2014 session:

    SELECT
        p.*
    FROM
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    ) AS v
    OUTER APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                EmployeeID,
                CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
            FROM dbo.hierarchy_anchor(v.x, @@SPID) AS ha
            CROSS JOIN EmployeeHierarchyWide
            WHERE
                ManagerID IS NULL

            UNION ALL

            SELECT
                ehw.EmployeeID,
                CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.'))
            FROM dbo.hierarchy_inner(v.x, @@SPID) AS hi
            CROSS APPLY (SELECT SUM(id) FROM (VALUES(hi.EmployeeID)) AS p0 (id)) AS p(y)
            CROSS APPLY (SELECT SUM(id) FROM (VALUES(hi.EmployeeID)) AS p0 (id)) AS q(y)
            INNER JOIN EmployeeHierarchyWide AS ehw ON
                ehw.ManagerID = hi.EmployeeID
            WHERE
                hi.EmployeeID = p.y
                AND hi.EmployeeID = q.y
                AND hi.EmployeeID = CHECKSUM(hi.EmployeeID)
        ) AS i
        CROSS APPLY dbo.hierarchy_outer(@@SPID, i.EmployeeID, i.FullPath) AS ho
    ) AS p
    WHERE
        p.EmployeeID IS NOT NULL

    The nonsense (and no-op) CROSS APPLYs combined with the nonsense (and once again no-op) predicates in the WHERE clause rendered the desired estimate and eliminated the sort in question:

    [Image: plan fragment with the Sort eliminated from the recursive branch]

    Unfortunately, zooming out a bit revealed that other parts of the plan were also impacted:

    [Image: wider view of the plan showing a Merge Join and a Row Count Spool]

    The Concatenation operator between the anchor and recursive parts was converted into a Merge Join, and of course merge requires sorted inputs—so the Sort had not been eliminated at all. It had merely been moved further downstream!

    To add insult to injury, the query optimizer decided to put a Row Count Spool on top of the hierarchy_outer() function. Since the input values were unique the presence of this spool would not pose a logical problem, but I saw it as a useless waste of resources in this particular case, as it would never be rewound. (And the reason for both the Merge Join and the Row Count Spool? The same exact issue as the prior one: lack of a distinctness guarantee and an assumption on the optimizer’s part that batching things would improve performance.)

    After much gnashing of teeth and further refactoring of the query, I managed to bring things into a working form:

    SELECT
        p.*
    FROM
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    ) AS v
    CROSS APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                anchor.*
            FROM
            (
                SELECT TOP(1)
                    *
                FROM dbo.hierarchy_anchor(v.x, @@SPID) AS ha0
            ) AS ha
            CROSS APPLY
            (
                SELECT
                    EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
                FROM EmployeeHierarchyWide
                WHERE
                    ManagerID IS NULL
            ) AS anchor

            UNION ALL

            SELECT
                recursive.*
            FROM dbo.hierarchy_inner(v.x, @@SPID) AS hi
            OUTER APPLY
            (
                SELECT
                    ehw.EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.')) AS FullPath
                FROM EmployeeHierarchyWide AS ehw
                WHERE
                    ehw.ManagerID = hi.EmployeeID
            ) AS recursive
        ) AS i
        CROSS APPLY
        (
            SELECT TOP(1)
                ho0.*
            FROM dbo.hierarchy_outer(@@SPID, i.EmployeeID, i.FullPath) AS ho0
        ) AS ho
        WHERE
            i.EmployeeID IS NOT NULL
    ) AS p

     

    Use of OUTER APPLY between the hierarchy_inner() function and the base table query eliminated the need to play games with the estimates with that function’s output. In experimenting with the hierarchy_outer() function call I discovered that telling the optimizer that it would return only one row eliminated the need to work with the outer estimate in order to remove the Merge Join and Row Count Spool. This was done by using a TOP(1), as is shown in the table expression [ho] in the above query. A similar TOP(1) was used to control the estimate coming off of the hierarchy_anchor() function, which helped the optimizer to eliminate the extra anchor seeks into EmployeeHierarchyWide that earlier versions of the query suffered from.

    Use of OUTER APPLY on the recursive part of the query created an interesting twist: nonmatching rows started cycling back in via the hierarchy_outer() call, creating an endless loop—and an endless query. To solve that issue I migrated the NOT NULL check on EmployeeID inside of derived table [p], and converted the outermost APPLY to a CROSS APPLY—which, luckily, did not change the plan shape. (Had it done so, two NOT NULL checks would have been required in this form of the query.)

    The end result query plan (note that this used TF 8649):

    [Image: final parallel query plan]

     

    Final Enhancements

    The above query returned the correct results, properly supported parallelism, and ran fairly quickly: around 2.2 seconds at DOP 6 on my laptop. However, the work was not completely done.

    While attempting to further reduce the run time I began experimenting with different options and ended up playing with a scalar version of the hierarchy_outer() function. Instead of returning a row, it returned 0 when it successfully enqueued a row. If a NULL EmployeeID was passed in it would return NULL, thereby signaling an invalid row. This changed the query to the following form:

    SELECT
        p.*
    FROM
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    ) AS v
    CROSS APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                anchor.*
            FROM
            (
                SELECT TOP(1)
                    *
                FROM dbo.hierarchy_anchor(v.x, @@SPID) AS ha0
            ) AS ha
            CROSS APPLY
            (
                SELECT
                    EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
                FROM EmployeeHierarchyWide
                WHERE
                    ManagerID IS NULL
            ) AS anchor

            UNION ALL

            SELECT
                recursive.*
            FROM dbo.hierarchy_inner(v.x, @@SPID) AS hi
            OUTER APPLY
            (
                SELECT
                    ehw.EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.')) AS FullPath
                FROM EmployeeHierarchyWide AS ehw
                WHERE
                    ehw.ManagerID = hi.EmployeeID
            ) AS recursive
        ) AS i
        CROSS APPLY
        (
            VALUES(dbo.hierarchy_outer_scalar(@@SPID, i.EmployeeID, i.FullPath))
        ) AS ho (value)
        WHERE
            ho.value IS NOT NULL
    ) AS p

    I didn’t expect much from this and tried it only on a whim—so I was pleasantly surprised when the query finished in only 1.4 seconds, a rather large improvement as compared to the 2.2 second run times I’d been seeing with the prior version.

    The reason this is faster? The scalar function interface requires only a single underlying call, whereas the table-valued interface requires at least two, plus CLR object conversion overhead, plus CLR iterator overhead. The query plan below shows that the scalar version removed one of the plan’s subtrees, which probably also contributed a small performance gain.

    [Image: query plan for the scalar function version, with one subtree removed]

    The next step was to hide (i.e. encapsulate) as much complexity as possible. I realized that the various TOP and DISTINCT clauses, as well as the @@SPID calls, could all be handled in a layer of inline T-SQL table valued functions. I created each of the following:

    CREATE FUNCTION dbo.hierarchy_init_t()
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT DISTINCT
            x
        FROM dbo.hierarchy_simple_init(@@SPID)
    )
    GO

    CREATE FUNCTION dbo.hierarchy_anchor_t
    (
        @init_token INT
    )
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT TOP(1)
            *
        FROM dbo.hierarchy_anchor(@init_token, @@SPID)
    )
    GO

    CREATE FUNCTION dbo.hierarchy_inner_t
    (
        @init_token INT
    )
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT
            *
        FROM dbo.hierarchy_inner(@init_token, @@SPID)
    )
    GO

    CREATE FUNCTION dbo.hierarchy_outer_t
    (
        @EmployeeID INT,
        @FullPath NVARCHAR(900)
    )
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT
            0 AS value
        WHERE
            dbo.hierarchy_outer_scalar(@@SPID, @EmployeeID, @FullPath) IS NOT NULL
    )
    GO

    Creating these functions allowed me to create a query that came very close to my original goal, and with the same fast query plan I’d managed to achieve in the prior iteration:

    SELECT
        p.*
    FROM dbo.hierarchy_init_t() AS v
    CROSS APPLY
    (
        SELECT
            i.EmployeeID,
            i.FullPath
        FROM
        (
            SELECT
                ehw.EmployeeID,
                CONVERT(VARCHAR(900), CONCAT('.', ehw.EmployeeID, '.')) AS FullPath
            FROM dbo.hierarchy_anchor_t(v.x) AS ha
            CROSS JOIN EmployeeHierarchyWide AS ehw
            WHERE
                ehw.ManagerID IS NULL

            UNION ALL

            SELECT
                ehw.EmployeeID,
                CONVERT(VARCHAR(900), CONCAT(hi.FullPath, ehw.EmployeeID, '.')) AS FullPath
            FROM dbo.hierarchy_inner_t(v.x) AS hi
            INNER JOIN EmployeeHierarchyWide AS ehw WITH (FORCESEEK) ON
                ehw.ManagerID = hi.EmployeeID
        ) AS i
        CROSS APPLY dbo.hierarchy_outer_t(i.EmployeeID, i.FullPath) AS ho
    ) AS p

    The only slight annoyance here was the FORCESEEK hint I was forced to put on the recursive side. Without it, the query optimizer decided to use a Hash Join for that part of the plan, which obviously didn’t fit at all. As compared to many of the games played on the way to this query, I found FORCESEEK to be very minor, so I wasn’t too upset by its being there. Plus I was able to convert the odd-looking OUTER APPLY into an INNER JOIN, so I decided that it was a net win.

    The final refactoring step was to rename the functions and their outputs to be more general.

    I arrived at the following four T-SQL functions, which are what you will find in the archive attached to this post:

    • hierarchy_init() – This is the initializer function, and returns a column called “initialization_token.” It is backed by a CLR function called hierarchy_init_inner().
    • hierarchy_anchor(@initialization_token) – The anchor function, this returns a column called “value.” It is backed by hierarchy_anchor_inner().
    • hierarchy_recursive(@initialization_token) – The recursive function, this returns two columns, “id” (an integer) and “payload” (nvarchar(4000)). The idea is that most hierarchical navigation will be done using some form of integer parent/child ID scheme. The payload, as a string column, can be packed with anything you’d like to carry along for the ride. This function is backed by hierarchy_recursive_inner().
    • hierarchy_enqueue(@id, @payload) – This is the outer function. It returns a column called “value” and is backed by the scalar hierarchy_enqueue_inner().

    The final version of the query, which you can replicate using the attached scripts, is as follows. Note that I’ve utilized make_parallel(), rather than TF 8649, in order to guarantee a parallel plan.

    SELECT
        x.*
    FROM dbo.make_parallel()
    CROSS APPLY
    (
        SELECT
            p.*
        FROM dbo.hierarchy_init() AS hi
        CROSS APPLY
        (
            SELECT
                i.EmployeeID,
                i.FullPath
            FROM
            (
                SELECT
                    ehw.EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT('.', ehw.EmployeeID, '.')) AS FullPath
                FROM dbo.hierarchy_anchor(hi.initialization_token) AS ha
                CROSS JOIN EmployeeHierarchyWide AS ehw
                WHERE
                    ehw.ManagerID IS NULL

                UNION ALL

                SELECT
                    ehw.EmployeeID,
                    CONVERT(VARCHAR(900), CONCAT(hr.payload, ehw.EmployeeID, '.')) AS FullPath
                FROM dbo.hierarchy_recursive(hi.initialization_token) AS hr
                INNER JOIN EmployeeHierarchyWide AS ehw WITH (FORCESEEK) ON
                    ehw.ManagerID = hr.id
            ) AS i
            CROSS APPLY dbo.hierarchy_enqueue(i.EmployeeID, i.FullPath) AS he
        ) AS p
    ) AS x

     

    Final Notes and Script Attachments

    As mentioned at the top of this post, this was more of a research project than something intended as a solid and proven solution. The techniques presented here have been heavily tested on my laptop and were briefly tested on one real-world project. In both cases I have been extremely happy with the results and have noticed no issues on recent builds of the functions. That said, multithreaded code is always tricky to absolutely guarantee, and I do not know whether some race condition or bug still exists in my code.

    Outside of my code there is also a big issue, and it’s called the query optimizer. While I’ve done everything in my power to control it, use of these components in the real world should only be done by advanced practitioners who are very comfortable reading and manipulating query plans. Divergence from ideal plans may result in wrong results, exceptions, or endless loops.

    The archive attached to this post includes create scripts for two hierarchy test tables (I wanted to test behavioral differences between deep and wide hierarchies. It turns out there aren't really any, but since I already did the work you can try it for yourself.); the CLR functions in a binary format (the C# code will not be published at this time); the T-SQL outer functions; and the final query.

    You are licensed to use the code and binaries in any project internal to your company. You are not allowed to re-distribute the code, binaries, or a reverse-engineered version of either as part of another project without my written consent.

    If you do use this code I would love to know how it goes; please drop me a line via e-mail or in the comments here.

    Enjoy, thanks for reading, and as always let me know if you have any questions!

     

    Acknowledgement

    Huge thanks to Julian Bucknall, who kindly fielded my questions as I was writing my lock-free queue implementation.

     

    Update, 2015-10-04

    I found and fixed a minor bug in the assembly. The attachment has been updated with the change.

  • SQLBits 2015 - Query Tuning Mastery: Clash of the Row Goals - Demos

    Thanks to everyone who attended my talk yesterday! The demos are attached to this post. Let me know if you have any questions.

    Until next time...

  • New Book! T-SQL Querying with Itzik Ben-Gan, Dejan Sarka, Kevin Farlee

    I'm excited to announce that after a year of work, T-SQL Querying is now available in e-book form (dead trees will ship next month)! 

    As you might guess from the author list, this book goes deep into various aspects of T-SQL -- both from querying and query processing perspectives. The book includes Itzik's signature T-SQL coverage, data warehousing information from Dejan, and in-memory OLTP from Kevin. The sections I contributed focus on my favorite topics: parallelism and SQLCLR programming.

    I'm quite proud of what we've accomplished and I hope you'll enjoy the fruits of our labor. 

  • Parallel Performance in London: SQLbits XIV, Superheroes Edition!

    I'm happy to announce that I'll be returning again this year to the UK's premier SQL Server conference, SQLbits. This year's show is in London, and has a Superheroes theme; it should be a great time.

    On Wednesday March 4th, I'll be delivering "Better Performance Through Parallelism" as a full-day preconference seminar. This seminar teaches you how to properly leverage SQL Server's parallel processing capabilities to greatly improve performance of your biggest queries. It includes a large amount of internals and background information, detailed query plan and tuning guidelines, a discussion on server configuration, as well as information on how to properly monitor parallel queries in your environment.

     

    I delivered this seminar at this year's PASS Summit in Seattle, and here is what some of the attendees had to say:

    "this presentation was awesome. it was packed with information and delivered in a clear, concise and easily understood manner. there just wasn't enough time which was a bummer since i found the entire day truly fascinating and didn't want to stop soaking in"

    "My favorite presentation of the conference. Very organized, detailed and full of useful information. I found the demos to be absolutely revealing."

    "Extremely knowledgeable speaker, with quite a lot of material. Handled the room well. Will definitely recommend any session that he presents."

    "This was in the top 3 best sessions I attended at PASS or at a PASS related event. Everything from the background of parallelism to performing analysis of a parallel query execution plan was great."

     

    The SQLbits marketing committee came up with the great idea of having preconference speakers create videos to help promote our sessions. I had a bit of fun with mine...

    Hope to see many of you in London!
  • PASS Summit 2014: Manhandling Parallelism - Demos

    Just a quick post to say thank you to everyone who attended today's Manhandling Parallelism talk! I hope you had even half as much fun as I did. For those of you who didn't attend, the talk was recorded and broadcast on "PASS TV," and I'm sure it will be available somewhere soon for on-demand viewing.

    The demos are attached.

    Let me know in the comments area if you have any questions!

     

  • PASS Summit 2014 and the Contradiction of Long Running Things

    As database professionals, I find that one simple thing unites us in our mission, above all else: we absolutely hate long running things.

    I'm with you. I'm here for you. I want to help you reject long running things.

    PASS Summit 2014, the premier SQL Server conference of the year, is returning to Seattle this November. And I am returning to PASS Summit, after a one-year hiatus, with 8 hours of fun and exciting performance tuning content to help you expunge the long running things from your life.

     

    Better Performance Through Parallelism (Full-Day, 400 Level Preconference Seminar)

    Big queries? Lots of data? No problem. Your server probably has a tremendous amount of CPU resources (most do, these days). And your queries are most likely not using anywhere close to everything they can to help speed up your requests. Want solutions? This advanced seminar is designed to give you a deep and thorough understanding of SQL Server's parallel processing capabilities, along with exactly why and how you can exploit things to massively improve performance -- by a factor of 10x or more in many cases.

    You'll learn about how Windows thinks about your CPUs, how SQL Server thinks about your CPUs, how SQL Server interfaces with Windows, how the query optimizer makes its parallel plan decisions, how the query processor interfaces with SQLOS, and how to leverage some basic Computer Science theorems to help the query processor do a much better job on your behalf. And you'll learn how to troubleshoot and take action when something goes wrong.

    It's a lot of deep material packed into a single day, and that's why it's 400 level. I want to pack your brain full of information. This is a seminar I've delivered a few times now, and I'm going to be making various enhancements for PASS -- including adding more content on Columnstores for SQL Server 2014, some of the new DMVs, and some interesting twists on a few of the techniques. Will be a fun day!

     

    Query Tuning Mastery: Manhandling Parallelism, 2014 Edition (400 Level Breakout Session)

    Where "Better Performance Through Parallelism" ends, "Manhandling Parallelism" takes over. This session will have only a tiny bit of overlap with the seminar, and pretty much assumes that you know everything I've taught in the full-day session. From that common starting point I'll attack the query optimizer and query processor from various angles, exploiting parallelism in new and interesting ways. You'll learn how to hack the optimizer's costing model, how to fix a less than ideal implementation in the storage engine, and how to take control using a series of specially designed CLR functions.

    This session is all about applied internals. You may or may not want to rush some of these techniques directly into your production queries; the idea is to show you what can be done if you push the envelope, help you learn some of what makes SQL Server tick, and hopefully have some fun with it.

     

    And as for the contradiction mentioned in the title of this post?

    Before we reject long running things, I'm going to ask you to embrace long running things. Because you will be the long running thing.

    SQL Long Run (400 Level Outside Run)

    You're about to sit in a convention center for a week filling your brain with technical information. You're about to attend a conference that's full of networking opportunities.

    Can you simultaneously kick off the networking while helping to counteract your forthcoming period of physical inactivity? Why yes, you can!

    SQL Long Run is a simple concept: strap on some running shoes and go for an 11-14 mile run with whatever SQL Server people are in Seattle and want to run with us on the morning of Sunday, November 2. (The day before PASS Summit precons begin.) Have fun, break a sweat, and chat with some fellow runners and SQL geeks.

     

    See you in Seattle!

     

  • PASS Summit Abstract Feedback

    Last week PASS announced its selections for its 2014 Summit. (I had a couple of abstracts selected, but I'll go into detail on those in a future post.)

    As usual, very little feedback was provided to speakers on why abstracts were or were not selected. But this time a number of people demanded action. And as it turns out, PASS was well-equipped to respond.

    The abstract review process PASS uses involves reviewers submitting comments on each abstract. Why PASS has never sent these out, and has actively refused in the past, is beyond me. But the good news is that clearer heads have prevailed. PASS announced yesterday that anyone who submitted for Summit could e-mail and request his or her comments. A huge step in the right direction!

    Naturally I e-mailed the moment I saw the announcement. The comments I received were interesting, but not always actionable. Nonetheless, I very much appreciate the ability to see them.

    Further, as someone who has written about abstract writing in the past, and who happily provides public feedback to others, it seems only fair that, now that I've received a bunch of feedback, I post it here for your review.

    So without further ado, here are my four submissions and their comments.

     

    Better Performance Through Parallelism [Level 400 Full-Day]
    Today's server hardware ships with a truly massive amount of CPU power. And while SQL Server is designed to automatically and seamlessly take advantage of available processing resources, there are a number of options, settings, and patterns that can be exploited to optimize parallel processing of your data. This full-day seminar starts with an overview of CPU scheduler internals from the perspective of both Windows and the SQL Server Operating System (SQLOS). Next, you will learn about intra-query parallelism, the method that SQL Server uses to break up large tasks into many smaller pieces, so that each piece can be processed simultaneously. You will come to understand how to read parallel query plans and how to interpret parallel wait statistics, as well as best practices for the various parallelism settings within SQL Server. Finally, you will be introduced to techniques for exploiting parallelism at the query level: patterns that can help the optimizer do a better job of parallelizing your query. After attending this seminar, you will be in full control of your CPUs, able to compel your server to use every clock tick to your end users' advantage.

    • "This is an extremely important topic for database professionals and the importance of it's role in sustainability can't be over stated - great topic" >> Thank you!
    • "Excellently written abstract, and excellent idea for a session. Seemed a bit narrower than I would like, but I have no doubt it would be an awesome, very deep dive session" >> Deep and fairly focused is in fact the idea here. I think that narrow focus is in fact a key attribute of 400-500 level sessions.
    • "Very well presented and has lots of potential for a full room." >> This is interesting abstract feedback. Did this reviewer attend a previous delivery? Or does the comment mean that the abstract itself is well presented?
    • "This could be a bit much for a for a full day pre-con" >> Now this is excellent feedback. I have delivered this particular seminar several times already, but I am going to be enhancing it for this year's PASS Summit. I will definitely think about this feedback and make sure I don't pack too much material in. Sometimes my seminars have been a bit overstuffed and attendees have found the final hour to be somewhat overwhelming as I rushed to finish in time. That's no fun for anyone and I'm actively working on doing a better job in that area.
    • "great topic - very inetrestting [sic]. well written abstract with good deal of details. clear goals. and good balance of demo %" >> I don't recall what I put for goals (PASS asks for three of them) but I'm glad I did the right thing here.

     

    Data, Faster: SQL Server Performance Techniques with SQLCLR [Level 400 Breakout]
    Sometimes it seems like business users are constantly more demanding: They want more data, more analytics, more insight, and the output needs to be faster, faster, faster. And so we database developers struggle to write ever more complex queries, tune our hardware, carefully choose indexes...and sometimes it’s just not enough. Where do you go when you’ve hit the performance wall? As more and more developers are discovering, a great choice is to make use of SQL Server’s CLR capabilities. User-defined aggregates, table valued functions, and user-defined types are more than just logical constructs; properly applied, they can make complex, logic-driven queries many times faster. This session will show you when and how to use these powerful tools in ways you may not have previously considered. You’ll learn how to speed up various types of analytical queries, do much faster XML processing, and build smart caches that will supercharge your user-defined functions. If you’re ready to take your SQL Server development skills to the next level, this session is definitely your starting point.

    • "really advance session on CLR - I canimagine [sic] this appealing to advanced users alot [sic]" >> Thank you!
    • "Not much to say, a very well written abstract and a great idea for a session." >> Thank you!
    • "I would definitely see this. I believe SQL CLR it is a hidden gem. This session seems a great fit for any developer." >> Thank you!
    • "Well written abstract and I have a good idea of what I would learn.  Level is appropriate.  Appeal to a wide audience is where I see the problem." >> Now things are getting interesting. Audience appeal. I agree! What I would absolutely love here would be a suggestion on how to better position this topic so that people will be more interested. That's not the PASS reviewer's job, naturally. But it would have been very nice for the reviewer to type just a few more words. (Hint, hint. Are you out there, reviewer?)
    • "Abstract good use some rework good topic for under used features" >> I'm a bit confused by this one. Is the abstract good or does it need rework? Again, I'd love to get more from this reviewer.
    • "very interesting topic. unique yet very relevant. clear well defined and relevant goals" >> Thank you!

     

    Query Tuning Mastery: Manhandling Parallelism, 2014 Edition [Level 400 Breakout]
    When it comes to driving ultimate performance for your biggest queries, parallelism is the name of the game. Don't let SQL Server decide when and where to apply its parallel magic; be your own query optimizer and take control! The 2012 edition of this talk--one of the most popular sessions at PASS Summit--discussed problems with parallel costing, intricacies of plan selection and execution, and the extremely effective Parallel CROSS APPLY pattern. But these things were just the beginning. Two more years of research have yielded new insights, new techniques, and new performance gains. In this talk you'll get a quick review of the 2012 content before diving into brand-new costing hacks, tricks for leveraging parallel scans, and a set of SQLCLR functions that will give you more control over intraquery parallelism than you ever thought possible. Get ready: It's time to Manhandle Parallelism all over again.

    • "This session has certainly seem a bit of adapation [sic] and I think that is onenof [sic] it's strength. I think this could be one the best attended sessions at PASS this year - it certainly has my vote." >> This feedback is very interesting, as it seems that the reviewer thinks that this is an adapted version of the previously delivered session. That was certainly NOT my goal -- this will be a brand-new session -- so here I have an action item. I've messed something up in the wording and I need to figure out how to make it clearer next time.
    • "Some of the abstract (like telling us it was popular before) are not really what I like to see in an abstract. I am also a bit concerned with the topic which suggests that we should typically take control over query optimization. With that said, it does seem like a fairly decent session idea." >> Another very interesting piece of feedback. Is marketing in an abstract a bad thing? I think it's the entire point of an abstract. But naturally not everyone agrees. Did I go overboard? I'm not sure. But I'll certainly give it some thought. Should you typically take control over query optimization? That's another interesting question. Perhaps I'll talk about it in the session!
    • "Good abstract. Seems like a very nice session for the summit. Quite important topic for anyone writing queries." >> Thank you!
    • "Well written abstract, goals, level, and additional data." >> Thank you!
    • "Well written with clear and concise goals." >> Thank you!
    • "very interesting and current topic - relevant. great abstract - good level of details and contents seem very interesting. clear and well defined goals" >> Thank you!
    • "Granted, presentations are updated all of the time, but I wonder at what point the updates prevent the use of the 'Previously Presented' option.  For the option to be selected, I would expect less than fifteen percent of the content has changed and any more would preclude its use." >> Aha. Another reviewer who has read my abstract to mean that this in merely an update of the 2012 session. One such feedback might have been an outlier. Two of them? There is definitely an issue with the wording here, and I need to give it some serious thought.



    The Need for Speed: Making Big Queries Faster [Level 400 Full-Day]
    Tuning large and complex queries is a special art, often requiring uncommon techniques that aren’t regularly taught in general-purpose performance courses and books. This full-day session is focused specifically on filling in those blanks, providing you with tools to handle your biggest, most important user demands: analytical, decision support, and reporting queries. You'll learn how to look at even the scariest of query plans and quickly figure out which areas are causing your headaches. You’ll learn how to fix the issues using indexing strategies, optimizer manipulations, and T-SQL rewrites. And you’ll learn to think about performance tuning in new and different ways. Attend this seminar to eliminate bottlenecks and make your biggest queries your fastest queries.

    • "great outcomes - great amount of demo's [sic] good abstract. Makes prerequisites clear. I'd attend." >> Thank you!
    • "There are other several sessions on this very same topic. This abstract doesn't indicate if it will cover 2014 or not and I think it is important at this point." >> Very interesting feedback. I thought that by drilling in and only covering "big" queries that the abstract would differentiate itself from all of the other "performance tuning" sessions that are submitted every year, but clearly that didn't happen. The 2014 feedback is a good point too. I didn't put that in on purpose. By the time abstracts were due, I had not yet touched 2014 in a production environment (and, full disclosure, I still haven't). I do not, as a general rule, present on topics that I haven't extensively worked with, so I couldn't put it into the abstract. I left it vague on purpose, hoping that perhaps over the summer I'd get enough production experience to add some 2014 material to the mix. I'm not sure if that was a good decision or not, but it is what it is.
    • "Thanks for the abstract. It’s good to have someone speaking about the Performance and tuning. The abstract content and the goals are quite interesting and It will be sure a great session in PASS Summit 2014. Thanks." >> Thank you!


    And that's that. Now I'll turn things over to you. Have any additional feedback for me? Post it below!

  • TechEd North America 2014 (Houston) - Demos - DBI-B489 and DBI-B333

    Just a quick note to thank everyone who attended both of my sessions at this year's TechEd show in Houston!

    Attached are demos for both the query tuning and window functions talks.

    Enjoy, and let me know if you have any questions.


  • SQL Saturday Chicago (#291) - Clash of the Row Goals - Demos

    Thank you to everyone who joined me today at SQL Saturday Chicago for my "Clash of the Row Goals" session. I'm honored that so many people would attend a talk with such an incredibly arcane title -- and always glad to find a curious, interested, and intelligent audience.

    The demos I showed during the session are attached to this post. If you have any followup questions, feel free to leave a comment below.

    Enjoy!

  • SQL Saturday Chicago...and the Gentle Art of Planscaping!

    Three weeks from now I'll be visiting Chicago to participate in SQL Saturday. I'll offer two different learning opportunities:

    On Friday, April 25, you can join me for a full day seminar on parallelism and performance in SQL Server. This class covers in-depth background on the hows and whys behind SQL Server's parallel processing, followed by an investigation of advanced techniques that you can use to improve performance of some queries by an order of magnitude or more. The class is filling up, so make sure to register soon!

    On Saturday, April 26, I'll deliver my newest talk, "Clash of the Row Goals." I unveiled it for the first time this past weekend at SQL Saturday Boston, and Joe Chang happened to blog about it a few hours ago. In this talk you'll learn how to shape and modify query plans by leveraging an understanding of query optimizer internals -- without using query hints. I've decided to call this technique planscaping. The idea is to create beautiful query plans that perform well and don't break at unexpected times. That's a good thing, right?
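
    For anyone who can't make it to Chicago, here's the smallest possible taste of the idea. This is a hypothetical sketch of my own (invented table and column names, not one of the session demos), showing how a plain TOP clause introduces a "row goal" that reshapes the plan without a single query hint:

        -- Hypothetical tables dbo.Orders and dbo.Customers, for illustration only.
        -- The TOP introduces a row goal: the optimizer costs the plan as if only
        -- 100 rows need to be produced, which frequently turns a hash join into a
        -- nested loops plan. There is deliberately no ORDER BY, because a full sort
        -- would need all of its input and would blunt the row goal's effect.
        SELECT TOP (100)
            o.OrderID,
            o.OrderDate,
            c.CustomerName
        FROM dbo.Orders AS o
        INNER JOIN dbo.Customers AS c ON
            c.CustomerID = o.CustomerID
        WHERE o.OrderDate >= '20140101';

    That's the flavor of planscaping: understand how the optimizer reasons about a query, and you can shape the plan it produces rather than hoping for the best.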

    Hope to see you in Chicago!


  • Demos: SQL Saturday Boston (#262) - Clash of the Row Goals!

    Thanks to everyone who joined me yesterday afternoon in Cambridge for my Clash of the Row Goals talk. It was great to receive such a warm reception for a rather out-there topic.

    Three things I should clarify for those who attended:

    • The Many() function appeared to produce an invalid estimate in one case, but that was actually due to auto-parameterization, as one of the attendees in the front row correctly called out at the time (I'm not sure everyone heard her). There's a small illustration of auto-parameterization just after this list.
    • Regarding the demo that failed at the end: As it turns out -- it didn't! I was a bit under-caffeinated and misunderstood my own notes. I've updated the scripts to make things clearer.
    • I blamed the demo failure, jokingly, on Azure. Turns out at least one attendee took me seriously; it showed up on a feedback card. Fact is, I was running the entire show from my laptop. My head might have been in the clouds, but my demos were strictly on terra firma.
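
    And here is that illustration: a minimal, hypothetical sketch of simple ("auto") parameterization in isolation. The table is invented and this is not one of the session demos; it just shows the mechanism.

        -- Hypothetical table, for illustration only.
        CREATE TABLE dbo.Widgets
        (
            WidgetID INT NOT NULL PRIMARY KEY,
            WidgetName NVARCHAR(100) NOT NULL
        );
        GO

        -- A trivially simple query like this qualifies for simple ("auto")
        -- parameterization: SQL Server replaces the literal with a parameter
        -- before caching the plan.
        SELECT WidgetName
        FROM dbo.Widgets
        WHERE WidgetID = 42;
        GO

        -- The cached entry shows up as a "Prepared" plan with the literal replaced
        -- by a parameter (@1), and that plan (estimates included) can be reused
        -- for other literal values.
        SELECT cp.objtype, st.[text]
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
        WHERE st.[text] LIKE N'%Widgets%'
            AND cp.objtype = N'Prepared';

    The practical upshot: a reused parameterized plan carries the estimates from its original compilation, which can make a perfectly healthy query look like it has a bad estimate.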

    The demo scripts are attached. Enjoy, and let me know if you have any questions or comments. I'd especially love to hear how it goes if you decide to try any of these techniques on one of your problem queries.

  • Query Performance and Parallelism Seminars in Boston and Chicago

    You're a DBA or database developer working on a big (or even big-ish -- doesn't have to be huge) database. Your end-users are doing all sorts of data analysis, and even though you have a pretty beefy server, your queries just don't seem to run as fast as they should.

    The reason (I bet)? Your queries aren't taking full advantage of the CPU power available in today's servers. SQL Server has powerful parallel processing capabilities, but left to its own devices (and defaults) you're never going to see the full potential. Luckily, this can be fixed. You can make your queries much faster. (Often an order of magnitude or more!) And I can show you how.
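
    A quick way to take stock of where your server stands (a sketch of my own, not part of the seminar materials) is to check the two instance-level settings that govern when, and how wide, queries go parallel:

        -- The shipping defaults are 5 for 'cost threshold for parallelism' and
        -- 0 (no limit) for 'max degree of parallelism'.
        SELECT name, value_in_use
        FROM sys.configurations
        WHERE name IN
        (
            N'cost threshold for parallelism',
            N'max degree of parallelism'
        );

    Those two knobs are only the starting point, though; the bigger wins come from how the queries themselves are written.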

    Join me in Boston or in Chicago, the Friday before the upcoming SQL Saturday events in those cities, and we'll go deep into parallel processing. You'll learn about Windows Server and how it deals with processes and threads, about SQLOS and its thread scheduling internals, about how the SQL Server query processor works, and about how you can use all of this knowledge to make your queries run faster and your server behave more consistently, even under load.

    These full-day seminars cost only $150 and include lunch, coffee, and a printed course book -- plus a full day of intense content.

    Full information, including an outline and registration details, is available at the links below:

    Hope to see you there!

  • SQLRally Nordic and SQLRally Amsterdam: Wrap Up and Demos

    First and foremost: Huge thanks, and huge apologies, to everyone who attended my sessions at these events. I promised to post materials last week, and there is no good excuse for tardiness. My dog did not eat my computer. I don't have a dog. And if I did, she would far prefer a nice rib eye to a hard chunk of plastic.

    Now, on to the purpose of this post...

    Last week I was lucky enough to make my first visit to each of two amazing cities, Stockholm and Amsterdam. Both cities, as mentioned previously on my blog, hosted SQLRally events, and I did a precon plus two talks at each event.

    The events in both cities were well done and the audiences seemed quite sharp. Very nice events at which to be a speaker! I hope to return to both areas very, very soon. (Hint, hint, event organizers!)

    Precon attendees: You should already know where to get your slides and demos.

    Breakout session attendees: Find the demos for both of my talks attached here.

    Thanks again to everyone who I met and/or taught. It was really a fantastic week. Skål and/or proost -- depending on your disposition!

  • Query Performance Sessions in Stockholm and Amsterdam

    As previously mentioned, I'll be delivering my full-day "No More Guessing" performance troubleshooting seminar at both the Stockholm and Amsterdam SQLRally events. In addition to the seminar, I'll be doing two breakout sessions at each event.

    It's going to be a busy week! But luckily for me, I'm doing the same things in each city:


    Using Your Brain to Beat SQL Server

    This session is a joint effort with one of the world's greatest SQL superheroes, Thomas Kejser. The session is all about query optimizer deficiencies and how to win the battle for excellent performance by applying your own intelligence instead of blindly trusting that SQL Server will do the right thing. I think this is going to be a really fun and useful session.


    Data, Faster: SQL Server Performance Techniques with SQLCLR

    Remember when SQLCLR was first put into SQL Server and everyone was wondering what they should use it for? I found my answer: raw performance. In this session I'll teach you a SQLCLR pattern that can help reduce the run time of your heaviest and most logic-filled queries, often by an order of magnitude or more.


    Looking forward to seeing you in Europe next month!
