THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Rob Farley

- Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft MVP (SQL Server) and leader of the SQL User Group in Adelaide, Australia.

  • APPLY – not exactly set-based

    In my last post, I showed a technique for dealing with working columns when writing T-SQL. The idea was around using APPLY to be able to push values from the existing set through calculations (but preferably not scalar functions, of course), producing new columns which can be used further down the query, even in the WHERE and GROUP BY clauses. Useful stuff indeed.

    But there is a limitation which I didn’t cover, and I feel that this is worth writing about for this month’s T-SQL Tuesday. The theme allows people to write about previous Tuesday topics – so I’m going to revisit topics 17 and 25 as I go a little further into APPLY.

    These working columns can only be applied if they are calculations on a single row of data. The resulting set might be bigger or smaller based on how many rows are affected, but the input (at least logically) should be considered on a row-by-row basis.

    Consider the following query:

    SELECT p.Name, r.RevName
    FROM
    Production.ProductSubcategory AS s
    CROSS APPLY
    (SELECT REVERSE(s.Name) AS RevName) AS r
    CROSS APPLY
    (
        SELECT TOP (1) *
        FROM Production.Product AS p
        WHERE p.ProductSubcategoryID = s.ProductSubcategoryID
        ORDER BY p.ListPrice DESC
    ) AS p
    ORDER BY s.ProductSubcategoryID;

    You see two CROSS APPLYs here – one creating a simple working column in REVERSE(s.Name), the other even involving another table to do a lookup. This second CROSS APPLY doesn’t necessarily produce any rows – if there is no matching Product, the resultset won’t contain any rows for that ProductSubcategory. I’ve shown some of its results below:

    But what about the following query, which gives the same 37 rows on the AdventureWorks database:

    SELECT
    (
        SELECT TOP (1) p.Name
        FROM Production.Product AS p
        WHERE p.ProductSubcategoryID = s.ProductSubcategoryID
        ORDER BY p.ListPrice DESC
    ),
    REVERSE(s.Name) AS RevName
    FROM Production.ProductSubcategory AS s
    ORDER BY s.ProductSubcategoryID;

    This is very similar, but you’ll notice that instead of developing the result set in the FROM clause, I’ve used the calculations directly in the SELECT clause to produce the same result.

    Except that it’s not the same.

    Here, my TOP sub-query can only produce a single value. You’ll notice I change my query to fetch only a single column now – any more would give an error. And I’m lucky I’ve selected TOP(1), not TOP(2), or TOP (1) WITH TIES – which would both be legal in my APPLY system. You’ll see that APPLY provides additional flexibility here.

    So now consider the following query:

    SELECT
        p.Name,
        p.ProductSubcategoryID,
        COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt
    FROM Production.Product AS p
    ORDER BY p.ProductSubcategoryID, p.Name;

    You will see that both look at the list of Products, and produce a count of the number of products which are in the same Subcategory. I’m sure you’re all very familiar with the OVER clause and the use of the windowing technique provided by the PARTITION BY clause.

    If we run this first query and look at some of the results, you’ll see the marvellous windowing technique, which I’ve highlighted using ZoomIt. You’ll see that the SubcatCnt column has the value 3 for the Subcategory with 3 items, and 6 for the one with 6, and so on.

    Excellent stuff. But you’ll probably also be aware that you can’t use windowing functions like this in the WHERE clause – they’re applied only in the SELECT clause.

    Having seen my working columns trick though, you might think otherwise.

    Check this out – it’s perfectly legal!

    SELECT
    p.Name,
    p.ProductSubcategoryID,
    sc.SubcatCnt
    FROM Production.Product AS p
    CROSS APPLY
    (
    SELECT COUNT(*) OVER (PARTITION BY p.ProductSubcategoryID) AS SubcatCnt
    ) AS sc
    ORDER BY p.ProductSubcategoryID, p.Name;

    Amazing stuff. I can now use SubcatCnt in the WHERE clause. Job done!

    But I’m having you on. It’s a trick. It’s rubbish.

    You see, APPLY logically works on a single row at a time, as I said earlier. Look at the same block of results for this second query.

    Loads of 1s. Not exactly helpful. It’s accurate though – there is a count of one row in each partition of the set that is logically passed in each time.

    APPLY works out the result for each row individually. It doesn’t do it for the whole set. So if you’re wanting working columns that need to be applied to multiple rows at once, then you need to think of another solution.

    APPLY is good, but not that good. Know its limitations and you’ll be better off.

  • A T-SQL Tip: Working calculations

    T-SQL Tuesday again and this month is on T-SQL Tips (thanks Allen!). In some ways it’s a tough topic, because there are things I don’t really consider tips that other people do, and vice-versa. This one’s legitimate though: Using CROSS APPLY for working columns in calculations.

    Let me give you an example. TSQL2sDay150x150

    Back in 2009, I wrote a blog post on Julian (YYDDD) dates. http://msmvps.com/blogs/robfarley/archive/2009/03/25/converting-to-and-from-julian-format-in-t-sql.aspx.

    Someone asked me recently about using this for time periods, for example, someone’s age when they play a football match. For example, Cesc Fàbregas holds the record as the youngest Arsenal player, aged 16 years and 177 days. Given someone’s birthdate and date of interest, this seems like it should be quite easy to work out, but can get frustratingly tricky.

    The idea is quite straight forward – you just count the number of days from their most recent birthday. So let’s have a think.

    Counting the number of years between the startdate and the enddate is a nice place to start. I could count the number of days, but that might give me rounding error based on leap years. But counting the years simply compares the year component, such as 1987 and 2003, and the player might not have had his birthday this year yet. In that case, we’d want to go with the year before. Anyway – once we’ve figured out how old the player is, we can figure out when their last birthday was and count the days since then. Easy.

    But so easy to make a mistake somewhere.

    And this is where CROSS APPLY can come into its own, by allowing us to use working columns.

    Let’s start with a few players. Some young ones, and Thierry Henry, just because. I’m using the VALUES method, but you could have a Players table just as easily.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)

    image

    First, I want to make it clear which column is my startdate and which is my enddate. It’s too easy to be thinking “startdate” and pick up “Debut” here, because this sounds very much like the same thing. There’s the startdate of their playing career, and the startdate of the calculation. To avoid confusion, I’m going to do some simple column-renaming. This gives me more reusable code, and APPLY even means I never have to worry about whether I already have these column names referring to something else, because they’re going to get their own table alias too.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
       (select p.DoB as startdate, p.Debut as enddate) as working

    image

    Might seem like a bit of a waste to you, but it means so much to me. Really.

    Now I want to count how many years there are between my startdate and enddate, and work out when the startdate is this year. I could do this in one step, sure, but I want to be able to check my working really thoroughly.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
    (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
    (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty

    image

    Now I can easily test to see if I need to subtract a year or not.

    select *
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
    (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
    (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
    (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
    (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay

    image

    I’m sure you can see where this is going. I now have their latest birthday before the date I’m looking for, and I can easily turn this into a Julian Date format.

    select p.*, jd.JulianDiff
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY
       (select p.DoB as startdate, p.Debut as enddate) as working
    CROSS APPLY
       (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
       (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay
    CROSS APPLY
       (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc
    CROSS APPLY
       (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc
    CROSS APPLY
       (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd
    ;

    image

    The great thing about this is that I can be very confident of my working, being able to check each step of the calculation along the way. I can even turn it into a simple TVF, whilst keeping the calculations just as verifiable:

    create function dbo.YYDDD(@startdate date, @enddate date) returns table
    as return
    (
    select jd.JulianDiff
    FROM
    (values (@startdate, @enddate)) working (startdate, enddate)
    CROSS APPLY
       (select datediff(year,working.startdate,working.enddate) as YearsDifferent) yd
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent,working.startdate) as StartDateThisYear) sdty
    CROSS APPLY
       (select case when sdty.StartDateThisYear > working.enddate then -1 else 0 end as YearOffset) yo
    CROSS APPLY
       (select dateadd(year,yd.YearsDifferent + yo.YearOffset,working.startdate) as AdjustedYear) ay
    CROSS APPLY
       (select (yd.YearsDifferent + yo.YearOffset) * 1000 as YearComponent) yc
    CROSS APPLY
       (select datediff(day,ay.AdjustedYear,working.enddate) as DayComponent) dc
    CROSS APPLY
       (select yc.YearComponent + dc.DayComponent as JulianDiff) as jd
    )
    ;

    select p.*, jd.JulianDiff
    from
    (
        values
         ('Cesc Fàbregas', cast('4 May 1987' as date), cast('28 October 2003' as date))
        ,('Theo Walcott', cast('16 March 1989' as date), cast('19 August 2006' as date))
        ,('Jack Wilshere', cast('1 January 1992' as date), cast('13 September 2008' as date))
        ,('Thierry Henry', cast('17 August 1977' as date), cast('7 August 1999' as date))
    ) p (Name, DoB, Debut)
    CROSS APPLY dbo.YYDDD(DoB, Debut) jd;

    I know it’s really simple, but it’s an amazingly useful tip for making sure that you do things right.

    Oh, and as for the plan – SQL just rolls it all up into a single Compute Scalar as if we’d written it out the long way, happily showing us that doing it step-by-step ourselves is no worse at all.

    image

    @rob_farley

  • Stored Procedures with SSRS? Hmm… not so much

    Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports.

    Let me say that again, in case you missed it the first time:

    I want to have a whinge about people who insist on using stored procedures for SSRS reports.

    Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this:

    create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        /* First check that @eomdate is a valid date */
        if isdate(@eomdate) != 1
        begin
            select 'Please enter a valid date' as ErrorMessage;
            return;
        end

        /* Then check that time has passed since @eomdate */
        if datediff(day,@eomdate,sysdatetime()) < 5
        begin
            select 'Sorry - EOM is not complete yet' as ErrorMessage;
            return;
        end
       
        /* If those checks have succeeded, return the data */
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales
        from Sales.SalesOrderHeader
        where OrderDate >= dateadd(month,-1,@eomdate)
            and OrderDate < @eomdate
        group by SalesPersonID
        order by SalesPersonID;
    end

    Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error.

    The other check is maybe fair enough, but I still don’t like it.

    The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment.

    If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple results sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set.  Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set.

    Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY.

    When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!)

    So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think.

    set fmtonly on
    exec dbo.GetMonthSummaryPerSalesPerson '20030401';
    set fmtonly off

    Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back.

    But what I do get back hurts Reporting Services.

    image

    It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them.

    It would have been much better to push these checks down into the query itself.

    alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID
        order by SalesPersonID;
    end

    Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date.

    First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation.

    image

    …compared to one that uses a future date:

    image

    You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller.

    The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it.

    image

    Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0.

    That’s right – it doesn’t run. So much for reading plans right-to-left...

    The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false.

    image

    Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message.

    alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as
    begin
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID

        /* Now include the error messages */
        union all
        select 0, 0, 0, 'Please enter a valid date' as ErrorMessage
        where isdate(@eomdate) != 1
        union all
        select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage
        where datediff(day,@eomdate,sysdatetime()) < 5

        order by SalesPersonID;
    end

    But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions.

    That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.)

    create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime)
    returns table as
    return (
        select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage
        from Sales.SalesOrderHeader
        where
        /* Make sure that @eomdate is valid */
            isdate(@eomdate) = 1
        /* And that it's sufficiently past */
        and datediff(day,@eomdate,sysdatetime()) >= 5
        /* And now use it in the filter as appropriate */
        and OrderDate >= dateadd(month,-1,@eomdate)
        and OrderDate < @eomdate
        group by SalesPersonID
        union all
        select 0, 0, 0, 'Please enter a valid date' as ErrorMessage
        where isdate(@eomdate) != 1
        union all
        select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage
        where datediff(day,@eomdate,sysdatetime()) < 5
    );

    We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data.

    And we get so much more flexibility with this.

    If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago.

    So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like:

    select SalesPersonID
    from dbo.GetMonthSummaryPerSalesPerson(@eomonth)
    order by SalesPersonID;

    This doesn’t need to look up the TotalDue field, which makes a simpler plan.

    select *
    from dbo.GetMonthSummaryPerSalesPerson(@eomonth)
    where SalesPersonID is not null
    order by SalesPersonID;

    This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report.

    If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation?

    Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather that queries that use functions – that’s rubbish.

    @rob_farley

  • SQLRally Nordic gets underway

    PASS is becoming more international, which is great.

    The SQL Community has always been international – it’s not as if data is only generated in North America. And while it’s easy for organisations to have a North American focus, PASS is taking steps to become international. Regular readers will be aware that I’m one of three advisors to the PASS Board of Directors, with a focus on developing PASS as a more global organisation.

    With this in mind, it’s great that today is Day 1 of SQLRally Nordic, being hosted in in Sweden – not only a non-American country, but one that doesn’t have English as its major language.

    The event has been hosted by the amazing Johan Åhlén and Raoul Illyés, two guys who I met earlier this year, but the thing that amazes me is the incredible support that this event has from the SQL Community.

    It’s been sold out for a long time, and when you see the list of speakers, it’s not surprising.

    Some of the industry’s biggest names from Microsoft have turned up, including Mark Souza (who is also a PASS Director), Thomas Kejser and Tobias Thernström. Business Intelligence experts such as Jen Stirrup, Chris Webb, Peter Myers, Marco Russo and Alberto Ferrari are there, as are some of the most awarded SQL MVPs such as Itzik Ben-Gan, Aaron Bertrand and Kevin Kline. The sponsor list is also brilliant, with names such as HP, FusionIO, SQL Sentry, Quest and SolidQ complimented by Swedish companies like Cornerstone, Informator, B3IT and Addskills.

    As someone who is interested in PASS becoming global, I’m really excited to see this event happening, and I hope it’s a launch-pad into many other international events hosted by the SQL community.

    If you have the opportunity, thank Johan and Raoul for putting this event on, and the speakers and sponsors for helping support it. The noise from Twitter is that everything is going fantastically well, and everyone involved should be thoroughly congratulated!

    @rob_farley

  • When is a SQL function not a function?

    Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) TSQL2sDay150x150

    Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function.

    SELECT GETDATE(), SomeDatetimeColumn
    FROM dbo.SomeTable;

    There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on.

    But it’s rubbish.

    Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”)

    I’m going to consider the three main types of user-defined functions in SQL Server:

    • Scalar
    • Inline Table-Valued
    • Multi-statement Table-Valued

    I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions.

    Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’.

    If it’s not inline, it’s rubbish. It really is.

    Let’s start by considering the two kinds of table-valued function, and compare them.

    These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database.

    CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int)
    RETURNS TABLE AS 
    RETURN (
        SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
    )
    ;
    GO

    CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int)
    RETURNS @results TABLE (
        EmployeeLogin nvarchar(512),
        OrderDate datetime,
        SalesOrderID int
        )
    AS
    BEGIN
        INSERT @results (EmployeeLogin, OrderDate, SalesOrderID)
        SELECT e.LoginID, o.OrderDate, o.SalesOrderID
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
        ;
        RETURN
    END
    ;
    GO

    You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter.

    Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on.

    image

    Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column.

    Trickery.

    Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing...

    To see what’s actually going on, let’s look at the Estimated plan.

    image

    Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires.

    And the cost of the TVF is 57% of the batch!

    But it gets worse.

    Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column.

    image

    Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed.

    A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic.

    But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation.

    The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog.

    Now how about scalar functions...

    Suppose we wanted a scalar function to return the count of these.

    CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int)
    RETURNS int
    AS
    BEGIN
        RETURN (
           
    SELECT COUNT(*)
            FROM Sales.SalesOrderHeader AS o
            LEFT JOIN HumanResources.Employee AS e
            ON e.EmployeeID = o.SalesPersonID
            WHERE o.SalesPersonID = @salespersonid
            AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
            AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
        );
    END
    ;
    GO

    Notice the evil words? They’re required. Try to remove them, you just get an error.

    That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions.

    Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan:

    SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID;

    image

    image

    We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on.

    If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999.

    image

    If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged.

    image

    You see, this 0.0337999 is the cost of running the scalar function ONCE.

    But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years.

    image

    And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each one between twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here.

    The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here.

    Let’s look at a better way.

    Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this:

    SELECT e.LoginID, y.year,
    (SELECT COUNT(*)
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = p.SalesPersonID
        AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101')
        AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101')
        ) AS NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID;

    image

    Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this).

    Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly.

    To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok.

    CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int)
    RETURNS table
    AS
    RETURN (SELECT COUNT(*) as NumSales
        FROM Sales.SalesOrderHeader AS o
        LEFT JOIN HumanResources.Employee AS e
        ON e.EmployeeID = o.SalesPersonID
        WHERE o.SalesPersonID = @salespersonid
        AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101')
        AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101')
    );
    GO

    But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator.

    SELECT e.LoginID, y.year, n.NumSales
    FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
    CROSS JOIN Sales.SalesPerson AS p
    LEFT JOIN HumanResources.Employee AS e
    ON e.EmployeeID = p.SalesPersonID
    OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n;

    And now, we get the plan that we want for this query.

    image

    All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way.

    It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions).

    So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it.

    @rob_farley

  • Table-valued parameters in SQL 2005

    Can’t be done? Oh, but it can. Let me show you.

    Just quickly, a Table-Valued Parameter is a useful thing introduced in SQL 2008 that lets you have a read-only parameter which is a table type, passed into a stored procedure. To use it you need to have a user-defined table type, so that you can define what is going to be passed in. You can read about them at http://msdn.microsoft.com/en-us/library/bb510489.aspx

    The 2008 code looks something like this:

    /* First create a database to play in */
    create database TVPDemo;
    go
    use TVPDemo;
    go

    /* And create a table with some sample data. I’m getting mine from AdventureWorks */
    select ProductID, Name as ProductName, ListPrice, ProductSubcategoryID
    into dbo.Products
    from AdventureWorks.Production.Product;
    go

    /* Now for the real stuff – create a table type */
    create type dbo.NewProducts as table
    (ProductName nvarchar(100) collate Latin1_General_CI_AS
    ,ListPrice money
    ,SubCategory int
    ,DeleteMe bit
    );
    go

    /* And a stored procedure which uses this table type */
    create procedure dbo.MaintainProducts(@NewProducts dbo.NewProducts readonly) as
    begin
    /* Obviously we could MERGE – that’d work nicely here. But I want 2005 features */

       /* Update some products into Products table */  
       update p set ListPrice = n.ListPrice, ProductSubcategoryID = n.SubCategory
       from dbo.Products p
       join @NewProducts n
       on n.ProductName = p.ProductName
       where n.DeleteMe = cast(0 as bit);

       /* Insert some */  
       insert dbo.Products (ProductName, ListPrice, ProductSubcategoryID)
       select n.ProductName, n.ListPrice, n.SubCategory
       from @NewProducts n
       where not exists (select * from dbo.Products p where p.ProductName = n.ProductName)
       and n.DeleteMe = cast(0 as bit);

       /* And delete some */  
       delete p
       from dbo.Products p
       join @NewProducts n
       on n.ProductName = p.ProductName
       where n.DeleteMe = cast(1 as bit);

       /* Now list them all, returning this to the client */ 
       select *
       from dbo.Products;

    end
    go

    /* Now let’s familiarise ourselves with what’s in Product now */
    select *
    from dbo.Products;

    /* And do some maintenance on it. We create a table variable of the appropriate type, populate it and call the proc */
    declare @SomeNewProducts dbo.NewProducts;
    insert @SomeNewProducts (ProductName, ListPrice, SubCategory, DeleteMe)
    select 'Blade', 0.1, 1, 0
    union all
    select 'Blade2', 0.1, null, 0
    union all
    select 'Bearing Ball', 1, 2, 1
    ;

    exec dbo.MaintainProducts @SomeNewProducts;

    /*
    When we ran this stored procedure, the latest version of dbo.Products was outputted, so we can clearly see the new record, and the absence of the one we deleted.
    Lovely
    */

    But this wasn’t possible in SQL 2005. We didn’t have user-defined table types, and we certainly didn’t have table-valued parameters.

    Except that we could still do something very similar. This was something I’d taken for granted, but when I showed this to someone at the PASS Summit, and then someone else, I got persuaded to write a blog post on it.

    If you haven’t seen this idea before, I’m sure you’ll kick yourself. It’s remarkably simple, but I think it’s quite powerful. Like I said – I’d taken it for granted.

    The idea is this: make a VIEW with an INSTEAD OF trigger, using the inserted table instead of the table variable.

    That INSTEAD OF trigger is essentially where your stored procedure is kept. A trigger is still a procedure, it’s just not stored in the traditional list of stored procedures. But it will act just like one.

    As for the view – that can just be a placeholder. Think of it as simply defining the columns you need to handle. You don’t need a FROM clause, and you don’t even need any rows to come back. I like to put a contradiction in there so that I don’t think there’s any real values coming out.

    So a trigger doesn’t take a table-valued parameter, but it can leverage the inserted and deleted tables that are available in triggers. For us, we’re just interested in the former. Have a look at the code, and you’ll see what I mean.

    This code can run on SQL 2005 (well, it can also run on later versions, but that’s less important).

    /* First let’s set up a new database, just like we did in SQL 2008*/
    create database TVPDemo;
    go
    use TVPDemo;
    go

    select ProductID, Name as ProductName, ListPrice, ProductSubcategoryID
    into dbo.Products
    from AdventureWorks.Production.Product;
    go

    /* Here’s the tricky bit. Make a view. Focus on the columns. I put WHERE 0=1 in, just to make it cleaner */
    create view dbo.NewProducts as
    select
        cast(N'' as nvarchar(100)) collate SQL_Latin1_General_CP1_CI_AS as ProductName,
        cast(0 as money) as ListPrice,
        cast(0 as int) as SubCategory,
        cast(0 as bit) as DeleteMe
    where 0=1
    ;
    go

    /* This trigger contains the same code as in the 2008 stored procedure.
    * But instead of having a table variable, we use the inserted table.
    */
    create trigger dbo.MaintainProducts on dbo.NewProducts instead of insert as
    begin
       /* Update some products into Products table */  
       update p set ListPrice = i.ListPrice, ProductSubcategoryID = i.SubCategory
       from dbo.Products p
       join inserted i
       on i.ProductName = p.ProductName
       where i.DeleteMe = cast(0 as bit);

       /* Insert some */  
       insert dbo.Products (ProductName, ListPrice, ProductSubcategoryID)
       select i.ProductName, i.ListPrice, i.SubCategory
       from inserted i
       where not exists (select * from dbo.Products p where p.ProductName = i.ProductName)
       and i.DeleteMe = cast(0 as bit);

       /* And delete some */  
       delete p
       from dbo.Products p
       join inserted i
       on i.ProductName = p.ProductName
       where i.DeleteMe = cast(1 as bit);

      /* Now list them all, returning this to the client */
       select *
       from dbo.Products;

    end
    go

    /* Look what’s in there now */
    select *
    from dbo.Products;

    /* Remember there’s never anything in here */
    select *
    from dbo.NewProducts;

    /* Now we simply insert into our view. As we do, the trigger runs immediately and makes the changes */
    insert dbo.NewProducts (ProductName, ListPrice, SubCategory, DeleteMe)
    select 'Blade', 0.1, 1, 0
    union all
    select 'Blade2', 0.1, null, 0
    union all
    select 'Bearing Ball', 1, 2, 1

    So there you have it – a useful TVP equivalent in versions prior to SQL 2008. I get that I’m probably writing this post about ten years too late. Sorry about that.

    But if you’re not fond of the idea of having to declare and populate a table variable, then perhaps this idea is for you. This method will support any type of inserting, whether it’s row-by-row, or the results of a single SELECT statement. One day though, TVPs won’t be READONLY any more (this doesn’t seem to be the case for SQL Server 2012 unfortunately), and when that happens, you’ll want to definitely be using TVPs.

    @rob_farley

  • Highlights and Lowlights of PASS Summit 2011

    This was a proper big week.

    The PASS Summit ran from Tuesday night to Friday, but I’d arrived in America the Friday before. So by the time it actually started, I had that strange feeling that things were wrapping up. My calendar was ridiculously full. The stuff that I was aware of ahead of time looked like this:

    Friday 7th: Arrive in America. Travel to Portland. Speaker Dinner.

    Saturday 8th: SQL Saturday #92 (two sessions to give, plus a song performance with Buck during the morning break). Charity dinner for The Leukemia & Lymphoma Society (whom LobsterPot sponsor).

    Sunday 9th: Walk the Portland Half Marathon. Travel to Seattle. Collect my kilt. Register for the Summit.

    Monday 10th: Early morning prayer meeting. Deliver a pre-conference seminar. Insiders Dinner in evening.

    Tuesday 11th: Early morning prayer meeting. Meeting about SQL Saturday. PASS Board Meeting. Insiders Day at the Microsoft Campus. Opening Night Party, including being a Quiz Bowl contestant. Speaker/Volunteer Dinner.

    Wednesday 12th: Early morning prayer meeting. Chapter leaders’ meeting. Chapter lunch. Book signing. Lightning Talk to deliver (my song). Global Growth meeting. Exhibitors’ party. Parties for SQL People, SQL Sentry, SolidQ and SQLKaraoke.

    Thursday 13th: Early morning prayer meeting. WIT Lunch. Spotlight session to deliver. Redgate Dinner. Party at Gameworks.

    Friday 14th: Early morning prayer meeting. More book signing. Board Q&A session. Board photos.

    Saturday 15th: Sleep in and fly home.

    That’s the short version. Really. There were a lot of other things that were squeezed in – in particular, the fact that I had promised Audrey Hammonds about six months ago that I would run through her presentation with her, and one other thing that became somewhat significant: THE FACT THAT ALLEN KINSEL ARRANGED FOR ME TO PERFORM MY SONG DURING THE FRIDAY KEYNOTE.

    Readers of my blog (and plenty of other blogs) will know by now that I played my guitar and sang at the start of the Friday Keynote. Buck Woody accompanying me with his guitar and backing vocals. What you might not know is that this only got arranged late on Wednesday night. Allen had seen me perform it (again with Buck) during the Lightning Talks on Wednesday afternoon, and by the time the night was over, Buck and I were booked in to perform it in front of the three and a half thousand delegates at the biggest keynote of the week. I’ve played and sang before a couple of hundred before, but never even close to that many. To say I had mixed emotions would’ve been an understatement. I didn’t hesitate to say yes, and was excited, but was also phenomenally nervous.

    As a late entry to a tight schedule, we got two minutes only and had to cut the song short. We didn’t sing the bridge section, so stopped after just Verse Chorus Verse Chorus. It was tremendous fun, and I loved seeing faint glows of phones being waved around in the crowd. I desperately wanted to look into the crowd as I was walking off stage, but my nerves and the fear of tripping on the guitar lead froze me completely and I didn’t turn my head at all. I was told later that there was a standing ovation – but I was just full of emotion, and so tired. So tired. It was my own fault – I’d gone out with Microsoft people after the various parties, and when they’d all gone to bed for a couple of hours around 5:30am, I’d got myself ready to lead a quick song at the 6am prayer meeting head off to my 6:15am sound check, prior to the 7:15am book signing. I got through the day (although I fell asleep for a few minutes during Audrey’s session, which I will need to watch to see how she did). I eventually got to bed around 10:30pm Friday night, and about twelve hours later managed to get up with just enough time to pack and check out of the hotel for noon.

    But this wasn’t the highlight for me.

    Nor was the highlight the fact that my level of involvement was so much greater, now that I’m a Board Member. I have to admit that I always get so much more out of events like this if I’m involved. I appreciate that I can get to more sessions if I’m not already laden with other entries on my calendar, but I would rather serve others wherever possible. I’m still new on the board, but as an advisor, I’m hoping to be able to influence things like the Summit more and more next year, and maybe there will be a few areas in which we can find opportunity to improve it still.

    Serving is great – but wasn’t the highlight.

    The highlight was the people. It always is, and it always will be. Right from the moment I arrived in America and tracked down John & Yanni Robel and Jes Borland. Arriving in Portland and seeing Jeremiah & Kendra, Buck Woody, Tim Ford, Erin Stellato and many more at the SQLSaturday events. Participating in the Portland Half Marathon with Erin, Jes, Yanni, Brent Ozar, Karen Lopez, Doug Lane (who did 10km) and Allen White (who did the full marathon), and a lots of people wearing the LobsterPot logo on their shirts. And despite not being able to run (I physically can’t run because of an old back injury, so I just walked it), I didn’t even finish last of the half marathoners! For all the extra pain I felt because my back isn’t really up to walking 21km in 3 hours, it was worth it. I’m especially proud of Yanni, who has achieved amazing things despite having a nasty blood cancer.

    Despite Sunday being a ridiculously early start and a very long day, each day started with 6am prayer meetings. But this gave the opportunity to start each day on a good note, meet even more people (like Matt “Mrs” Slocum), and to figure out some of why I always click so well with friends like Stacia Misner.

    Even the time I spent moving from one spot to another was a chance to see people and grow friendships deeper. I think of the night I was heading back to my hotel after the walk back from one of the parties had gone past the other hotels, and I bumped into the “Damn Strates” (Jason & Sarah), only to spend an hour talking with them about a number of different things. Or the time spent talking with one of the industry’s newest and brightest lights Jes Borland, who thinks she can out-talk me. Or the time with fellow board members JRJ, MarkS & Lara, which is time I can’t help but enjoy as the conversations switch over and over from flippant to serious and back again.

    Oh, and the lowlight was losing my prescription sunglasses. That’s a real pain.

    The SQL community has something very special, that other communities should be jealous of. These people genuinely love each other, and it’s really good to see and to be part of it. I’ve seen the same at SQLBits, but not in many other contexts.

    @rob_farley

    PS: Apologies to everyone for not having links on your names. I wrote this on a plane without connectivity.

  • Supporting Leukemia Research at the Portland Half Marathon

    My company LobsterPot Solutions gives quite a bit to charity. Sometimes it’s working for reduced rates (or even free), but sometimes we give money directly.

    A Seattle-based friend of mine has CML – one of the types of leukemia, and when I found out she was raising money for the Leukemia & Lymphoma Society by running marathons through their Team In Training program, I wanted to help. LobsterPot donated money to her cause, and I had expected that might be the end of the matter.

    Later, I was contacted for my company logo. It seems that the amount we gave was enough to have the logo printed on the backs of shirts. In fact, the LobsterPot logo is going to be worn by 170 people at races all season, including the Portland Half Marathon. As I was going to be within a few hours’ drive of Portland on the day, I decided to check it out for myself.image

    I even entered the event – the Portland Half Marathon 2011.

    I can’t run. I had a back injury in 2003, which has stopped me being able to cope with the impact of running. But I walked it, in a purple shirt provided by the Leukemia & Lymphoma Society with the LobsterPot logo on the back. It took me over three hours to get around the distance, and although walking is good for my back, I was still very sore at the end.

    Walking that course though, all 21km of it, I saw lots of purple shirts with LobsterPot claws on them, the wearers all encouraging each other to stay strong, and to finish. Many of these people, including my friend Yanni, are leukemia survivors, and it was inspiring.

    So now I have a challenge for you. If you are a business owner, then find ways to give back. It doesn’t have to be much, but it should be something. If you don’t know anyone you can sponsor for this type of event, visit Yanni’s fundraising page at http://tinyurl.com/HelpYanni. She’s running a full marathon in January, and has just started raising funds.

    Get yourself over to http://tinyurl.com/HelpYanni and see how to donate to her. Who knows – if you’re a company there could be a hidden marketing value there. Three years ago she was told she had an incurable blood cancer. Just over a week ago, she completed a half-marathon. Isn’t that the kind of spirit you want your company to reward?

    But don’t do it for yourself, do it for Yanni and those like her.

    image

  • I should've looked the other way

    The words for that song I did at the PASS Summit 2011 are as follows. On the Friday, I stopped where the bridge starts. Various recordings of it are making it to YouTube, such as here, where the song starts around 2:20 in.

    I should've looked the other way

    Verse 1:
    My query sucks - it takes too long
    So long I wrote this song
    The plan's not big - it ain't a giant
    And yet I have an angry client
    Performance now has made her weary
    So I've come in to fix her query
    I promise I won't ever fail her
    Say "Trust me, love, I'm from Australia!"

    Chorus:
    I need to find you
    But I don't want to search every row
    My predicate's residual
    My seek just runs too slow
    I thought I'd caught a glimpse of you
    Been searching for all day
    But all along, I'd done it wrong
    I should've looked the other way

    Verse 2:
    A trace is on, I know the reads
    That fetch the bytes the query needs
    There's spooling from a CTE
    They've got recursion needlessly
    I need to dig a little further
    I worry there might be a cursor
    The DBA has the plan_handle
    He says it's not corrupt, he knows Paul Randal!

    Repeat chorus

    Bridge:
    There is an index covering predicates with keys
    But my developer has used inequalities
    There is a range scan
    Hiding truth
    Hiding cost
    Hiding you...

    Repeat chorus x2

    I should've looked the other way

    © 2011 Rob Farley ;)

    @rob_farley

  • Data, Information and Knowledge

    Hopefully my connection is slightly better during today’s keynote than it was during yesterday’s, when “Live Blogging” didn’t really cut it. The PASS staff saw the problem and have resolved it (thanks guys!).

    Quentin Clark has jumped on stage to talk some more about SQL Server 2012, and he started with the expression “Data, Information and Knowledge”. I love this – I see Business Intelligence about extracting information from data, and it’s good to have Microsoft see this priority across the whole SQL platform.

    He’s also talking about the 12 biggest features of SQL Server 2012, which he says has more new features than any release of SQL Server yet.

    1. Required 9s

    • Integration Services as a Server
    • HA for StreamInsight
    • SQL Server AlwaysOn.

    SQL Server has seen uptime as a key component for a long time, but to provide High Availability for StreamInsight is particularly significant. StreamInsight involves being able to consume data at significant rates, being able to run queries against that data while it’s still on the move – before it’s even reached the relational database. High Availability for StreamInsight should be able to better provide strategies to ensure that streaming data need not be lost. Businesses suffer badly when they lose data. SQL Server 2012 should be able to reduce this problem almost completely.

    2. Blazing-Fast Performance

    • Performance Enhancements – RDBMS, SSAS, SSIS
    • ColumnStore Index

    3. Rapid Data Exploration
    4. Managed Self-Service BI

    • Power View + PowerPivot
    • Administration from SharePoint
    • Reporting Alerts

    Yes, “Power View” has a space in it. It’s the new name for Crescent, which is about self-service reporting using a Silverlight experience. I’m all for allowing users to interact with the data in powerful ways, but I’m also concerned about how to manage this. SharePoint seems to continue as the main platform for this, and although I’d love to see the administration of these reports be done inside SQL itself (instead of SharePoint), I get that SharePoint is currently the platform of choice.

    5. Credible, Consistent Data

    • BI Semantic Model
    • Data Quality Services
    • Master Data Services

    I’m not going to comment on this stuff right now – it’s been talked about plenty already, but the enhancements definitely look good. The new stuff around DQS lets you fix up data at a number of extra points and have it pushed back into the underlying warehouse.

    6. Organisational Compliance

    • Expanded Audit – User-defined, Filtering
    • User-defined Server Roles

    This is useful. Audit is one of the massive things, and yet an administrator has always been able to turn it off. With the ability to have user-defined server roles, auditing turns into a much more real option.

    7. Peace of Mind

    • Production-simulated Application Testing
    • System Center Advisor & Management Packs
    • Expanded Support – Premier Mission Critical

    The Replay tools have become Distributed Replay, which is introduces a ton of really good options.

    8. Scalable Data Warehousing
    9. Fast Time to Solution

    • SQL Server Appliances – Optimised and Pre-tuned
    • HW + SW + Support – Just Add Power
    • Choice of Hardware

    Appliances are stepping up. Buying an appliance, adding the network connection and electricity, and it’s ready to accept data in twenty minutes. The number of new options available suggests that the future of hardware buying will be even more focused on the appliance concept.

    10. Extend Any Data, Anywhere

    • Greater Interoperability – new drivers for PHP, Java & Hadoop
    • ODBC Drivers for Linux & Change Data Capture for SSIS & Oracle
    • Beyond Relational: FileTable, 2D Spatial, Semantic Search

    I really like the idea of CDC for Oracle, although I doubt there will be any

    11. Optimised Productivity

    • SQL Server Data Tools (formerly “Juneau”)
    • Unified Across Database & BI
    • Deployment & Targeting Freedom

    12. Scale on Demand

    • AlwaysOn
    • Deployment Across Public & Private
    • Elastic Scale

    And of course, many of these items contribute to make a much more cloud-ready platform. Cloud isn’t for everyone, but Microsoft are certainly making steps to make it a more feasible option.

    And we’re over – hitting publish now.

    @rob_farley

  • Mashing up data

    In the PASS Summit Keynote (Day 1), they’re demonstrating tools to join data from Excel, Marketplace, SQL Azure, using the “Data Explorer”.

    The thing that I’m liking about this is that you can specify that you want to do a lookup or a merge. Both joins, but two different types. I like this because when I write joins in T-SQL queries, I tend to have it in my head whether I’m expecting a Nested Loop (lookup) or a Merge Join. Seems Data Explorer lets me decide that right up front.

    So you can pull data down from all kinds of different places, and publish it back out through the Marketplace. This means you can enhance other data sets, which other people can then enhance further, and this can continue to make some incredible data sources that can be consumed in OData just about anywhere.

  • SQL + Hadoop news

    I’m at the PASS Summit, and announcements are coming out nice and fast. One of the big ones that has just been made is that SQL Server will work much more closely with Apache Hadoop. Amazing stuff, that I think helps people realise that Microsoft is very much about providing platforms, and isn’t trying to stop you from using your other platforms either.

    So in the “coming soon”, there were:

    • Apache Hadoop-based distribution for Windows Server and Windows Azure
    • ODBC Driver and Add-in for Excel, both for Apache Hive
    • JavaScript Framework for Hadoop

    I’m not doing anything with Hadoop myself, but I know plenty of people do, and this is really good news!

  • PASS Summit 2011 – Day 1 Keynote

    I’m a guest blogger again!

    This trip is already several days old for me. I arrived in America on Friday, delivered two sessions at Portland SQL Saturday event, walked (on Sunday) the Portland Half Marathon as a sponsor of Leukemia and Lymphoma Society’s Team in Training program, delivered a full-day summit precon seminar on Monday, and was in PASS Board meetings yesterday. Finally we’ve reached Day 1, and I could use a weekend!

    But it’s now that things really kick off properly.

    As I write this, Rushabh Mehta is on stage, talking about some of the numbers that apply to this summit. There are over attendees at the Summit this year, demonstrating the amazing reach that the PASS Summit has. And they also just talked about my involvement as an advisor in reaching the rest of the world better too. Every year, the Summit gets bigger, and PASS runs more events in more places. It’s an exciting time to be part of this community.

    Microsoft’s Ted Kummert (Senior VP, Business Platform Division) is on stage now, and has announced that SQL Server Denali will officially be named SQL Server 2012.

  • Joins without JOIN

    I’m now doing two sessions at the SQL Saturday event in Portland. I had been scheduled to do a single session (on indexes), but got an email yesterday asking if I could do another one as well. So now I’m going to do a session earlier in the day about Joins.

    Yes, JOINs. Nice co-incidence to find that this month’s T-SQL Tuesday, hosted by Stuart Ainsworth and a week early because of the PASS Summit, is on that same topic. Plus the fact that I gave a presentation on it to the Adelaide SQL User Group last week.

    So let’s jump in…

    There are three types of joins that we write with the JOIN keyword – INNER, OUTER and CROSS. I’m imagine if you’ve read this far through this post, you’re very much aware of that. But there are other types of joins as well, that don’t use the JOIN keyword (and I’m not counting the comma short-hand for CROSS JOIN that we all used back in the 90s).

    A join describes the way that two sets are related to each other within a query (and for the purposes of this post, I’m not going to count joins that concatenate two sets, such as using the UNION keyword). If you’re using the JOIN keyword, the relationship between the tables is shown in the ON clause – although if you’re using CROSS JOIN, the relationship is that every row in one set is related to every row in the other set.

    If we don’t use the JOIN keyword, then where do we see joins? Well, in correlated sub-queries. I showed this when demonstrating that the APPLY keyword applies a join between one set an another in another recent blog post.

    But every correlated sub-query, whether using APPLY or not must perform a join.

    Consider the following:

    SELECT *
        , (SELECT COUNT(*) FROM Production.Product AS p
           WHERE p.ProductSubcategoryID = s.ProductSubcategoryID) AS ProdCount
    FROM Production.ProductSubcategory AS s;

    There’s a join here, between Production.ProductSubcategory and Production.Product. The ON clause equivalent is the WHERE clause of the correlated sub-query. But what kind of join is it?

    Let’s think... There’s clearly a relationship between the two tables, so it’s not a CROSS JOIN. Also, there’s nothing that would stop a row from the ProductSubcategory table to be returned, so it’s not an INNER JOIN (which only returns rows that match). It’s an OUTER JOIN. And we can prove this by looking at the execution plan.

    The plan says “Right Outer Join”. Don’t be upset by the fact that it’s a Right Join rather than a Left Join – it’s just that Product table is coming from the top row and ProductSubcategory is from the bottom row. The ‘side’ of an Outer Join is just a matter of perspective. A Right Join is no different to a Left Join, it’s simply a question of which side of the desk you’re on.

    Another type of join with JOIN uses EXISTS or IN. These two act very similarly, and I’m going to use EXISTS in my examples.

    Consider the following query:

    SELECT *
    FROM Production.ProductSubcategory AS s
    WHERE EXISTS
        (SELECT * FROM Production.Product AS p
         WHERE p.ProductSubcategoryID = s.ProductSubcategoryID);

    and it’s converse:

    SELECT *
    FROM Production.ProductSubcategory AS s
    WHERE NOT EXISTS
        (SELECT * FROM Production.Product AS p
         WHERE p.ProductSubcategoryID = s.ProductSubcategoryID);

    Just as before, this performs a join between the two tables. But it’s not actually an Outer Join we see going on here. After all, we see that rows can be filtered out of the ProductSubcategory table. Furthermore, we don’t have access to any information in the Product table, not even the count of rows. But it does Filter. This is neither an Inner Join or an Outer Join. The EXISTS form is a Semi Join, and the NOT EXISTS form is an Anti Semi Join. These operations simply filter one set based on whether a match is present or not. We see this operation in the two plans involved.

    The fact that the Query Optimizer chose to implement one with a Hash Match and one with a Nested Loop is irrelevant. We’re looking at the Logical aspects of the plan, not the Physical.

    And so we see there are plenty of times that a Join can appear without the JOIN keyword.

    But let me show you something interesting about the Anti Semi Join for a moment...

    Many people don’t like NOT EXISTS, and would rather write my earlier query like this:

    SELECT *
    FROM Production.ProductSubcategory AS s
    LEFT OUTER JOIN Production.Product AS p
    ON p.ProductSubcategoryID = s.ProductSubcategoryID
    WHERE p.ProductID IS NULL;

    It’s relatively easy to show that these two queries should be identical. Both of them find rows where a match doesn’t exist. Both perform a logical Anti Semi Join. And yet many people will tell you that doing an Outer Join plus a WHERE clause is better – from a performance perspective. They will generally tell you that NOT EXISTS is logically equivalent.

    We’ve seen plenty of times when the Query Optimizer will realise that two queries are identical and produce the same plan, but this doesn’t seem to be the case here.

    This plan shows very much what we asked for. An Outer Join plus a Filter, as opposed to the Anti Semi Join.

    And this has very little reason to be any faster than an Anti Semi Join operation. In fact, the interesting thing here is that the Outer Join completes. It tracks every successful match and passes that up to the Filter. Our Anti Semi Join operation doesn’t do that.

    I see no reason to use an Outer Join plus Filter over NOT EXISTS. It may have been quicker at some point if the Anti Semi Join operation was done poorly in earlier versions, but I’d really like to hear if anyone can show me the Outer Join plus Filter method being quicker.

    PS: This Saturday (Oct 8th) I’ll be presenting this kind of stuff in my “Deeper JOINs” session at SQL Saturday #92 in Portland, and in my pre-conference seminar a couple of days later (Monday 10th) in Seattle at the PASS Summit. You can still register for these events, so if you find this kind of thing interesting, get yourself over to the Pacific North West and I’ll see you there!

    @rob_farley

  • SQL Server MVP Deep Dives Vol 2 – get my chapter free

    Yes, the SQL book I was involved in two years ago has a sequel.

    In the first book, I wrote chapters 7 and 40, and was one of 53 MVPs that contributed. And the proceeds went to War Child. Yes, none of the MVPs who wrote, edited, etc, got any royalties – that all went to charity.

    This time, I wrote just one chapter (chapter 2), but there have been at least seventy MVPs involved. Sixty chapters all by different MVPs, with more MVPs involved as section editors, technical editors and more. And the money this time goes to Operation Smile, another international children’s charity.

    It’s being published in time for the PASS Summit (which is now less than two weeks away), and there will be a book signing for people who have their copies already. You can pre-order through the Manning website at http://www.manning.com/delaney/, but if you do this, you might not have it for the signing. With most of the authors present, you might prefer to try your luck at picking up a copy at the Summit if you can make it.

    If you’re not going to be at the Summit, then notice that you can pre-order the book and that this gets you an Early Access Edition – an electronic copy of some of the chapters as they become available.

    But my chapter is different again – you can download it for FREE. No strings attached. You’ll see a link on the page to my chapter, and can enjoy it straight away, without having to pay anything at all.

    I’m sure I will have missed someone in this list, but I have to provide it because it’s just so extraordinary. The list of MVPs I know were involved includes:

    Johan Ahlen, Gogula Aryalingam, Glenn Berry, Aaron Bertrand, Kevin Boles, Robert Cain, Tim Chapman, Denny Cherry, Michael Coles, Rod Colledge, John Paul Cook, Louis Davidson, Kalen Delaney, Dave Dustin, Rob Farley, Grant Fritchey, Denis Gobo, Darren Gosbell, Sergio Govoni, Allan Hirt, Victor Isakov, Satya Shyam K Jayanty, Tibor Karaszi, Jungsun Kim, Tobiasz Koprowski, Hugo Kornelis, Ted Krueger, Matija Lah, Rodney Landrum, Greg Larsen, Peter Larsson, Andy Leonard, Ami Levin, Greg Low, John Magnabosco, Jennifer McCown, Brad McGehee, Siddharth Mehta, Ben Miller, Allan Mitchell, Tim Mitchell, Luciano Moreira, Jessica Moss, Aaron Nelson, Paul Nielsen, Shahriar Nikkhah, Robert Pearl, Boyan Penev, Pedro Perfeito, Pawel Potasinski, Mladen Prajdic, Abolfazl Radgoudarzi, Paul Randal, Denis Reznik, Rafael Salas, Edwin Sarmiento, Chris Shaw, Gail Shaw, Linchi Shea, Jen Stirrup, Jason Strate, Kimberly Tripp, Paul Turley, Bill Vaughn, Mike Walsh, Peter Ward, Joe Webb, John Welch, Allen White and Thiago Zavaschi.

    Most are authors, some (like Aaron Nelson and Jen Stirrup) were technical editors, and some (like Paul Randal and Kimberly Tripp) were section editors. And Kalen Delaney was the overall editor and responsible for everything.

    So a big thanks to Manning to make this possible, and to Kalen Delaney for leading the effort.

    To potential buyers, see it as a series of short stories, and an opportunity to see a little into the minds of many of the industry’s leading lights.

    And buy a copy – it’s for charity!

More Posts Next page »
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement