

Peter Larsson

Thinking outside the box

  • Do people want help? I mean, real help?


    Or do they just want to continue with their old habits?

    The reason for this blog post is that over the last week I have tried to help people on several forums. Most of them just want to know how to solve their current problem, and there is no harm in that. But when I recognize the same poster the very next day with a similar problem, I ask myself: did I really help him or her at all?

    All I probably did was help the poster keep his or her job. It sounds harsh, but it is probably true. Why else would the poster continue with the old habit? The most convincing case was a post from someone who wanted to use SP_DBOPTION. He had an ugly procedure which used dynamic SQL and did several other things wrong.

    I wrote to him that he should stop using SP_DBOPTION, because that procedure has been marked for deprecation and will not work in any SQL Server version after 2008 R2, and that he should start using the DATABASEPROPERTYEX() function instead.
    His response was basically “Thanks, but no thanks”. Then some other MVP jumped in and gave him a solution using SP_DBOPTION, and the original poster was once again a happy camper.
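
    For illustration, here is the kind of replacement I had in mind. This is only a minimal sketch, not the poster's actual code; 'MyDatabase' and the chosen properties are just examples.

    -- Instead of, for example: EXEC sp_dboption 'MyDatabase', 'read only'
    -- the same kind of information can be read with DATABASEPROPERTYEX()
    SELECT  DATABASEPROPERTYEX('MyDatabase', 'Updateability') AS Updateability,  -- READ_ONLY or READ_WRITE
            DATABASEPROPERTYEX('MyDatabase', 'Recovery') AS RecoveryModel,       -- FULL, SIMPLE or BULK_LOGGED
            DATABASEPROPERTYEX('MyDatabase', 'IsAutoShrink') AS IsAutoShrink     -- 1, 0, or NULL if the database does not exist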

    Another problem was posted by someone who wanted a unique sequence number like “T000001” to “T999999”. I suggested he use a normal IDENTITY column and add a computed column that concatenates the “T” with the value from the IDENTITY column. Even though other people proposed my suggestion as an answer several times, the original poster (OP) unproposed it! Why?

    The only reason I can think of is that the OP is not used to (or has never even heard of) computed columns. Some other guy posted and insinuated that computed columns don’t work on SQL Server 2000 and earlier. To that I just replied that computed columns have in fact worked since SQL Server 7.
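
    A minimal sketch of the suggestion (the table and column names are made up for illustration): let IDENTITY generate the number and expose the “T000001” format through a computed column.

    CREATE TABLE dbo.Ticket
            (
                TicketID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
                -- Computed column: 'T' plus the IDENTITY value zero-padded to six digits
                TicketNo AS 'T' + RIGHT('000000' + CAST(TicketID AS VARCHAR(10)), 6),
                Descr VARCHAR(100) NOT NULL
            )

    INSERT  dbo.Ticket
            (
                Descr
            )
    VALUES  ('First row')   -- TicketNo becomes T000001

    The computed column never has to be maintained by application code, and computed columns have been available since SQL Server 7.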

    Are people so stuck in their old habits, and so unable to change for whatever reason that might be? Could it be they are not qualified, or lack enough experience, for their current position? Do they lack basic education about relational databases?

    My question to you is, how do you really help people with these mindsets?

  • Relational division

    I came across an interesting post on a Microsoft SQL Server forum this afternoon. It was a question about relational algebra, and the poster wanted an efficient query to solve his problem. The original title was "Select parent ids that have exact match on child values (no more, no less)".

    The problem can be solved with relational division, but there is no such operator in SQL Server. Maybe there will be one some day.

    But for now there is no such operator, so we as developers have to find our own ways.
    First, prepare and populate some sample data.


    -- Prepare sample data
    DECLARE @Sample TABLE
            (
                ParentID INT NOT NULL,
                Keyword VARCHAR(25) NOT NULL,
                UNIQUE (ParentID, Keyword)
            )

    -- Populate sample data
    INSERT  @Sample
            (
                ParentID,
                Keyword
            )
    VALUES  (1, 'one'),
            (1, 'two'),
            (1, 'three'),
            (1, 'four'),
            (2, 'one'),
            (2, 'two'),
            (2, 'three'),
            (3, 'one'),
            (3, 'two')


    People had already been active and posted some solutions, among which this common type of query was present.


    SELECT      s.ParentID
    FROM        @Sample AS s
    WHERE       s.Keyword IN ('one', 'two', 'three')
    GROUP BY    s.ParentID
    HAVING      COUNT(DISTINCT s.Keyword) = 3
                AND COUNT(DISTINCT s.Keyword) = (SELECT COUNT(*) FROM @Sample AS x WHERE x.ParentID = s.ParentID)


    and this type of query


    SELECT      s.ParentID
    FROM        @Sample AS s
    WHERE       s.Keyword IN ('one', 'two', 'three') 
                AND NOT EXISTS (
                                SELECT  *
                                FROM    @Sample AS x
                                WHERE   x.ParentID = s.ParentID
                                        AND x.Keyword NOT IN ('one', 'two', 'three')
                               )
    GROUP BY    s.ParentID
    HAVING      COUNT(DISTINCT s.Keyword) = 3


    The good thing is that both queries produce the wanted result; the bad thing is their inefficient execution plans.
    Then one poster did his homework, read up on Mr Celko, translated his algorithm to the current problem, and ended up with a query like this


    SELECT      ParentID
    FROM        (
                    SELECT  ParentID,
                            Keyword,
                            COUNT(*) OVER (PARTITION BY ParentID) AS cnt
                    FROM    @Sample
                ) AS w
    WHERE       Keyword IN ('one', 'two', 'three')
    GROUP BY    ParentID
    HAVING      MIN(cnt) = 3


    Well, let's just say the execution plan is the worst of them all, even though the query does produce the correct result.
    With a slightly different angle, you get a better plan for Mr Celko's query.


    SELECT
          s.ParentID
    FROM        (
                    SELECT      ParentID,
                                COUNT(*) AS cnt
                    FROM        @Sample
                    GROUP BY    ParentID
                ) AS w
    INNER JOIN  @Sample AS s ON s.ParentID = w.ParentID
    WHERE       s.Keyword IN ('one', 'two', 'three')
    GROUP BY    s.ParentID
    HAVING      MIN(w.cnt) = 3


    With these queries in mind, I thought about the problem and realized it did in fact have a much simpler solution.
    The query I came up with is the simplest of them all, and does just one pass over the source table. Yes, only one pass, just like the first Celko-style query for relational division, but without the internal worktable.
    This is the query I came up with


    -- Peso
    SELECT      ParentID
    FROM        @Sample
    GROUP BY    ParentID
    HAVING      MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1
                AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3


    How does the query work? The second aggregate filter (the SUM) simply makes sure all three wanted keywords are present; the UNIQUE constraint guarantees there are no duplicates to inflate the sum.
    And the first aggregate filter (the MIN)? It takes care of the remainder (modulo) part of the relational division: if a ParentID has any keyword outside the wanted list, the CASE expression returns 0 for that row and the MIN drops to 0. There cannot be a "fractional" part in relational division, because that would mean the ParentID has more keywords than wanted. With the sample data, ParentID 1 fails the MIN test (the keyword 'four' yields 0), ParentID 3 fails the SUM test (only two matching keywords), and ParentID 2 passes both.

    Simple as that.

    //Peso


    PS. These are the textual execution plans for the four types of queries and then mine.


      |--Filter(WHERE:([Expr1003]=CASE WHEN [Expr1007] IS NULL THEN (0) ELSE [Expr1007] END))
           |--Nested Loops(Left Outer Join, OUTER REFERENCES:([s].[ParentID]))
                |--Filter(WHERE:([Expr1003]=(3)))
                |    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))
                |         |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1014]=Count(*)))
                |              |--Index Scan(OBJECT:(@Sample AS [s]),  WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
                |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)))
                     |--Stream Aggregate(DEFINE:([Expr1015]=Count(*)))
                          |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]) ORDERED FORWARD)


      |--Filter(WHERE:([Expr1007]=(3)))
           |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1010],0)))
                |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1010]=Count(*)))
                     |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([s].[ParentID]))
                          |--Index Scan(OBJECT:(@Sample AS [s]),  WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD)
                          |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]),  WHERE:(@Sample.[Keyword] as [x].[Keyword]<>'one' AND @Sample.[Keyword] as [x].[Keyword]<>'three' AND @Sample.[Keyword] as [x].[Keyword]<>'two') ORDERED FORWARD)


      |--Filter(WHERE:([Expr1005]=(3)))
           |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1005]=MIN([Expr1004])))
                |--Filter(WHERE:([Keyword]='one' OR [Keyword]='three' OR [Keyword]='two'))
                     |--Nested Loops(Inner Join)
                          |--Table Spool
                          |    |--Segment
                          |         |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
                          |--Nested Loops(Inner Join, WHERE:((1)))
                               |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1008],0)))
                               |    |--Stream Aggregate(DEFINE:([Expr1008]=Count(*)))
                               |         |--Table Spool
                               |--Table Spool


      |--Filter(WHERE:([Expr1008]=(3)))
           |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1008]=MIN([Expr1004])))
                |--Nested Loops(Inner Join, OUTER REFERENCES:([ParentID]))
                     |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))
                     |    |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1012]=Count(*)))
                     |         |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
                     |--Index Seek(OBJECT:(@Sample AS [s]), SEEK:([s].[ParentID]=[ParentID] AND [s].[Keyword]='one' OR [s].[ParentID]=[ParentID] AND [s].[Keyword]='three' OR [s].[ParentID]=[ParentID] AND [s].[Keyword]='two') ORDERED FORWARD)


      |--Filter(WHERE:([Expr1004]=(1) AND [Expr1005]=(3)))
           |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1004]=MIN([Expr1006]), [Expr1005]=SUM([Expr1006])))
                |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Keyword]='three' OR [Keyword]='two' OR [Keyword]='one' THEN (1) ELSE (0) END))
                     |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)


  • Another bin-packing algorithm using recursion and XML

    This time I will show you an algorithm to do the dreaded bin-packing using recursion and XML.
    First, create some sample data like this

    -- Prepare sample data
    DECLARE @Sample TABLE
            (
                RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                Expense SMALLMONEY NOT NULL
            )
     
    -- Populate sample data
    INSERT  @Sample
            (
                Expense
            )
    VALUES  (12.51),
            (45.63),
            (66.35),
            (92.66),
            (65.46),
            (54.01),
            (32.23),
            (27.16),
            (78.92),
            (14.58)
     
    Next, we need to create a variable to hold the user's wanted total sum.

    -- Prepare user supplied parameter
    DECLARE @WantedSum SMALLMONEY = 111.09

    And we also need to create a temporary staging table to hold the valid combinations
     
    -- Prepare temporary staging table
    DECLARE @Temp TABLE
            (
                CombID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                TrackPath XML NOT NULL
            )

    Now we only have to do the calculations!
    Here I am using a special trick to get unique combinations, since the path of records 1>2>3 is the same as 1>3>2, 2>1>3, 2>3>1, 3>1>2 and 3>2>1. See the explanation of the difference between combinations and permutations on Wikipedia.
    To keep track of which records I have already used in the running total, I simply remove the record id (RowID) from the Hits list.
    And, to give the correct answer at the end, I build an XML string of the visited RowIDs that make up the correct sum.
     
    -- Calculate all possible permutations using recursion
    ;WITH ctePack(Total, Hits, TrackPath)
    AS (
            SELECT  s.Expense AS Total,
                    (SELECT '#' + CAST(x.RowID AS VARCHAR(MAX)) FROM @Sample AS x WHERE x.RowID <> s.RowID ORDER BY x.RowID FOR XML PATH('')) + '#' AS Hits,
                    '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
            FROM    @Sample AS s
            WHERE   s.Expense <= @WantedSum
     
            UNION ALL
     
            SELECT      p.Total + s.Expense,
                        REPLACE(p.Hits, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Hits,
                        p.TrackPath + '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
            FROM        @Sample AS s
            INNER JOIN  ctePack AS p ON p.Hits LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
            WHERE       p.Total + s.Expense <= @WantedSum
    )
    INSERT      @Temp
                (
                    TrackPath
                )
    SELECT      MIN(TrackPath)
    FROM        ctePack
    WHERE       Total = @WantedSum
    GROUP BY    Hits

    When the iterations are over and we have the wanted combination(s), the remaining task is to report the records giving us the correct sum.
    We also need the records grouped, so that we can see which group each expense belongs to. In some cases, one and the same record may be used in multiple groups.
     
    -- Display the final resultset
    SELECT      t.CombID,
                s.RowID,
                s.Expense
    FROM        @Temp AS t
    CROSS APPLY t.TrackPath.nodes('/ID') AS f(n)
    INNER JOIN  @Sample AS s ON s.RowID = f.n.value('.', 'INT')
    ORDER BY    t.CombID,
                s.RowID


    If you don't want to use XML, you can write the recursive CTE like this (and drop the @Temp table) to get all included records directly.


    -- Calculate all possible permutations using recursion
    ;WITH ctePack(RowID, Expense, Total, Tracker)
    AS (
            SELECT  s.RowID,
                    s.Expense,
                    s.Expense AS Total,
                    (
                        SELECT      '#' + CAST(x.RowID AS VARCHAR(MAX))
                        FROM        @Sample AS x
                        WHERE       x.RowID <> s.RowID
                        ORDER BY    x.RowID
                        FOR XML     PATH('')
                    ) + '#' AS Tracker
            FROM    @Sample AS s
            WHERE   s.Expense <= @WantedSum

            UNION ALL

            SELECT      s.RowID,
                        s.Expense,
                        p.Total + s.Expense,
                        REPLACE(p.Tracker, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Tracker
            FROM        @Sample AS s
            INNER JOIN  ctePack AS p ON p.Tracker LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
            WHERE       p.Total + s.Expense <= @WantedSum
    )
    SELECT  DISTINCT
            DENSE_RANK() OVER (ORDER BY Tracker) AS CombID,
            RowID,       
            Expense
    FROM    ctePack
    WHERE   Total = @WantedSum
  • Ten days left

    As some of you know, I was awarded Microsoft Most Valuable Professional (MVP) for SQL Server in July last year. Now it's time to see if my MVP gets renewed, or if I lose the award.
    I honestly believe it's a good thing that MVP status only lasts one year at a time. It keeps knowledge fresh. Things that worked in the past may not work any longer as the product evolves, and there are smarter ways to do things now than before.

    Being an MVP is a responsibility. It does mean you have more than average knowledge of SQL Server and how to get the best out of the product. It doesn't mean you know everything about the product. No one does.
    Being an MVP also means you benefit from the community of your peers. I have learned more hardcore things in the last year alone than in the previous five years. And sometimes I am lucky enough to contribute to the MVP community myself.

    That's how things work. Being around knowledgeable people makes me learn, or understand, the same things they do. The same thing applies to beginners on the forums. They most often look up to MVPs and learn a thing or two from them.

    Over the last year I haven't been around the forums as much as I would like to. The reason is that I started (together with SQL Server MVP Thomas Ivarsson) a PASS chapter in Sweden. It took a lot of time in the beginning, and it still takes some time to plan and invite interesting guest speakers for our meetings.

    We are currently working to get a full day event in place in Sweden on the 10th of September. That's 3 months away, but it feels like there is not enough time! I guess every planner feels that way.
    However, this is in my "line of duty": to continue to help and educate the SQL Server community.

    Hopefully I get my MVP status for another year. It would mean my peers have faith in me and my ability to support the community.
    I would like to thank a few people for the collaboration last year, which probably wouldn't have happened without my MVP status:

    1) Itzik Ben-Gan, for incredible insight and the email correspondence we shared over the last year.
    2) Joe Celko, for some fun competitions and coding help. I do love to see my [weighted] moving average in your upcoming SQL for Smarties book.
    3) Phil Factor, who has taught me a lot about driving and encouraging people to participate. And of course, some Speed Phreakery!
    4) Tony Davis, who gave me the opportunity to be a technical reviewer.

    There are a lot of other people whom I have shared ideas with. You know who you are.

    I know a few people who will probably receive the MVP award this quarter. I wish you good luck and godspeed!

    //Peter

  • The impossible thing happened...

    Yesterday I got the brilliant idea to format my laptop and reinstall Windows 7 from scratch.
    I've had a few problems over the last month, mostly due to upgrading the preinstalled Vista to Windows 7.

    Said and done, I boldly finished the installation and thought everything was OK. I mean, my email is IMAP so all email is stored at my ISP, all my relevant documents are on another partition of the hard drive, and so on...

    But..! Now I have noticed that I forgot to save my Favorites links.
    So now I need your help. I can try to remember all my favorites (about 80 of them), or I can ask you to post interesting SQL-related blogs to me, so I can quickly check them and add them to my Favorites again.


    Thank you

    //Peter
  • Invitation to startup meeting for PASS Scania

    PASS Scania is a new PASS chapter in Sweden which will promote interest, networking and knowledge for professional SQL Server users such as developers, DBAs and BI specialists in Skåne. The chapter also has a business alliance with the existing user group SQLUG.
     
    All developers who use the SQL Server platform in their work are welcome to participate in the seminars, starting February 4th 2010. Both consultants and employees are welcome.
     
    Membership and seminars are free and activities are mainly sponsored events.
     
    The first meeting is held in February, where new members have the opportunity to present themselves; we will present the agenda for 2010 and also demonstrate the new Microsoft PowerPivot platform.
    Further seminars will include how to read and interpret execution plans, how to decide on an index strategy, and query performance considerations.
     
    Welcome to PASS Scania's first event on the 4th of February 2010 at S:t Gertruds in Malmö at 18:00. More about the agenda is found on www.pass-scania.se, where you also register. The number of seats is limited, and seats are allocated in order of registration.
     
    //Peter
  • Convert binary value to string value

    With SQL Server 2008, we can easily use

    DECLARE @bin VARBINARY(MAX)
    SET     @bin = 0x5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8

    SELECT  CONVERT(VARCHAR(MAX), @bin, 2)


    But how can we do this in SQL Server 2005?
    You can't use a simple CAST or CONVERT, because you get the ASCII representation of the binary values.

    So, here is how you can do the conversion with SQL Server 2005 by using XML.

    -- Prepare value
    DECLARE @bin VARBINARY(MAX)
    SET     @bin = 0x5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8

    -- Display the results
    SELECT @bin AS OriginalValue,
            CAST('' AS XML).value('xs:hexBinary(sql:variable("@bin"))', 'VARCHAR(MAX)') AS ConvertedString
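
    Going the other way, from a hex string back to a binary value on SQL Server 2005, should work with the same XML trick; treat the following as an untested sketch rather than verified code.

    -- Untested sketch: hex string (without the 0x prefix) to VARBINARY via xs:hexBinary
    DECLARE @hex VARCHAR(MAX)
    SET     @hex = '5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8'

    SELECT  @hex AS OriginalString,
            CAST('' AS XML).value('xs:hexBinary(sql:variable("@hex"))', 'VARBINARY(MAX)') AS ConvertedBinary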

  • Problem with SQL Server service restart

    Over the last few days, our hosting company has updated their VMware environment a number of times, and thus has forced an equal number of restarts of our database servers.
    The problem with this is that one of the databases, Yoda, needed 1.5-2.0 hours to start up due to its "In Recovery" status.

    I asked around for what could be the cause of this, and also read some excellent posts by Kimberly Tripp
    http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx
    http://www.sqlskills.com/BLOGS/KIMBERLY/post/8-Steps-to-better-Transaction-Log-throughput.aspx

    I came across the undocumented DBCC LOGINFO command and decided to give it a go. To my surprise, 33,636 records were returned. As a general rule of thumb, you should have about 8-16 VLFs per new chunk of log file growth.
    My log file was only 24 GB, so I expected a VLF count of about 25-50. 33,636 is quite a difference!

    So I executed this script about 20 times, and watched the number of VLFs go down.

    CHECKPOINT
    GO

    DBCC SHRINKFILE(logYoda, TRUNCATEONLY)
    GO

    ALTER DATABASE Yoda
    MODIFY FILE
    (
          NAME = logYoda
        , SIZE = 8192
    )
    GO

    DBCC LOGINFO
    GO


    Finally, the number of VLFs leveled out at 19, and now the "In Recovery" time is as fast as for the other databases on the server. The next step is to deal with the number of VLFs for a database which is 1 GB in size (the log is 1 GB) and has 859 VLFs (expected is 8-16).
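
    The plan for that database is the same shrink-and-regrow dance. A sketch along these lines (run in the context of that database; SmallDb and logSmallDb are placeholder names):

    CHECKPOINT
    GO

    DBCC SHRINKFILE(logSmallDb, TRUNCATEONLY)
    GO

    -- Grow the log back to its intended size in one step, so SQL Server
    -- creates only a handful of reasonably sized VLFs instead of hundreds
    ALTER DATABASE SmallDb
    MODIFY FILE
    (
          NAME = logSmallDb
        , SIZE = 1024MB
    )
    GO

    DBCC LOGINFO
    GO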

    And finally, this is a script that presents the number of VLFs per database.

    EXEC master.dbo.sp_msforeachdb 'USE [?]; SELECT ''?'' AS [Database];DBCC loginfo'

  • T-SQL Tuesday #001: Exploring "Fuzzy" Interval Islands Without Using SQLCLR

    In response to Adam's new T-SQL Tuesday series, I wanted to show that there are faster ways to get the data in a set-based manner without resorting to SQLCLR.
    http://sqlblog.com/blogs/adam_machanic/archive/2009/12/08/t-sql-tuesday-001-exploring-fuzzy-interval-islands-using-sqlclr.aspx

    Many companies today still regard SQLCLR as a security risk, so I wanted to rewrite Adam's set-based code into a more efficient algorithm.
    On my computer, Adam's code runs in about 25 seconds. The two new algorithms below run in less than a second each. Which one to choose depends on how you interpret the definition Adam wrote. Also, this is not a blog post meant to make someone feel bad; it is a blog post to show that there are other ways to write code, with better performance. You just have to think a little differently. If someone is interested in how this algorithm works, please ask nicely and I'll probably write a more in-depth blog post about it.
    Complaining in the comments doesn't work.


    Take 1:

    DECLARE
     @Interval INT = 7

    ;WITH cteSource
    AS (
        SELECT  ProductID,
                TransactionDate,
                ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
        FROM    Production.TransactionHistory
    ), cteMatch
    AS (
        SELECT      s1.ProductID,
                    s1.TransactionDate AS FromDate,
                    s2.TransactionDate AS ToDate
        FROM        cteSource AS s1
        INNER JOIN  cteSource AS s2 ON s2.ProductID = s1.ProductID
        WHERE       s1.recID = s2.recID - 1
    ), cteYak
    AS (
            SELECT      ProductID,
                        FromDate,
                        ToDate,
                        y - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp
            FROM        (
                            SELECT  ProductID,
                                    FromDate,
                                    ToDate,             
                                    CASE
                                        WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval
                                            THEN ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate)
                                        ELSE NULL
                                    END AS y
                            FROM    cteMatch
                        ) AS d
            WHERE       y IS NOT NULL
    )
    SELECT      ProductID,
                MIN(FromDate) AS FromDate,
                MAX(ToDate) AS ToDate
    FROM        cteYak
    WHERE       DATEDIFF(DAY, FromDate, ToDate) <= @Interval
    GROUP BY    ProductID,
                grp
    ORDER BY    ProductID,
                MIN(FromDate)


    This runs in less than a second.

    Take 2:

    DECLARE @Interval INT = 7

    ;WITH cteSource
    AS (
        SELECT  ProductID,
                TransactionDate,
                ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
        FROM    Production.TransactionHistory
    ), cteMatch
    AS (
        SELECT      s1.ProductID,
                    s1.TransactionDate AS FromDate,
                    s2.TransactionDate AS ToDate
        FROM        cteSource AS s1
        INNER JOIN  cteSource AS s2 ON s2.ProductID = s1.ProductID
        WHERE       s1.recID = s2.recID - 1
    ), cteYak
    AS (
        SELECT      ProductID,
                    FromDate,
                    ToDate,             
                    CASE
                        WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval THEN 1
                        ELSE -1
                    END * ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS recID
        FROM        cteMatch
    ), cteFinal
    AS (
        SELECT      ProductID,
                    MIN(FromDate) AS StartDate,
                    DATEADD(DAY, @Interval, MAX(ToDate)) AS EndDate,
                    1 + MAX(recID) AS recID
        FROM        (
                        SELECT  ProductID,
                                FromDate,
                                ToDate,
                                recID - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp,
                                recID
                        FROM    cteYak
                        WHERE   recID > 0
                    ) AS d
        GROUP BY    ProductID,
                    grp
     
        UNION ALL
     
        SELECT      ProductID,
                    FromDate AS StartDate,
                    DATEADD(DAY, @Interval, FromDate) AS EndDate,
                    ABS(recID) AS recID
        FROM        cteYak
        WHERE       recID < 0
    )

    SELECT      ProductID,
                MIN(StartDate) AS StartDate,
                MAX(EndDate) AS EndDate
    FROM        cteFinal
    GROUP BY    ProductID,
                recID


    This also runs in less than a second, and produces the exact same result as Adam's code.

    //Peter

  • Lesson learned by Trial and Error

    Yesterday, I decided to install Microsoft SQL Server 2008 R2 (November CTP) on my local machine. I already had SQL Server Express 2008 installed, so the first step was to uninstall Express, since there seemed to be no upgrade path.
    Uninstalling went great.
    I then installed 2008 R2 and rebooted my machine. I attached my sample databases, and my test database. To my surprise, the Test database was now corrupted!

    File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test.ldf"
    may be incorrect. The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint
    occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost
    due to a hardware or environment failure.

    I tried sp_attach_single_file_db to no avail. I tried all the methods known to either Google or some of the MVPs I emailed for help. All of them said the only way was to restore the database. It was no big deal, because it is in fact a test database.

    Well, I had some time left so I decided to go Trial and Error!

    Since I couldn't attach the database file at all, the first step had to be to create a new database named Test with the same size. I decided to create the log file at the same location as the original file. The data file I placed at the new location.

    CREATE DATABASE Test
    ON PRIMARY
    ( NAME = Test,
        FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\Test.mdf',
        SIZE = 1024,
        MAXSIZE = 2048,
        FILEGROWTH = 15% )
    LOG ON
    ( NAME = Test_log,
        FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log.ldf',
        SIZE = 128MB,
        MAXSIZE = 2048MB,
        FILEGROWTH = 5MB ) ;
    GO

    I thought it didn't matter what size the log file was, so I just used a small number; however, I thought it was important to have the data file the same size as the corrupt one.
    I then stopped the SQL Server service, replaced the empty data file with the corrupt data file, and started the SQL Server service again. The database Test was now visible among the databases, but of course it was not accessible. A number of different error messages were returned to me, depending on how I tried to access the database.

    Ad-hoc updates to system tables are not allowed in SQL Server 2008 and later (even if using sp_configure), so I issued

    ALTER DATABASE Test SET EMERGENCY

    This is the same as the old "UPDATE sysdatabases SET status = -32768" trick from earlier SQL Server versions.
    I then decided to set the database in single user mode with

    ALTER DATABASE Test SET SINGLE_USER

    Now the real work of checking the database could begin! Since it is a test database, I didn't care if there was data loss, so I went for

    DBCC CHECKDB (Test, REPAIR_ALLOW_DATA_LOSS)

    It took some 2 minutes to complete, and there were errors and warnings. I ran the CHECKDB command again, and this time there were only informational messages.
    After that, I set the database back in multi user mode and put the database online again with

    ALTER DATABASE Test SET MULTI_USER
    ALTER DATABASE Test SET ONLINE

    And now I have access to my database again. It also seems there was no data loss at all.

  • Simple Fibonacci calculation

    I have listened to the critique in the comments and removed the advanced version, where you could calculate virtually any Fibonacci number. Contact me if you need it again. I won't bother non-interested people with that algorithm here.

    Instead, I am concentrating on how this easy example of using a recursive CTE to calculate the Fibonacci series is constructed.

    This is done as an easy exercise for students in how to implement a recursive CTE, because the results can be confirmed visually.
    Understanding recursive queries does not come naturally to many people.

    For those of you not familiar with the Fibonacci series and wondering what uses it has, it appears in a number of areas:

    • Run-time analysis of Euclid's algorithm to determine the greatest common divisor of two integers (the worst case input for this algorithm is a pair of consecutive Fibonacci numbers).
    • The Fibonacci numbers and principle are also used in the financial markets, in trading algorithms, applications and strategies.
    • Fibonacci numbers are used by some pseudorandom number generators.
    • Fibonacci numbers are used in a polyphase version of the merge sort algorithm, in which an unsorted list is divided into two lists whose lengths correspond to sequential Fibonacci numbers.
    • The Fibonacci cube is an undirected graph with a Fibonacci number of nodes that has been proposed as a network topology for parallel computing.
    • A one-dimensional optimization method, called the Fibonacci search technique, uses Fibonacci numbers.
    • The Fibonacci number series is used for optional lossy compression in the IFF 8SVX audio file format used on Amiga computers.

    So how is the Fibonacci series built?

    The mathematical formula is F(n) = F(n-1) + F(n-2), which in plain English means that any Fibonacci number (beyond the first two) is the sum of the two previous numbers:
    0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 etc.

    So how do we solve this using a recursive CTE (CTE is an acronym for Common Table Expression, which was introduced with Microsoft SQL Server 2005)?
    Can we solve it at all? A recursive query is nested only one level, right? And the Fibonacci series is "nested" two levels?

    Yes, we can solve it by using a complementary "sideways" translation, a sort of intermediate storage. First of all, every recursive CTE needs an anchor part, a fixed part from which the recursion expands. I do this by using the values {0, 1}, which by definition are the first two values of the Fibonacci series.

    How is the recursive part done, then? I take the two values from the previous iteration and sum them together for the next Fibonacci value. I store that value in a column, shift the columns, and carry the previous value along at the same recursion level. Easy, huh?

    ;WITH Fibonacci(n, f, f1)
    AS (
            -- This is the anchor part
            -- Initialize level to 1 and set the first two values as per definition
            SELECT  CAST(1 AS BIGINT),
                    CAST(0 AS BIGINT),
                    CAST(1 AS BIGINT)

            UNION ALL

            -- This is the recursive part
            -- Calculate the next Fibonacci value using the previous two values
            -- Shift column (place) for the sum in order to accommodate the previous
            -- value too, because the next iteration needs them both
            SELECT  n + 1,
                    f + f1,
                    f
            FROM    Fibonacci
            -- Stop at iteration 93 because we then have reached the maximum limit
            -- for BIGINT in Microsoft SQL Server
            WHERE   n < 93
    )
    -- Now the easy presentation part
    SELECT  n,
            f AS Number
    FROM    Fibonacci

  • Performance consideration when using a Table Variable

    This is nothing new to me (I came across it in the autumn of 2007), and probably not to you either, but I had forgotten about it and yesterday it came alive again.
    I often choose a table variable for performance reasons (if the data is less than one page anyway) due to the benefits of less logging, etc.

    But yesterday I wrote a query for Phil Factor's "Subscription List" competition where I had chosen a table variable for the same reasons as before. It took a while to realize why (in this case) the solution with a temporary table was 30% faster than the solution with a table variable.

    So I started to investigate the issue. SQL Profiler proved me right over and over again about the 30% performance gain when using a temporary table over a table variable. It just didn't make any sense to me that the performance should differ that much. The INSERT was derived from a million record table, and the final resultset after grouping was just 120 records of 14 bytes each, well below the "one page rule". How come the solution with the table variable was so much slower?
    Remember, in this fierce competition we counted milliseconds to decide who was in the lead, so 30% was a giant leap.

    I won't bore you with the details, but finally it became clear to me that table variables don't seem to support parallelism. So I went to Google to search for this issue, and the results supported my finding: an INSERT into a table variable makes a parallel plan impossible. Why? A table variable is just like any other table (except for logging).
    I found this KB article http://support.microsoft.com/default.aspx/kb/305977/EN-US/ (the INSERT ... EXEC is supported with SQL Server 2008).
    And this http://msdn.microsoft.com/en-us/library/ms175010(SQL.90).aspx. However, it doesn't say why a table variable doesn't support parallelism, just that it doesn't.
    Here is a page from the SQL Server Storage Engine team http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/sql-server-table-variable-vs-local-temporary-table.aspx

    Here is a test case against a million record table.

    -- First with a temporary table
    CREATE TABLE    #Sample
                    (
                        DateJoined SMALLINT NOT NULL,
                        DateLeft SMALLINT,
                        Registrations INT NOT NULL
                    )
     
    INSERT      #Sample
                (
                    DateJoined,
                    DateLeft,
                    Registrations
                )
    SELECT      DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft),
                COUNT(*)
    FROM        dbo.Registrations
    GROUP BY    DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft)
     
    DROP TABLE #Sample
     
    -- Second with a table variable
    DECLARE @Sample TABLE
            (
                DateJoined SMALLINT NOT NULL,
                DateLeft SMALLINT,
                Registrations INT NOT NULL
            )
     
    INSERT     @Sample
                (
                    DateJoined,
                    DateLeft,
                    Registrations
                )
    SELECT      DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft),
                COUNT(*)
    FROM        dbo.Registrations
    GROUP BY    DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft)
     
    -- And then a conventional SELECT
    SELECT      DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft),
                COUNT(*)
    FROM        dbo.Registrations
    GROUP BY    DATEDIFF(MONTH, 0, DateJoined),
                DATEDIFF(MONTH, 0, DateLeft)
     
  • New baby DBA

    Today our son was born. A healthy baby of 4.4 kg (9.8 lbs) and 53 cm (20.9 inches).
    Both mother and son are well, and are spending the night in the safety of the hospital.

    The water broke at 2:45 am this morning and we were admitted to the ward at 4:21. Our son was born without complications at 5:06 am.
    Big sisters Isabelle (23 months) and Filippa (5.5 yr) are anxiously waiting for mother and the new baby to come home.

    //Peter

  • Date and Time concurrency

    In the past I have mostly given the advice to break down all date intervals into the smallest part (often minutes) and then group by the minute.

    -- Prepare sample data
    DECLARE  @Data TABLE
             (
                 RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                 CreateDate DATETIME,
                 DeleteDate DATETIME
             )

    -- Populate sample data
    INSERT  @Data
            (
                CreateDate,
                DeleteDate
            )
    SELECT  '2009-01-14 22:33', '2009-01-14 22:35' UNION ALL
    SELECT  '2009-01-14 22:33', '2009-01-14 22:33' UNION ALL
    SELECT  '2009-01-14 22:34', '2009-01-14 22:35' UNION ALL
    SELECT  '2009-01-14 22:35', '2009-01-14 22:35' UNION ALL
    SELECT  '2009-01-14 22:35', '2009-01-14 22:36' UNION ALL
    SELECT  '2009-01-14 22:37', '2009-01-14 22:37' UNION ALL
    SELECT  '2009-01-14 22:39', '2009-01-14 22:39' UNION ALL
    SELECT  '2009-01-14 22:38', '2009-01-14 22:38' UNION ALL
    SELECT  '2009-01-14 22:39', '2009-01-14 22:39' UNION ALL
    SELECT  '2009-01-14 22:41', '2009-01-14 22:41' UNION ALL
    SELECT  '2009-01-14 22:43', '2009-01-14 22:44' UNION ALL
    SELECT  '2009-01-14 22:52', '2009-01-14 22:52' UNION ALL
    SELECT  '2009-01-14 22:53', '2009-01-14 22:53' UNION ALL
    SELECT  '2009-01-14 22:53', '2009-01-14 22:53' UNION ALL
    SELECT  '2009-01-14 22:56', '2009-01-14 22:57' UNION ALL
    SELECT  '2009-01-14 22:57', '2009-01-14 22:57' UNION ALL
    SELECT  '2009-01-14 22:58', '2009-01-14 22:58' UNION ALL
    SELECT  '2009-01-14 22:58', '2009-01-14 22:59' UNION ALL
    SELECT  '2009-01-14 22:59', '2009-01-14 22:59'

    DECLARE @From INT,
            @To INT

    SELECT  @From = MIN(DATEDIFF(MINUTE, 0, CreateDate)),
            @To = MAX(DATEDIFF(MINUTE, 0, DeleteDate))
    FROM    @Data

    SELECT      w.theTime,
                COUNT(*)
    FROM        (
                    SELECT  DATEADD(MINUTE, Number + @From, 0) AS theTime
                    FROM    master..spt_values
                    WHERE   Type = 'P'
                            AND Number <= @To - @From
                ) AS w
    INNER JOIN  @Data AS d ON d.CreateDate <= w.theTime
                    AND d.DeleteDate >= w.TheTime
    GROUP BY    w.theTime
    ORDER BY    COUNT(*) DESC,
                w.theTime DESC

    Well, sometimes the time interval is too large to hold all minute values, and what if you all of a sudden decide to break down the calculation into milliseconds?

    The problem resembles date range searching. Most people tend to use

    SELECT  *
    FROM    Table1
    WHERE   Col1 BETWEEN '20090129 00:00:00.000' AND '20090129 23:59:59.997'

    What you should use is an open-ended search criterion, like this

    SELECT  *
    FROM    Table1
    WHERE   Col1 >= '20090129'
            AND Col1 < '20090130'

    The solution for this type of problem can then look something like this

    -- Prepare sample data
    DECLARE @Data TABLE
            (
                RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                CreateDate DATETIME,
                DeleteDate DATETIME
            )
     
    -- Populate sample data
    INSERT  @Data
            (
                CreateDate,
                DeleteDate
            )
    SELECT  '2009-01-14 22:33:41.857', '2009-01-14 22:35:59.543' UNION ALL
    SELECT  '2009-01-14 22:33:42.857', '2009-01-14 22:33:59.543' UNION ALL
    SELECT  '2009-01-14 22:34:26.513', '2009-01-14 22:35:43.233' UNION ALL
    SELECT  '2009-01-14 22:35:14.920', '2009-01-14 22:35:31.530' UNION ALL
    SELECT  '2009-01-14 22:35:50.373', '2009-01-14 22:36:07.340' UNION ALL
    SELECT  '2009-01-14 22:37:26.793', '2009-01-14 22:37:44.857' UNION ALL
    SELECT  '2009-01-14 22:39:22.077', '2009-01-14 22:39:38.543' UNION ALL
    SELECT  '2009-01-14 22:38:03.873', '2009-01-14 22:38:20.827' UNION ALL
    SELECT  '2009-01-14 22:39:39.247', '2009-01-14 22:39:55.840' UNION ALL
    SELECT  '2009-01-14 22:41:25.857', '2009-01-14 22:41:42.467' UNION ALL
    SELECT  '2009-01-14 22:43:14.607', '2009-01-14 22:44:31.483' UNION ALL
    SELECT  '2009-01-14 22:52:10.233', '2009-01-14 22:52:26.827' UNION ALL
    SELECT  '2009-01-14 22:53:08.187', '2009-01-14 22:53:24.983' UNION ALL
    SELECT  '2009-01-14 22:53:36.483', '2009-01-14 22:53:53.060' UNION ALL
    SELECT  '2009-01-14 22:56:56.403', '2009-01-14 22:57:13.263' UNION ALL
    SELECT  '2009-01-14 22:57:28.247', '2009-01-14 22:57:44.780' UNION ALL
    SELECT  '2009-01-14 22:58:16.090', '2009-01-14 22:58:32.623' UNION ALL
    SELECT  '2009-01-14 22:58:52.137', '2009-01-14 22:59:08.670' UNION ALL
    SELECT  '2009-01-14 22:59:21.170', '2009-01-14 22:59:37.733'
     
    -- Prepare staging table
    DECLARE @Stage TABLE
            (
                RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                FromTime DATETIME NOT NULL,
                ToTime DATETIME NOT NULL
            )
     
    -- Populate staging table
    INSERT      @Stage
                (
                    FromTime,
                    ToTime
                )
    SELECT      u.theTime,
                u.theTime
    FROM        @Data AS d
    UNPIVOT     (
                    theTime
                    FOR theCol IN (d.CreateDate, d.DeleteDate)
                ) AS u
    GROUP BY    u.theTime
    ORDER BY    u.theTime
     
    -- Update with closest range
    UPDATE      s
    SET         s.ToTime = w.FromTime
    FROM        @Stage AS s
    INNER JOIN  @Stage AS w ON w.RecID = s.RecID + 1
     
    -- Delete last time
    DELETE
    FROM    @Stage
    WHERE   RecID = SCOPE_IDENTITY()
     
    -- Display the result
    SELECT      s.FromTime,
                s.ToTime,
                COUNT(*) AS Occurencies
    FROM        @Data AS d
    INNER JOIN  @Stage AS s ON s.FromTime < d.DeleteDate
                    AND s.ToTime > d.CreateDate
    GROUP BY    s.FromTime,
                s.ToTime
    HAVING      COUNT(*) > 1
    ORDER BY    COUNT(*) DESC,
                s.FromTime DESC

    Finally, there is also a twist with this approach. You not only get the exact minute with the most concurrency, you get the whole range!
    And you can also see the gaps, where no concurrency occurs at all.

    -- Display the gaps
    SELECT      s.FromTime,
                s.ToTime,
                COUNT(d.CreateDate) AS Occurencies
    FROM        @Data AS d
    RIGHT JOIN  @Stage AS s ON s.FromTime < d.DeleteDate
                    AND s.ToTime > d.CreateDate
    GROUP BY    s.FromTime,
                s.ToTime
    HAVING      COUNT(d.CreateDate) = 0
    ORDER BY    COUNT(d.CreateDate) DESC,
                s.FromTime

    //Peso
  • Moving average and Weighted Moving Average

    In my previous blog post, I wrote about how to calculate the median value and the weighted median value in a safe and fast way.

    In this blog post I am going to describe how you can calculate a fast moving average and also a fast weighted moving average.
    This is the sample data we will work with during the whole exercise. I also display both the normal moving average and the weighted moving average at the same time.

    DECLARE @Sample TABLE
            (
                dt SMALLDATETIME,
                Rate SMALLMONEY
            )

    INSERT  @Sample
    VALUES  (DATEADD(DAY, DATEDIFF(DAY, 10, GETDATE()), 0),  2),
            (DATEADD(DAY, DATEDIFF(DAY,  9, GETDATE()), 0),  2),
            (DATEADD(DAY, DATEDIFF(DAY,  8, GETDATE()), 0),  4),
            (DATEADD(DAY, DATEDIFF(DAY,  7, GETDATE()), 0),  4),
            (DATEADD(DAY, DATEDIFF(DAY,  6, GETDATE()), 0),  5),
            (DATEADD(DAY, DATEDIFF(DAY,  5, GETDATE()), 0),  6),
            (DATEADD(DAY, DATEDIFF(DAY,  4, GETDATE()), 0),  6),
            (DATEADD(DAY, DATEDIFF(DAY,  3, GETDATE()), 0),  8),
            (DATEADD(DAY, DATEDIFF(DAY,  2, GETDATE()), 0),  9),
            (DATEADD(DAY, DATEDIFF(DAY,  1, GETDATE()), 0), 10),
            (DATEADD(DAY, DATEDIFF(DAY,  0, GETDATE()), 0), 11),
            (DATEADD(DAY, DATEDIFF(DAY, -1, GETDATE()), 0),  9)

    DECLARE @Result TABLE
            (
                dt SMALLDATETIME,
                ma SMALLMONEY,
                wma SMALLMONEY
            )


    First of all, I am going to show you an approach using a cursor. This is actually one of the fastest ways to accomplish this task!
    It is also very resource friendly and uses only (n) combinations to get the results.

    -- Declare some variables needed by the CURSOR
    DECLARE @Date SMALLDATETIME,
            @Rate SMALLMONEY,
            @RateCurrent SMALLMONEY,
            @RateMinusOne SMALLMONEY,
            @RateMinusTwo SMALLMONEY

    DECLARE curUgly CURSOR FOR  SELECT      dt,
                                            Rate
                                FROM        @Sample
                                ORDER BY    dt

    OPEN    curUgly

    FETCH   NEXT
    FROM    curUgly
    INTO    @Date,
            @Rate

    WHILE @@FETCH_STATUS = 0
        BEGIN
            SELECT  @RateMinusTwo = @RateMinusOne,
                    @RateMinusOne = @RateCurrent,
                    @RateCurrent = @Rate

            IF @RateMinusTwo IS NOT NULL AND @RateMinusOne IS NOT NULL AND @RateCurrent IS NOT NULL
                INSERT  @Result
                        (
                            dt,
                            ma,
                            wma
                        )
                VALUES  (
                            @Date,
                            (@RateCurrent + @RateMinusOne + @RateMinusTwo) / 3,
                            0.7 * @RateCurrent + 0.2 * @RateMinusOne + 0.1 * @RateMinusTwo
                        )

            FETCH   NEXT
            FROM    curUgly
            INTO    @Date,
                    @Rate
        END

    CLOSE       curUgly
    DEALLOCATE  curUgly

    SELECT      dt AS [Date],
                ma AS NormalMovingAverage,
                wma AS WeightedMovingAverage
    FROM        @Result
    ORDER BY    dt


    But, as you can see, it is not very configurable. What if you suddenly want a moving average over 5 days? Well, you have to rewrite several parts: the declarations, the initial select statement, the IF clause and the insert part. Essentially the whole code.

    Now, what if we want to write a SET-based query to do the same thing? The most frequently used query I have seen is a self-join query (Cartesian product) like the one below.
    It is very slow, since it tries all combinations (n * n) before filtering out the rows to be used. However, it is more configuration friendly.

    -- A common SET-based solution
    SELECT      t1.dt AS [Date],
                AVG(t2.Rate) AS NormalMovingAverage,
                SUM(CASE DATEDIFF(DAY, t2.dt, t1.dt)
                        WHEN 0 THEN 0.7 * t2.Rate
                        WHEN 1 THEN 0.2 * t2.Rate
                        WHEN 2 THEN 0.1 * t2.Rate
                    END
                ) AS WeightedMovingAverage
    FROM        @Sample AS t1
    INNER JOIN  @Sample AS t2 ON DATEDIFF(DAY, t2.dt, t1.dt) BETWEEN 0 AND 2
    GROUP BY    t1.dt
    HAVING      COUNT(t1.dt) = 3
    ORDER BY    t1.dt

    Even if the code looks SET-based, it is not in reality. I know Jeff Moden would call this RBAR (Row-By-Agonizing-Row). However, the code is more maintainable, and changing from a 3 day to a 5 day moving average only requires changes in 3 places: the CASE expression, the BETWEEN value and the HAVING clause. Much better, but also clearly worse performance than the cursor.

    Surely there must be a way to combine these two approaches? Yes, there is, and the number of combinations needed to get the result is only (3 * n), which is much less than the other set-based code but still somewhat more than the cursor approach.

    See this code.

    -- A better SET-based approach
    SELECT      DATEADD(DAY, d.Number, s.dt) AS [Date],
                AVG(s.Rate) AS NormalMovingAverage,
                SUM(d.Coefficient * s.Rate) AS WeightedMovingAverage
    FROM        @Sample AS s
    CROSS JOIN  (
                    VALUES  (0, 0.7),
                            (1, 0.2),
                            (2, 0.1)
                ) AS d(Number, Coefficient)
    GROUP BY    DATEADD(DAY, d.Number, s.dt)
    HAVING      COUNT(*) = 3
    ORDER BY    DATEADD(DAY, d.Number, s.dt)


    Changing this piece of code to a 5 day moving average only takes editing in 2 places: the derived table and the HAVING clause. If you keep the coefficients in an auxiliary table, there is only one place to edit, the auxiliary table itself, because the HAVING clause can compare against a subquery that counts the records in the auxiliary table.
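
    A minimal sketch of that variant, assuming a hypothetical @Coefficients table variable and reusing the @Sample data from above:

    -- The auxiliary table is now the only thing to edit when the window length changes
    DECLARE @Coefficients TABLE
            (
                Number INT NOT NULL PRIMARY KEY,
                Coefficient DECIMAL(3, 2) NOT NULL
            )

    INSERT  @Coefficients
            (
                Number,
                Coefficient
            )
    VALUES  (0, 0.7),
            (1, 0.2),
            (2, 0.1)

    SELECT      DATEADD(DAY, d.Number, s.dt) AS [Date],
                AVG(s.Rate) AS NormalMovingAverage,
                SUM(d.Coefficient * s.Rate) AS WeightedMovingAverage
    FROM        @Sample AS s
    CROSS JOIN  @Coefficients AS d
    GROUP BY    DATEADD(DAY, d.Number, s.dt)
    -- The HAVING clause now follows the auxiliary table automatically
    HAVING      COUNT(*) = (SELECT COUNT(*) FROM @Coefficients)
    ORDER BY    DATEADD(DAY, d.Number, s.dt)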

    For comparison, I tested all three methods on 5,000 sample rows and got this result

    CURSOR             6,813 ms
    Common set-based  20,577 ms
    Better set-based     127 ms


    //Peso
