|
|
|
|
Thinking outside the box
-
Or do they just want to continue with their old habits?
The
reason for this blog post is that I the last week have tried to help people on
several forums. Most of them just want to know how to solve their current
problem and there is no harm in that. But when I recognize the same poster the
very next day with a similar problem I ask myself; Did I really help him or her
at all?
All I did was probably to help the poster keep his or her job. It sound
harsh, but is probably true. Why would the poster else continue in the old
habit? The most convincing post was about someone wanted to use SP_DBOPTIONS. He
had an ugly procedure which used dynamic sql and other things done wrong.
I wrote to him he should stop using SP_DBOPTION because that procedure have
been marked for deprecation and will not work on a SQL Server version after
2008R2, and that he should start using DATABASEPROPERTYEX() function instead.
His response was basically “Thanks, but no thanks”. Then some other MVP
jumped in and gave him a solution using SP_DBOPTIONS and the original poster
once again was a happy camper.
Another problem was posted by someone who wanted a unique sequence number
like “T000001” to “T999999”. I suggested him to use a normal IDENTITY column and
add a computed column and concatenate the “T” with the value from the IDENTITY
column. Even if other people several times proposed my suggestion as an answer,
the original poster (OP) unproposed my suggestion! Why?
The only reason I can think of, is that OP is not used to (or even heard of)
computed columns. Some other guy posted and insinuated that computed columns
don’t work on SQL Server 2000 and earlier. To that I just posted that computed
columns did in fact work already back in SQL Server 7.
Are people so stuck in their old habit and inept to change for whatever
reason that might be? Could it be they are not qualified, or lack enough
experience, for their current position? Do they lack basic education about
relational databases?
My question to you is, how do you really help people with these mindsets?
|
-
I came across an interesting post on Microsoft SQL Server forum this afternoon. It was a question about Relational algebra and the poster wanted to have an efficient query to solve his problem. The original title was "Select parent ids that have exact match on child values (no more, no less)".
The problem could be solved with relational division, but there is no such operator in SQL Server. Maybe there will be some day.
But for now there is no such operator, so we as developers have to find our own ways. First prepare and populate some sample data
-- Prepare sample data DECLARE @Sample TABLE ( ParentID INT NOT NULL, Keyword VARCHAR(25) NOT NULL, UNIQUE (ParentID, Keyword) )
-- Populate sample data INSERT @Sample ( ParentID, Keyword ) VALUES (1, 'one'), (1, 'two'), (1, 'three'), (1, 'four'), (2, 'one'), (2, 'two'), (2, 'three'), (3, 'one'), (3, 'two')
People had already been active and posted some solutions, of which this common query was present.
SELECT s.ParentID FROM @Sample AS s WHERE s.Keyword IN ('one', 'two', 'three') GROUP BY s.ParentID HAVING COUNT(DISTINCT s.Keyword) = 3 AND COUNT(DISTINCT s.Keyword) = (SELECT COUNT(*) FROM @Sample AS x WHERE x.ParentID = s.ParentID)
and this type of query
SELECT s.ParentID FROM @Sample AS s WHERE s.Keyword IN ('one', 'two', 'three') AND NOT EXISTS ( SELECT * FROM @Sample AS x WHERE x.ParentID = s.ParentID AND x.Keyword NOT IN ('one', 'two', 'three') ) GROUP BY s.ParentID HAVING COUNT(DISTINCT s.Keyword) = 3
The good thing is that both produce the same wanted result but the bad thing is the inefficient execution plans. Then one poster did his homework and read about Mr Celko and translated his algorithm to the current problem, and then the query looked like this
SELECT ParentID FROM ( SELECT ParentID, Keyword, COUNT(*) OVER (PARTITION BY ParentID) AS cnt FROM @Sample ) AS w WHERE Keyword IN ('one', 'two', 'three') GROUP BY ParentID HAVING MIN(cnt) = 3
Well, let's just say the execution plan is the worst of them all. The query does produce the correct result. With a little different angle, you get a better plan for Mr Celkos query.
SELECT s.ParentID FROM ( SELECT ParentID, COUNT(*) AS cnt FROM @Sample GROUP BY ParentID ) AS w INNER JOIN @Sample AS s ON s.ParentID = w.ParentID WHERE s.Keyword IN ('one', 'two', 'three') GROUP BY s.ParentID HAVING MIN(w.cnt) = 3
With these queries in mind, I thought about the problem and realized the problem did in fact have a much simpler solution. The query I came up with is the simplest of them all, and just does one pass of the source table. Yes, only one pass just as the first Celko query for relational division, but without the internal worktable. This is the query I came up with
-- Peso SELECT ParentID FROM @Sample GROUP BY ParentID HAVING MIN(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 1 AND SUM(CASE WHEN Keyword IN ('one', 'two', 'three') THEN 1 ELSE 0 END) = 3
How does the query work? The second aggregation filtering just makes sure all three keywords are present. But the first aggregation filter? What does it do? To simplify, I just write that it takes care of the modulo part of the relational division. There cannot be a "fractional" part of the relational division, because it means that particular ParentID has more keywords than wanted.
Simple as that.
//Peso
PS. These are the textual execution plans for the four types of queries and then mine.
|--Filter(WHERE:([Expr1003]=CASE WHEN [Expr1007] IS NULL THEN (0) ELSE [Expr1007] END)) |--Nested Loops(Left Outer Join, OUTER REFERENCES:([s].[ParentID])) |--Filter(WHERE:([Expr1003]=(3))) | |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0))) | |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1014]=Count(*))) | |--Index Scan(OBJECT:(@Sample AS [s]), WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD) |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0))) |--Stream Aggregate(DEFINE:([Expr1015]=Count(*))) |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]) ORDERED FORWARD)
|--Filter(WHERE:([Expr1007]=(3))) |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1010],0))) |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1010]=Count(*))) |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([s].[ParentID])) |--Index Scan(OBJECT:(@Sample AS [s]), WHERE:(@Sample.[Keyword] as [s].[Keyword]='one' OR @Sample.[Keyword] as [s].[Keyword]='three' OR @Sample.[Keyword] as [s].[Keyword]='two') ORDERED FORWARD) |--Index Seek(OBJECT:(@Sample AS [x]), SEEK:([x].[ParentID]=@Sample.[ParentID] as [s].[ParentID]), WHERE:(@Sample.[Keyword] as [x].[Keyword]<>'one' AND @Sample.[Keyword] as [x].[Keyword]<>'three' AND @Sample.[Keyword] as [x].[Keyword]<>'two') ORDERED FORWARD)
|--Filter(WHERE:([Expr1005]=(3))) |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1005]=MIN([Expr1004]))) |--Filter(WHERE:([Keyword]='one' OR [Keyword]='three' OR [Keyword]='two')) |--Nested Loops(Inner Join) |--Table Spool | |--Segment | |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD) |--Nested Loops(Inner Join, WHERE:((1))) |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1008],0))) | |--Stream Aggregate(DEFINE:([Expr1008]=Count(*))) | |--Table Spool |--Table Spool
|--Filter(WHERE:([Expr1008]=(3))) |--Stream Aggregate(GROUP BY:([s].[ParentID]) DEFINE:([Expr1008]=MIN([Expr1004]))) |--Nested Loops(Inner Join, OUTER REFERENCES:([ParentID])) |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0))) | |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1012]=Count(*))) | |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD) |--Index Seek(OBJECT:(@Sample AS [s]), SEEK:([s].[ParentID]=[ParentID] AND [s].[Keyword]='one' OR [s].[ParentID]=[ParentID] AND [s].[Keyword]='three' OR [s].[ParentID]=[ParentID] AND [s].[Keyword]='two') ORDERED FORWARD)
|--Filter(WHERE:([Expr1004]=(1) AND [Expr1005]=(3))) |--Stream Aggregate(GROUP BY:([ParentID]) DEFINE:([Expr1004]=MIN([Expr1006]), [Expr1005]=SUM([Expr1006]))) |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Keyword]='three' OR [Keyword]='two' OR [Keyword]='one' THEN (1) ELSE (0) END)) |--Index Scan(OBJECT:(@Sample), ORDERED FORWARD)
|
-
This time I will show you an algorithm to do the dreaded bin-packaging using recursion and XML. First, create some sample data like this
-- Prepare sample data
DECLARE @Sample TABLE
(
RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Expense SMALLMONEY NOT NULL
)
-- Populate sample data
INSERT @Sample
(
Expense
)
VALUES (12.51),
(45.63),
(66.35),
(92.66),
(65.46),
(54.01),
(32.23),
(27.16),
(78.92),
(14.58)
Next, we need to create a variable to hold the user's wanted total sum.
-- Prepare user supplied parameter
DECLARE @WantedSUM SMALLMONEY = 111.09
And we also need to create a temporary staging table to hold the valid combinations
-- Prepare temporary staging table
DECLARE @Temp TABLE
(
CombID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
TrackPath XML NOT NULL
)
Now we only have to do the calculations! Here I am using a special trick to get the unique combination, since the path of records 1>2>3 is the same as 1>3>2, 2>1>3, 2>3>1, 3>1>2 and 3>2>1. See explanation between combination and permutation at Wikipedia here. To keep track of which records I already have used in the total sum, I simply remove the record id (RowID) from the Hits list. And, to give the correct answer at the end, I build a XML string with visited RowID's building up the correct sum.
-- Calculate all possible permutations using recursion
;WITH ctePack(Total, Hits, TrackPath)
AS (
SELECT s.Expense AS Total,
(SELECT '#' + CAST(x.RowID AS VARCHAR(MAX)) FROM @Sample AS x WHERE x.RowID <> s.RowID ORDER BY x.RowID FOR XML PATH('')) + '#' AS Hits,
'<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
FROM @Sample AS s
WHERE s.Expense <= @WantedSum
UNION ALL
SELECT p.Total + s.Expense,
REPLACE(p.Hits, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Hits,
p.TrackPath + '<ID>' + CAST(s.RowID AS VARCHAR(MAX)) + '</ID>' AS TrackPath
FROM @Sample AS s
INNER JOIN ctePack AS p ON p.Hits LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%'
WHERE p.Total + s.Expense <= @WantedSum
)
INSERT @Temp
(
TrackPath
)
SELECT MIN(TrackPath)
FROM ctePack
WHERE Total = @WantedSum
GROUP BY Hits
When the iterations are over, and we have the wanted combniation(s), the task left is to report the records giving us the correct sum. We also need the records grouped so that we can see which group each expense belong to. In same cases, one and the same record may used in multiple groups.
-- Display the final resultset
SELECT t.CombID,
s.RowID,
s.Expense
FROM @Temp AS t
CROSS APPLY t.TrackPath.nodes('/ID') AS f(n)
INNER JOIN @Sample AS s ON s.RowID = f.n.value('.', 'INT')
ORDER BY t.CombID,
s.RowID
If you don't want to use XML, you can write the recursive cte like this (and drop the @temp table), to get all included records directly.
-- Calculate all possible permutations using recursion ;WITH ctePack(RowID, Expense, Total, Tracker) AS ( SELECT s.RowID, s.Expense, s.Expense AS Total, ( SELECT '#' + CAST(x.RowID AS VARCHAR(MAX)) FROM @Sample AS x WHERE x.RowID <> s.RowID ORDER BY x.RowID FOR XML PATH('') ) + '#' AS Tracker FROM @Sample AS s WHERE s.Expense <= @WantedSum
UNION ALL
SELECT s.RowID, s.Expense, p.Total + s.Expense, REPLACE(p.Tracker, '#' + CAST(s.RowID AS VARCHAR(MAX)) + '#', '#') AS Tracker FROM @Sample AS s INNER JOIN ctePack AS p ON p.Tracker LIKE '%#' + CAST(s.RowID AS VARCHAR(MAX)) + '#%' WHERE p.Total + s.Expense <= @WantedSum ) SELECT DISTINCT DENSE_RANK() OVER (ORDER BY Tracker) AS CombID, RowID, Expense FROM ctePack WHERE Total = @WantedSum
|
-
As some of you know, I was awarded Microsoft Most Valuable Professional (MVP) for SQL Server in July last year. Now it's time to see if I get my MVP renewed, or if I lose the award. I honestly believe it's a good thing MVP status only lasts for one year at a time. Knowledge is fresh. Things that worked in the past may not work any longer due to evolvement, and there are smarter ways to do things now, than before.
Being an MVP is a responsibility. It does mean you have more than average knowledge of SQL Server and how to work out the best from the product. It doesn't mean you know everything about the product. Noone does. Being an MVP means you have benefit of the community of your peers. I have learned more hardcore things only the last year, than I knew from the previous five years. And sometimes I get lucky to contribute to the MVP community myself.
That's how things work. Me being around knowledgeble people makes me learn, or understand, same things they do. The same thing apply to beginners on the forums. They most often look up to MVP's and learn one or two things from them.
The last year I haven't been around the forum as much as I want to. The reason is that I started (together with SQL server MVP Thomas Ivarsson) a PASS chapter in Sweden. It has taken a lot of time in the beginning and still does take some time to plan and invite interesting guest speakers for our meetings.
We are currently working to get a full day event in place in Sweden on the 10th of September. That's 3 months away but it feels there is not enough time! I guess every planner feels that way. However, this is in my "line of duty", to continue to help and educate the SQL Server community,
Hopefully I get my MVP status for another year. It means my peers have faith in me and my ability to support the community. I would like to thank a few people for the collaboration last year, which problably wouldn't have happened without my MVP status:
1) Itzik Ben-Gan for incredible insight and the email correspondence we shared the last year. 2) Joe Celko, for some fun competitions and coding help. I do love to see my [weighted] moving average in your upcoming SQL for smarties book. 3) Phil Factor, who have thought me a lot of driving and encouraging people to participate. And of course, some Speed Phreakery! 4) Tony Davis, who gave my the opportunity to be a technical reviewer.
There are a lot of other people whom I have shared ideas with. You know who you are.
I know a few people who probably will receive the MVP award this quarter. I wish you good luck and god speed!
//Peter
|
-
|
Yesterday I got the brilliant idea to format my laptop and reinstall Windows 7 from scratch. I've had a few problem last month, mostly due to upgrading the preinstalled Vista to Windows 7.
Said and done, I boldly finished the installation and thought everything was ok. I mean, my email is IMAP so all email are stored at my ISP, all my relevant documents are on another partition on the harddrive and so on...
But..! Now I have noticed I forgot to save my Favorites Links. So now I need your help. I can try to remember all my favorites (about 80 of them), or I can pledge to you to post interesting SQL-related blogs to me, so I can quickly check them and add them to my Favorites again.
Thank you
//Peter
|
-
PASS Scania is a new PASS chapter in Sweden which will promote the interest, networking and knowledge for professional SQL Server-users such as developers, DBA's and BI-specialist in Skåne. The Chapter also has a business alliance with an existing usergroup SQLUG.
All developers who use SQL Server platform in their work is welcome to participate in the seminars with start february 4th 2010. Both consultants and employees are welcome.
Membership and seminars are free and activities are mainly sponsored events.
First meeting is held in feburary where new members have the opportunity to present themself, we will present the agenda for 2010 and we will also demonstrate the new Microsoft PowerPivot platform.
Further seminars will include how to read and interpret execution plans, how to decide for an index strategy and query performance considerations.
Welcome to PASS Scania first event on the 4th of February 2010 at S:t Gertruds in Malmö at 18:00. More about agenda is found on www.pass-scania.se where you also register. Number of seats are limited and are restricted to sequence of registrations."
//Peter
|
-
With SQL Server 2008, we can easily use
DECLARE @bin VARBINARY(MAX) SET @bin = 0x5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
SELECT CONVERT(VARCHAR(MAX), @bin, 2)
But how can we do this in SQL Server 2005? You can't use a simple CAST or CONVERT, because you get the ASCII representation of the binary values.
So, here is how you can do the conversion with SQL Server 2005 by using XML.
-- Prepare value DECLARE @bin VARBINARY(MAX) SET @bin = 0x5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
-- Display the results SELECT @bin AS OriginalValue, CAST('' AS XML).value('xs:hexBinary(sql:variable("@bin"))', 'VARCHAR(MAX)') AS ConvertedString
|
-
The few last days, our hosting company have updated their VMware environment a number of times and thus have forced equal number of restarts for our database servers. The problem with this is that one of the databases, Yoda, needed 1.5-2.0 hours to start up due to "In Recovery" status.
I asked around what could be the cause of this and also read some excellent posts by Kimberley Tripp http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx http://www.sqlskills.com/BLOGS/KIMBERLY/post/8-Steps-to-better-Transaction-Log-throughput.aspx
I saw the undocumented DBCC LogInfo command and decided to give it a go. To my surprise, there were 33,636 records returned. As a general rule of thumb, you should have about 8-16 VLF per new chunk of log file, when increased. My log file was only 24 GB, so I expected a VLF number of about 25-50. So 33,636 is quite a difference!
So I executed this script about 20 times, and saw the number of VLF lower.
CHECKPOINT GO
DBCC SHRINKFILE(logYoda, TRUNCATEONLY) GO
ALTER DATABASE Yoda MODIFY FILE ( NAME = logYoda , SIZE = 8192 ) GO
DBCC LOGINFO GO
Finally, it seemed that the number of VLF's leveled out at 19. And now the "In Recovery" time has as fast as the other databases on the server. Next step is to deal with the number of VLF's for a database which is 1GB in size (log is 1 GB) and have 859 VLF's (expected is 8-16).
And finally, this is a script that present the number of VLF's per database.
EXEC master.dbo.sp_msforeachdb 'USE [?]; SELECT ''?'' AS [Database];DBCC loginfo'
|
-
In response to Adam's new series of T-SQL Tuesday, I wanted to write that there are faster ways to get the data in a set-based manner without resorting to SQLCLR. http://sqlblog.com/blogs/adam_machanic/archive/2009/12/08/t-sql-tuesday-001-exploring-fuzzy-interval-islands-using-sqlclr.aspx
Many companies today still think of SQLCLR's as security risks, so I wanted to rewrite Adam's set-based code to a more efficient algorithm. On my computer, Adam's code run in about 25 seconds. These two new algorithms below, runs in less than a second each. Which one to choose is how you interpret the definition Adam wrote. Also, this is not a blog post to make someone feel bad, it is a blog post to show there are other ways to write code, with better performance. You just have to think a little different. If someone is interested in how this algorithm works, please ask nice and I'll probably write a more in-depth blog post about the algorithm. Complaining in the comments doesn't work.
Take 1:
DECLARE @Interval INT = 7
;WITH cteSource AS ( SELECT ProductID, TransactionDate, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID FROM Production.TransactionHistory ), cteMatch AS ( SELECT s1.ProductID, s1.TransactionDate AS FromDate, s2.TransactionDate AS ToDate FROM cteSource AS s1 INNER JOIN cteSource AS s2 ON s2.ProductID = s1.ProductID WHERE s1.recID = s2.recID - 1 ), cteYak AS ( SELECT ProductID, FromDate, ToDate, y - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp FROM ( SELECT ProductID, FromDate, ToDate, CASE WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval THEN ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) ELSE NULL END AS y FROM cteMatch ) AS d WHERE y IS NOT NULL ) SELECT ProductID, MIN(FromDate) AS FromDate, MAX(ToDate) AS ToDate FROM cteYak WHERE DATEDIFF(DAY, FromDate, ToDate) <= @Interval GROUP BY ProductID, grp ORDER BY ProductID, MIN(FromDate)
This runs in less than a second.
Take2:
DECLARE @Interval INT = 7
;WITH cteSource AS ( SELECT ProductID, TransactionDate, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID FROM Production.TransactionHistory ), cteMatch AS ( SELECT s1.ProductID, s1.TransactionDate AS FromDate, s2.TransactionDate AS ToDate FROM cteSource AS s1 INNER JOIN cteSource AS s2 ON s2.ProductID = s1.ProductID WHERE s1.recID = s2.recID - 1 ), cteYak AS ( SELECT ProductID, FromDate, ToDate, CASE WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval THEN 1 ELSE -1 END * ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS recID FROM cteMatch ), cteFinal AS ( SELECT ProductID, MIN(FromDate) AS StartDate, DATEADD(DAY, @Interval, MAX(ToDate)) AS EndDate, 1 + MAX(recID) AS recID FROM ( SELECT ProductID, FromDate, ToDate, recID - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp, recID FROM cteYak WHERE recID > 0 ) AS d GROUP BY ProductID, grp UNION ALL SELECT ProductID, FromDate AS StartDate, DATEADD(DAY, @Interval, FromDate) AS EndDate, ABS(recID) AS recID FROM cteYak WHERE recID < 0 )
SELECT ProductID, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate FROM cteFinal GROUP BY ProductID, recID
This also runs in less than a second. And produces the exact same result as Adam's code.
//Peter
|
-
Yesterday, I decided to install Microsoft SQL Server 2008 R2 (November CTP) on my local machine. I already had SQL Server Express 2008 installed so the first step was to uninstall Express, since there seemed to be no upgrade path. Uninstalling went great. I then installed 2008 R2 and rebooted my machine. I attached my sample databases, and my test database. To my surprise, the Test database was now corrupted!
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test.ldf" may be incorrect. The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost due to a hardware or environment failure.
I tried sp_attach_single_file_db to no vail. I tried all methods known by either Google or some of the MVP's I emailed for help. All of them said the only way was to restore the database. It was no big deal, because it is in fact a test database. Well, I had some time left so I decided to go Trial and Error! Since I couldn't attach the database file at all, the first step had to be to create a new database named Test with same size. I decided to create the logfile at the same location as the original file. The datafile I placed at the new location. CREATE DATABASE Test ON PRIMARY ( NAME = Test, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\Test.mdf', SIZE = 1024, MAXSIZE = 2048, FILEGROWTH = 15% ) LOG ON ( NAME = Test_log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log.ldf', SIZE = 128MB, MAXSIZE = 2048MB, FILEGROWTH = 5MB ) ; GO
I thought it didn't matter what size the log file was, so I just used a small number, however I thought it was important to have the datafile at the same size as the corrupt one. I then stopped SQL Server service, replaced the empty datafile with the corrupt datafile and started SQL Server service again. The database Test now was visible in the databases, but of course it was not accesible. A number of different error messages was returned to be depending on how I tried to access the database.
Ad-hoc updates to system tables are not allowed with SQL Server 20008 and later (even if using sp_configure) so I issued
ALTER DATABASE Test SET EMERGENCY
This is the same as the previous "UPDATE SET = -32768" trick. I then decided to set the database in single user mode with
ALTER DATABASE Test SET SINGLE_USER
Now the work by checking the database was going to happen! Since it is a test database, I didn't care if there was dataloss, so I went for
DBCC CHECKDB (Test, REPAIR_ALLOW_DATA_LOSS)
It took some 2 minutes to complete, and there were errors and warnings. I rerun the CHECKDB command again, and this time where were only informational messages. After that, I set the database in multi user mode and put the database online again with
ALTER DATABASE Test SET MULTI_USER ALTER DATABASE Test SET ONLINE
And now I have access to my database again. It also seem there was no dataloss at all.
|
-
I have listened to the critique in the comments and removed the advanced version where you can calculate virtually any Fibonacci number. Contact me if you have the need for it again. I won't bother non-interested people with the algorithm here. Instead, I am concentrating on how this easy example of how to use recursive CTE for calculating the Fibonacci series is constructed. This is done as an easy exercise for students how to implement a recursive CTE, because the results can be confirmed visually. Understanding recursive queries is not natural for many people. For those of you not familiar with Fibonacci series and wonder what uses does it have, I can tell that Fibonacci series is used in a number of areas
- Run-time analysis of Euclid's algorithm to determine the greatest common divisor of two integers (the worst case input for this algorithm is a pair of consecutive Fibonacci numbers
- The Fibonacci numbers and principle is also used in the financial markets. It is used in trading algorithms, applications and strategies.
- Fibonacci numbers are used by some pseudorandom number generators.
- Fibonacci numbers are used in a polyphase version of the merge sort algorithm in which an unsorted list is divided into two lists whose lengths correspond to sequential Fibonacci numbers.
- The Fibonacci cube is an undirected graph with a Fibonacci number of nodes that has been proposed as a network topology for parallel computing.
- A one-dimensional optimization method, called the Fibonacci search technique, uses Fibonacci numbers.
- The Fibonacci number series is used for optional lossy compression in the IFF 8SVX audio file format used on Amiga computers
So how is the Fibonacci series built?
The matemathical formula is F(n) = F(n-1) + F(n-2), which in plain english is that any Fibonacci number (greater than 2) is the sum of the two previous numbers. 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 etc..
So how do we solve this using a recursive CTE (CTE is an acronym for Common Table Expression which was introduced with Microsof SQL Server 2005)? Can we solve it at all? A recursive query is nested only one level, right? And the Fibonacci series is "nested" two level?
Yes, we can solve it by using a complementary "sideways" translation, a sort of intermediate storage. First thing, all recursive CTE need an anchor part, a fixed part from which the recursion is expanded from. I do this by using the values {0, 1} which per definition are the two first values of the Fibonacci series.
How is then the recursive part done? I use the two values from the previous iteration and sum them together for the next Fibonacci value. I store that value in a column, shift the columns, and use the previous value in the same recursion level. Easy, huh?
;WITH Fibonacci(n, f, f1) AS ( -- This is the anchor part -- Initialize level to 1 and set the first two values as per definition SELECT CAST(1 AS BIGINT), CAST(0 AS BIGINT), CAST(1 AS BIGINT)
UNION ALL
-- This is the recursive part -- Calculate the next Fibonacci value using the previous two values -- Shift column (place) for the sum in order to accomodate the previous -- value too because next iteration need them both SELECT n + 1, f + f1, f FROM Fibonacci -- Stop at iteration 93 because we than have reached maximum limit -- for BIGINT in Microsoft SQL Server WHERE n < 93 ) -- Now the easy presentation part SELECT n, f AS Number FROM Fibonacci
|
-
This is nothing new to me (I come across this autumn 2007) and probably not to you either but I forgot about it and yesterday it came alive again. I often choose a table variable for performance reasons (if data is less than 1 page anyway) due to the benefits of no logging etc etc.
But yesterday I wrote a query for Phil Factor's "Subscription List" competition where I had choosen a table variable for the same reasons as before. It took a while to realize why (in this case) the solution with a temporary table was 30% faster than the solution with a table variable.
So I started to investigate the issue. SQL Profiler proofed me right over and over again for the 30% performance gain when using a temporary table over a table variable. It just didn't make any sense to me that the performance should differ that much. The INSERT was derived from a million record table, and the final resultset after grouping was just 120 records of 14 bytes each, well below the "one page rule". How come the solution with table variable was so much slower? Remember, in this fierce competition we calculated milliseconds for who was in the lead, so 30% was a giant leap.
I won't bore you with details, but finally it come clear to me that it seems table variables doesn't support parallelism. So I went to Google for searching this issue and the results supported my finding. Insert to a table variables makes parallelism impossible. Why? A table variable is just like any other table (except logging). I found this KB article http://support.microsoft.com/default.aspx/kb/305977/EN-US/ (the INSERT ... EXEC is suppoted with SQL Server 2008). And this http://msdn.microsoft.com/en-us/library/ms175010(SQL.90).aspx. However it doesn't say why a table doesn't support parallelism, just that doesn't support parallelism. Here is a page from the SQL Server Engine Storage Team http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/sql-server-table-variable-vs-local-temporary-table.aspx
Here is a testcase in a million record table.
-- First with a temporary table
CREATE TABLE #Sample
(
DateJoined SMALLINT NOT NULL,
DateLeft SMALLINT,
Registrations INT NOT NULL
)
INSERT #Sample
(
DateJoined,
DateLeft,
Registrations
)
SELECT DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft),
COUNT(*)
FROM dbo.Registrations
GROUP BY DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft)
DROP TABLE #Sample
-- Second with a table variable
DECLARE @Sample TABLE
(
DateJoined SMALLINT NOT NULL,
DateLeft SMALLINT,
Registrations INT NOT NULL
)
INSERT @Sample
(
DateJoined,
DateLeft,
Registrations
)
SELECT DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft),
COUNT(*)
FROM dbo.Registrations
GROUP BY DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft)
-- And then a conventional SELECT
SELECT DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft),
COUNT(*)
FROM dbo.Registrations
GROUP BY DATEDIFF(MONTH, 0, DateJoined),
DATEDIFF(MONTH, 0, DateLeft)
|
-
Today our son was born. A healthy baby of 4.4 kg (9.8 lbs) and 53 cm (20.9 inches). Both mother and son are well, and spend the night at the safety on the hospital.
Water broke 2:45 am this morning and we were admitted to the ward at 4:21. Our son was born without complications at 5:06 am. Big sisters Isabelle (23 months) and Filippa (5.5 yr) are anxiously waiting for mother and new baby to get home.
//Peter
|
-
In the past I mostly have given the advice to break down all date intervals into the smallest part (often minutes) and then group by the minute.
-- Prepare sample data DECLARE @Data TABLE ( RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, CreateDate DATETIME, DeleteDate DATETIME )
-- Populate sample data INSERT @Data ( CreateDate, DeleteDate ) SELECT '2009-01-14 22:33', '2009-01-14 22:35' UNION ALL SELECT '2009-01-14 22:33', '2009-01-14 22:33' UNION ALL SELECT '2009-01-14 22:34', '2009-01-14 22:35' UNION ALL SELECT '2009-01-14 22:35', '2009-01-14 22:35' UNION ALL SELECT '2009-01-14 22:35', '2009-01-14 22:36' UNION ALL SELECT '2009-01-14 22:37', '2009-01-14 22:37' UNION ALL SELECT '2009-01-14 22:39', '2009-01-14 22:39' UNION ALL SELECT '2009-01-14 22:38', '2009-01-14 22:38' UNION ALL SELECT '2009-01-14 22:39', '2009-01-14 22:39' UNION ALL SELECT '2009-01-14 22:41', '2009-01-14 22:41' UNION ALL SELECT '2009-01-14 22:43', '2009-01-14 22:44' UNION ALL SELECT '2009-01-14 22:52', '2009-01-14 22:52' UNION ALL SELECT '2009-01-14 22:53', '2009-01-14 22:53' UNION ALL SELECT '2009-01-14 22:53', '2009-01-14 22:53' UNION ALL SELECT '2009-01-14 22:56', '2009-01-14 22:57' UNION ALL SELECT '2009-01-14 22:57', '2009-01-14 22:57' UNION ALL SELECT '2009-01-14 22:58', '2009-01-14 22:58' UNION ALL SELECT '2009-01-14 22:58', '2009-01-14 22:59' UNION ALL SELECT '2009-01-14 22:59', '2009-01-14 22:59'
DECLARE @From INT, @To INT
SELECT @From = MIN(DATEDIFF(MINUTE, 0, CreateDate)), @To = MAX(DATEDIFF(MINUTE, 0, DeleteDate)) FROM @Data
SELECT w.theTime, COUNT(*) FROM ( SELECT DATEADD(MINUTE, Number + @From, 0) AS theTime FROM master..spt_values WHERE Type = 'P' AND Number <= @To - @From ) AS w INNER JOIN @Data AS d ON d.CreateDate <= w.theTime AND d.DeleteDate >= w.TheTime GROUP BY w.theTime ORDER BY COUNT(*) DESC, w.theTime DESC
Well, sometimes the time interval is too large to hold all minute values, and what if you all of a sudden decides to break down the calculation into milliseconds?
The problem resembles about date range searching. Most people tend to use
SELECT * FROM Table1 WHERE Col1 BETWEEN '20090129 00:00:00.000' AND '20090129 23:59:59.997'
What you should use is open-ended search criteria as
SELECT * FROM Table1 WHERE Col1 >= '20090129' AND Col1 < '20090130'
The solution for this type of problem can then look something like this
-- Prepare sample data
DECLARE @Data TABLE
(
RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
CreateDate DATETIME,
DeleteDate DATETIME
)
-- Populate sample data
INSERT @Data
(
CreateDate,
DeleteDate
)
SELECT '2009-01-14 22:33:41.857', '2009-01-14 22:35:59.543' UNION ALL
SELECT '2009-01-14 22:33:42.857', '2009-01-14 22:33:59.543' UNION ALL
SELECT '2009-01-14 22:34:26.513', '2009-01-14 22:35:43.233' UNION ALL
SELECT '2009-01-14 22:35:14.920', '2009-01-14 22:35:31.530' UNION ALL
SELECT '2009-01-14 22:35:50.373', '2009-01-14 22:36:07.340' UNION ALL
SELECT '2009-01-14 22:37:26.793', '2009-01-14 22:37:44.857' UNION ALL
SELECT '2009-01-14 22:39:22.077', '2009-01-14 22:39:38.543' UNION ALL
SELECT '2009-01-14 22:38:03.873', '2009-01-14 22:38:20.827' UNION ALL
SELECT '2009-01-14 22:39:39.247', '2009-01-14 22:39:55.840' UNION ALL
SELECT '2009-01-14 22:41:25.857', '2009-01-14 22:41:42.467' UNION ALL
SELECT '2009-01-14 22:43:14.607', '2009-01-14 22:44:31.483' UNION ALL
SELECT '2009-01-14 22:52:10.233', '2009-01-14 22:52:26.827' UNION ALL
SELECT '2009-01-14 22:53:08.187', '2009-01-14 22:53:24.983' UNION ALL
SELECT '2009-01-14 22:53:36.483', '2009-01-14 22:53:53.060' UNION ALL
SELECT '2009-01-14 22:56:56.403', '2009-01-14 22:57:13.263' UNION ALL
SELECT '2009-01-14 22:57:28.247', '2009-01-14 22:57:44.780' UNION ALL
SELECT '2009-01-14 22:58:16.090', '2009-01-14 22:58:32.623' UNION ALL
SELECT '2009-01-14 22:58:52.137', '2009-01-14 22:59:08.670' UNION ALL
SELECT '2009-01-14 22:59:21.170', '2009-01-14 22:59:37.733'
-- Prepare staging table
DECLARE @Stage TABLE
(
RecID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
FromTime DATETIME NOT NULL,
ToTime DATETIME NOT NULL
)
-- Populate staging table
INSERT @Stage
(
FromTime,
ToTime
)
SELECT u.theTime,
u.theTime
FROM @Data AS d
UNPIVOT (
theTime
FOR theCol IN (d.CreateDate, d.DeleteDate)
) AS u
GROUP BY u.theTime
ORDER BY u.theTime
-- Update with closest range
UPDATE s
SET s.ToTime = w.FromTime
FROM @Stage AS s
INNER JOIN @Stage AS w ON w.RecID = s.RecID + 1
-- Delete last time
DELETE
FROM @Stage
WHERE RecID = SCOPE_IDENTITY()
-- Display the result
SELECT s.FromTime,
s.ToTime,
COUNT(*) AS Occurencies
FROM @Data AS d
INNER JOIN @Stage AS s ON s.FromTime < d.DeleteDate
AND s.ToTime > d.CreateDate
GROUP BY s.FromTime,
s.ToTime
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC,
s.FromTime DESC
Finally, there is also a twist with this approach. You do not only get for which exact minute there is the most concurrency, you get the whole range!
And you can also see the gaps, where no concurrency occurs at all.
-- Display the gaps
SELECT s.FromTime,
s.ToTime,
COUNT(d.CreateDate) AS Occurencies
FROM @Data AS d
RIGHT JOIN @Stage AS s ON s.FromTime < d.DeleteDate
AND s.ToTime > d.CreateDate
GROUP BY s.FromTime,
s.ToTime
HAVING COUNT(d.CreateDate) = 0
ORDER BY COUNT(d.CreateDate) DESC,
s.FromTime
//Peso
|
-
In my previous blog post, I wrote about how to calculate median value and weighted median value in a secure and fast approach.
In this blog post I am going to describe how you can calculate a fast moving average and also calculate a fast weighted moving average. This is the sample data we should work with during the whole exercise. I also display both the normal moving average and the weighted moving average at the same time.
DECLARE @Sample TABLE ( dt SMALLDATETIME, Rate SMALLMONEY )
INSERT @Sample VALUES (DATEADD(DAY, DATEDIFF(DAY, 10, GETDATE()), 0), 2), (DATEADD(DAY, DATEDIFF(DAY, 9, GETDATE()), 0), 2), (DATEADD(DAY, DATEDIFF(DAY, 8, GETDATE()), 0), 4), (DATEADD(DAY, DATEDIFF(DAY, 7, GETDATE()), 0), 4), (DATEADD(DAY, DATEDIFF(DAY, 6, GETDATE()), 0), 5), (DATEADD(DAY, DATEDIFF(DAY, 5, GETDATE()), 0), 6), (DATEADD(DAY, DATEDIFF(DAY, 4, GETDATE()), 0), 6), (DATEADD(DAY, DATEDIFF(DAY, 3, GETDATE()), 0), 8), (DATEADD(DAY, DATEDIFF(DAY, 2, GETDATE()), 0), 9), (DATEADD(DAY, DATEDIFF(DAY, 1, GETDATE()), 0), 10), (DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0), 11), (DATEADD(DAY, DATEDIFF(DAY, -1, GETDATE()), 0), 9)
DECLARE @Result TABLE ( dt SMALLDATETIME, ma SMALLMONEY, wma SMALLMONEY )
First of all, I am going to show you an approach made by a cursor. This is actually one of the fastest way to accomplish this task! It is also very resource friendly and uses (n) combinations to get the results.
-- Declare some variables needed by the CURSOR DECLARE @Date SMALLDATETIME, @Rate SMALLMONEY, @RateCurrent SMALLMONEY, @RateMinusOne SMALLMONEY, @RateMinusTwo SMALLMONEY
DECLARE curUgly CURSOR FOR SELECT dt, Rate FROM @Sample ORDER BY dt
OPEN curUgly
FETCH NEXT FROM curUgly INTO @Date, @Rate
WHILE @@FETCH_STATUS = 0 BEGIN SELECT @RateMinusTwo = @RateMinusOne, @RateMinusOne = @RateCurrent, @RateCurrent = @Rate
IF @RateMinusTwo IS NOT NULL AND @RateMinusOne IS NOT NULL AND @RateCurrent IS NOT NULL INSERT @Result ( dt, ma, wma ) VALUES ( @Date, (@RateCurrent + @RateMinusOne + @RateMinusTwo) / 3, 0.7 * @RateCurrent + 0.2 * @RateMinusOne + 0.1 * @RateMinusTwo )
FETCH NEXT FROM curUgly INTO @Date, @Rate END
CLOSE curUgly DEALLOCATE curUgly
SELECT dt AS [Date], ma AS NormalMovingAverage, wma AS WeightedMovingAverage FROM @Result ORDER BY dt
But, as you can see it is not very configurable. What if you suddenly wants a moving average over 5 days? Well, you simply have to rewrite several parts such as declare, initial select statement, if clause and the insert part. Essentially the whole code.
Now, what if we want to write a SET-based query to do the same thing? The most frequent used query I have seen before, is a self-join query (Cartesian product) like this. It is very slow since it tries all combinatations (n * n) before filtering out the rows to be used. However it is more configurable friendly.
-- A common SET-based solution SELECT t1.dt AS [Date], AVG(t2.Rate) AS NormalMovingAverage, SUM(CASE DATEDIFF(DAY, t2.dt, t1.dt) WHEN 0 THEN 0.7 * t2.Rate WHEN 1 THEN 0.2 * t2.Rate WHEN 2 THEN 0.1 * t2.Rate END ) AS WeightedMovingAverage FROM @Sample AS t1 INNER JOIN @Sample AS t2 ON DATEDIFF(DAY, t2.dt, t1.dt) BETWEEN 0 AND 2 GROUP BY t1.dt HAVING COUNT(t1.dt) = 3 ORDER BY t1.dt
Even if the code looks SET-based, it is not in reality. I know Jeff Moden would call this RBAR (Row-By-Agonizing-Row). However, the code is more maintainable and altering from a 3 day to a 5 day moving average will make you have to change the code in 3 places only; the case statement, between value and having clause. Much better, but also absolutely worse performance compared to a cursor.
Surely there must be a way to combine these two approaches? Yes, there is, and the combinations needed to get the result is only (3 * n) which is much less than the other set-based code but also somewhat more than the cursor approach.
See this code.
-- A better SET-based approach SELECT DATEADD(DAY, d.Number, s.dt) AS [Date], AVG(s.Rate) AS NormalMovingAverage, SUM(d.Coefficient * s.Rate) AS WeightedMovingAverage FROM @Sample AS s CROSS JOIN ( VALUES (0, 0.7), (1, 0.2), (2, 0.1) ) AS d(Number, Coefficient) GROUP BY DATEADD(DAY, d.Number, s.dt) HAVING COUNT(*) = 3 ORDER BY DATEADD(DAY, d.Number, s.dt)
Changing this piece of code to a 5 day moving average only takes editing in 2 places; the derived table and the having clause. If you have the coefficients in an auxiliary table, there will be only one edit to be made! In the auxiliary table, because the having part can equal a subquery calculating the number of records in the auxiliary table.
For comparison, I tested all three methods on 5,000 sample rows and got this result
CURSOR 6,813 ms Common set-based 20,577 ms Better set-based 127 ms
//Peso
|
|
|
|
|
|