|
|
|
|
Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.
-
Hey bloggers! Wake up and get your favorite editor ready. T-SQL Tuesday is back again, this time with a great topic hosted by Mike at StraightPath Solutions. Stop reading this blog and go read his. And then post! As I mentioned to a few Indian MVPs at the MVP Summit a couple of weeks ago, there have been no T-SQL Tuesday contributions as of yet from the scores of bloggers in that country. So Indians, I'm calling you out! Share your I/O experiences with us.
|
-
According to T-SQL Tuesday rules as ratified by me in the first and second T-SQL Tuesday posts, the T-SQL Tuesday host is supposed to post a roundup within two days of the end of the event. So a reasonable person should expect a roundup to be posted by the second Thursday of the month. It gives me no pleasure to admit that I've been completely unreasonable and have totally dropped the ball. I'm twothreefour weeks late. (I actually started the post two weeks ago. That's bad. And now I'm forced to finish it because tomorrow is T-SQL Tuesday #003. That's very bad.) I have lots of excuses, but none of them are any good. So I hope that everyone who participated will accept my apology.
Now that the unpleasant part of this post is behind us, let's visit the positive side. The event generated 24 great posts! Apparently everyone out there can relate to being confused by something in SQL Server.
And since I'm three weeks late, this month's T-SQL Tuesday invitation has already been posted by the host, the much-better-organized-than-I-am Rob Farley (who sent me a link to the post over two weeks before he published it). Once you're done reading all of the great posts listed below, visit Rob's blog and get ready for this month's event.
The posts below are listed roughly in the order in which comments showed up on my blog on the day in question. I've also included a couple of stragglers who were late getting their posts in. If I'm three weeks late with the roundup I can surely forgive someone who posted their entry 24 hours behind schedule.
Rob Farley, who as I mentioned before is quite well organized, kicked things off with a cool and quite puzzling puzzle. What happens when you use a HAVING clause without a GROUP BY clause? Check out the post, then read the solution in his followup.
Kalen Delaney's post popped up next, puzzlingly on an almost identical topic to that of Rob's. Kalen gives us a bit of a history lesson about some strange situations with non-aggregated columns that used to be possible in older versions of SQL Server.
Sankar Reddy is so puzzled by SQL Server that he posted not just one, but two posts as part of the event. In his first entry, he talks about an issue with transactional replication (always a fun technology to manage). In his second post he discusses the mysterious Sort Warning event.
Jonathan Kehayias is yet another double-poster for the event. In his first post he describes a SQL Server 2008 to 2005 downgrade process and how he worked around the issue of doing the migration for 80 databases. In the second post he shows the twists and turns one can take when trying to do something simple like find out what SQL a request is executing.
Stephen Horne took the opposite end of the "puzzle" challenge, posting an actual puzzle. Can you write an unbeatable tic-tac-toe game in T-SQL? Give it a shot, it's fun!
Michael Swart comes up with a difficult puzzle of his own. Can you reverse-engineer an MD5 hash? You shouldn't, in theory, be able to do so (at least, easily), but here are some hints... And if you still can't figure it out, read the solution.
Another Michael--Michael Coles--jumped the gun a bit, posting his first T-SQL Tuesday post a day early. Its topic is the ever-puzzling task of creating properly-ordered build scripts. After I mentioned that his post wasn't technically eligible for the event, Michael pulled out a second great post, this time on the topic of SQL Server's often-misunderstood extended properties feature.
David Leibowitz brings a nice post on a topic that many SSRS developers get wrong: how to properly do multivalue parameters with stored procedures.
Another SSRS-related issue comes up in the puzzle posted by Brian Garrity. How do you tell whether FMTONLY is turned on?
Beginning Spatial provided a few spatial puzzles for those of you interested in that area.
Mladen Prajdic knows that there are many ways subqueries can be misunderstood... And shows us another. How many times is that subquery really evaluated?
Grant Fritchey may be scary, but is certainly not afraid of SQL Server; he says in his post that it was difficult to think of a puzzling situation. But after some brainstorming he came up with an interesting post about the query optimization process.
Alexander Kuznetsov continues his series of posts on defensive database programming and busts one of the most common patterns I've seen--the IF EXISTS(...) UPDATE ELSE INSERT pattern. A very nice read.
Steve Jones did a great job of describing one of the most common puzzling situations SQL programmers can encounter when working with less-than-perfectly-designed databases: implicit conversion issues.
Brad Schulz is up next with a T-SQL Tuesday first: a fairy tale! He uses the story to describe, step-by-step how to write a complex query and why the query optimizer throws exceptions when you try to do various things illogically. A very nice post.
Can anything strange happen when you reconfigure the "max server memory" setting? Ask Aaron Nelson... He was certainly puzzled by the outcome of this seemingly-innocent action.
Quick, which ASCII character is this: " "? Janusz Marchewa tells us, in his post, how he learned the hard way that not all white space characters are created equally--and how he figured out how to tell the difference.
Despite being late with this roundup post, I did managed to pull together my own entry for the event. If you work with non-typed XML on a regular basis, you might want to read my post to get a bit of a performance boost when using the Nodes method.
Andrew Hogg contributed a post in which he describes the trickery required to query the various DMVs in order to properly do dynamic partitioning. In the end, five views are needed to satisfy what at first seems like an easy query.
John Dunleavy's post is on the topic of join syntax. Remember that deprecated *= outer join syntax? Apparently such abominations never go away, and John tells us all about how to get rid of the offending code.
Jorge Serrada had a seriously confusing situation involving shared access rights to a program and the requirement that certain XML files get distributed to end-user's computers before they could run the thing and ... Read his post to find out how he resolved this mess.
Closing out the event is Jason Brimhall, who shares an interesting tale of woe: his UDF worked fine in SSMS and not so fine in SQL Agent. An all-too-familiar situation. Read his post to find out how he solved it.
...And that's it for thislast month's event! Head over to Rob's blog right away to read up on this month's edition--you have just over 24 hours from the time I hit Publish until your submissions are due. Get writing! And I'll see you next time.
|
-
If you've been looking for advanced training in the northeast, your wait is over. Paul and Kimberly will be gracing us with one of their famous "SQL Server Immersion" events the week of March 29. This course will cover storage engine internals, indexing and performance strategies, and of course in-depth sections on database maintenance from the guy who wrote DBCC. You can't go wrong there. A complete outline is available on the course web site. If you've never visited the Boston area, the spring is a great time to do it--we usually have great weather right in between the winter freeze and the overly-humid summer. The course location is Cambridge, MA, right near MIT: a really cool area with lots to see and do, easy access to downtown Boston via public transportation, and the weather should be perfect for some brisk evening walks around the Charles river after days of having your head pumped with SQL knowledge.
If you register soon you can take advantage of the "early bird" special, $600 off the $3100 course fee. Use registration code "EARLYBIRD" to take advantage.
Hope to see many of you there!
|
-
The query optimizer is a finicky thing, and sometimes it doesn't understand exactly what you're trying to do until you give it a bit more information. The situation I'm going to describe in this post is one such case. By providing the optimizer with ever-so-slightly more data, it's possible to make some XML processing over 300 times faster.
Here's the situation: The XML I was working with was stored in a table, but not typed as XML. Rather, it had been typed as VARCHAR(MAX). This presents an interesting conundrum for the optimizer: should the query be optimized as though operations are being done on a string, or on XML? If you would like to follow along with the examples below, here's some DDL and an INSERT statement to populate a test table using AdventureWorks data:
CREATE TABLE #myXML ( x VARCHAR(MAX) ) GO
INSERT #myXML ( x ) SELECT x FROM ( SELECT * FROM AdventureWorks.Production.Product FOR XML PATH ('Product') ) AS y (x) GO
Running this code will put one row into the temp table, with an XML document containing all of the product data from the AdventureWorks Production.Product table. And now, just for kicks, what if we want to pull all of the ProductIDs out of that document? Simple enough...
WITH theXML ( x ) AS ( SELECT CONVERT(XML, x) FROM #myXML ) SELECT node.value('ProductID[1]', 'INT') FROM theXML CROSS APPLY x.nodes('/Product') AS nodes(node)
This code isn't especially interesting or puzzling in and of itself. It converts the document(s) in the table to XML, runs them through the .nodes() function to produce one row per product node, and pulls out the first ProductID attribute found. If you've actually run the code on your end at this point, you know why I was puzzled: This code takes a full 20 seconds to run on my end. Which is a bit extreme, considering that there are only 504 products in the AdventureWorks database. In the real situation, the documents were several times bigger, and 20 seconds was over an hour in some cases. And that just wouldn't do.
And so much head-scratching ensued. And teeth gnashing. And cursing of the SQL Server programmability team. You know, a typical day at the office.
I pulled apart my code, put it back together again, and considered writing a CLR UDF to do the processing. But then I tried something on a whim:
DECLARE @x XML = ( SELECT TOP(1) x FROM #myXML )
SELECT node.value('ProductID[1]', 'INT') FROM @x.nodes('/Product') AS nodes(node)
And--shocker--this code returns all 504 ProductIDs seemingly instantly. (Actually, it takes around 28 milliseconds on my end.)
So was a cursor and document-by-document processing the answer? At first, it seemed so. But after further messing around I noticed something: adding TOP(1), so that only a single row was processed, wasn't taking too long. Could it be that the query processor was doing a lot more work than necessary, like converting the text to XML 504 times?
The TYPE directive can be used with FOR XML to make your query return the XML document typed as XML rather than typed as a string. Perhaps it would work here?
WITH theXML ( x ) AS ( SELECT CONVERT(XML, x) FROM #myXML FOR XML PATH(''), TYPE ) SELECT node.value('ProductID[1]', 'INT') FROM theXML CROSS APPLY x.nodes('/Product') AS nodes(node)
The empty path expression is needed because the TYPE directive only works in conjunction with a valid FOR XML mode. Using an empty path has zero net effect on the actual XML produced in this case. But using the TYPE directive has quite a huge effect: A reduction in query time to around 58 milliseconds on my end. The 30,000% speedup I promised you earlier.
So why does this work? A quick peek at the plans indicates that I was correct. Here's the first part of the plan for the first version of the query:
|--Compute Scalar(DEFINE:([Expr1023]=[Expr1022])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1004], XML Reader with XPath filter.[id])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1004])) | |--Compute Scalar(DEFINE:([Expr1004]=CONVERT(xml,[tempdb].[dbo].[#myXML].[x],0))) | | |--Table Scan(OBJECT:([tempdb].[dbo].[#myXML])) | |--Filter(WHERE:(STARTUP EXPR([Expr1004] IS NOT NULL))) | |--Table-valued function
The second version is almost exactly the same, but for one additional iterator:
|--Compute Scalar(DEFINE:([Expr1024]=[Expr1023])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], XML Reader with XPath filter.[id])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005])) | |--UDX(([Expr1004])) | | |--Compute Scalar(DEFINE:([Expr1004]=CONVERT(xml,[tempdb].[dbo].[#myXML].[x],0))) | | |--Table Scan(OBJECT:([tempdb].[dbo].[#myXML])) | |--Filter(WHERE:(STARTUP EXPR([Expr1005] IS NOT NULL))) | |--Table-valued function
Notice the "UDX" iterator? That's an XML iterator that handles the conversion to typed XML. And in the first case, we don't get one, even though we've "technically" converted the string to XML at that point.
This story was puzzling, and somewhat arcane, but it has a moral that stretches far beyond this simple example: Only by giving the query optimizer much more information than any rational person might think necessary did we get a plan that does the right thing. And that is quite often the case when working with SQL Server. CHECK constraints, foreign keys, UNIQUE constraints, the DISTINCT keyword, GROUP BY, APPLY, and various other constructs are more than just ways to define your requirements or the output you're looking for. They can be used to provide information to the query optimizer to help it determine the best way to process your data. Information that can make your query return in a second instead of an hour. Information that will make your users happy and your project a success.
The secret to writing high performance T-SQL? Step out of your human mind. Un-puzzle. Be the optimizer. And until next month, thank you for reading this entry in T-SQL Tuesday #002!
|
-
The January 30 event is now starting to shape up quite nicely! Below you will find the current schedule of sessions for the day. We still have some room, so if you haven't registered already, now is the time to do so. Please make sure to register online if you're going to attend so that we can get a fairly accurate headcount for food, bags, etc.
Please note that registration opens at 7:45 a.m. We hope you'll show up on the early side to make the first session and avoid having to rush. We'll be providing coffee and some breakfast items to compel you to get out of bed and join us. Please let me know if you have any questions. Looking forward to seeing many SQLblog readers at the event!
|
-
Welcome to the second installment of T-SQL Tuesday, the monthly SQL Server blog party! Last month's kickoff event was a great success, and I'm really excited for this month's theme. I hope that we see even more bloggers, with even more great posts this time around--so please spread the word!
Theme Have you ever found yourself unable to figure out the intricacies of how some piece of code works? Ever been confused by the results you've gotten back from a query, only to find out that something totally unrelated was going on? Or have you ever been compelled to wile away your spare time working on a "challenge" posted by some blogger? For this month's T-SQL Tuesday, I'm asking participants to write a blog post on a "puzzling" topic, along the lines of some of the following ideas: - Describe a confusing situation you encountered, and explain how you debugged the problem and what the resolution was
- Show a piece of code that doesn't behave as most people might expect, and illustrate the reasoning behind the discrepancy
- Create a challenge for your readers to solve
As always, even given the event's name the posts are not limited to T-SQL! Any component of, or software product related to SQL Server, is fair game. MDX, SSIS, LINQ to SQL, Entity Data Model, NHibernate, and any other software product that deals with SQL Server data can be featured in your post. Be creative! Rules As before, any blogger can--and should--feel free to participate in this event. In order to make the event slightly more international, I'm changing the time range from PST-based to UTC-based. So the rules are:
- Your post must go live between 00:00:00 UTC and 23:59:00 UTC on Tuesday, January 12, 2010
- Your post must link back to this one, and it's recommended that you clearly identify the post as a T-SQL Tuesday post
- You are responsible for ensuring that a trackback or comment appears here so that I can find the posts
Follow the rules, and your post will be included in the roundup to
be posted on January 13 or 14. Don't follow the rules, and it won't
show up there. Simple as that! Twitter Follow the event on Twitter by watching for the #TSQL2sDay hash tag.
Additional Notes
If you put in a comment and your post doesn't follow the other rules, I will ignore your post. (Really sorry to have to add this, but a couple of people felt the need to do that last time and it violates the spirit of the event. Especially the person who tried to spoof a trackback, without any kind of link in the initiating post. If you're not going to link back to the invitation post, please don't bother participating, now or ever.) Per the rules outlined last time, I'm hosting again this month, and
starting next month the event will rotate around to other blogs. I'll control the schedule even once the event leaves here, so please let me know if you're interested in
hosting--either with a comment, a Twitter DM, or an e-mail--and I'll
add you to the list. To host, you must have participated in two previous T-SQL Tuesday events, and your blog must have had at least one post a month for the prior six months. I'm also collecting topic ideas, and would appreciate your sending them over as you think of them so that we can ensure that we have plenty of material to keep this event running for years to come. Enjoy, and as always let me know if you have comments, questions, concerns, etc.
|
-
Do either of these look familiar: "Suzanne is out of the office and will return December 12." "Hi! I'm on vacation, and may not have access to e-mail. If you need immediate assistance, please contact ..."
Or how about this one (seen in December): "I'm on vacation from July 3-7 ..."
Or the worst one I've seen recently, just three letters long: "OOF"
Auto-responders are a nuisance, an annoyance, and destroy the asynchronous nature of e-mail that makes it such a useful communication mechanism. Here's why: Too Much Information Disclosed
Does every single person who e-mails you really need to know where you are and what you're doing with your time? Get over yourself; no one needs to know whether you're on vacation or doing a seminar in China. It's information that doesn't need to be shared and, arguably, can be considered a violation of your own privacy and security. I don't know about all of my readers, but I get a lot of e-mails from a lot of people I don't know and have no reason to trust. What better time to rob your home than when you're off touring Machu Picchu? And what better way to communicate this fact to a complete stranger than via an auto response? Are You Really That Important? Is every single person who e-mails you sitting on the edge of their chair, repeatedly hitting the refresh button hoping that your reply will soon arrive? No, I didn't think so. And if you answered yes, get over yourself. If your reply is "delayed", no one is going to care. As a matter of fact... The Asynchronous Nature of E-Mail E-mail is, by its very nature, asynchronous. Which means that a reply will always be delayed. We send off an e-mail and expect a reply at some later date. If someone had something truly important to talk to you about, they would talk to you about it--on the phone. Or maybe start an instant message session. Did I mention that you should get over yourself? As an aside, sending an auto response telling me that reply "will be delayed" has an interesting side-effect. It subconsciously tells me that if I don't receive an auto response from you, that I should expect a reply right away. So next time you wait before replying to one of my e-mails, I might just wonder why you're snubbing me.
Technology Obviates the Need for Auto-Response I know that at least 75% of you have in your pocket, right now, an iPhone, Palm, Blackberry, or Droid. You can--and do--check e-mail no matter where you are. And you can--and will--pull it out while you're out of the office, on your vacation or anywhere else, and reply to any important e-mails that have come in. So forget about telling everyone that you're out of the office. Does it make you feel important? You need to ... get over yourself and quietly use that technology you've invested in, rather than wasting my time with pointless auto responses. "But, but, but..." I can already feel the wrath of my readers coming to bear as a result of this post. And there are a few (very few) valid reasons to use auto-responders: - Your employer has a rule that says you have to do it. Okay. Bad idea to argue with the person who pays your bills.
- You really are going away, for a long time, to a place where you really can't access your e-mail.
- You have an SLA or some other binding reason that a reply absolutely must go out after any e-mail is received.
Barring these reasons, you have no excuse. Sorry. A Plea for 2010
Please turn off your auto-responders. Especially for e-mail addresses that are subscribed to lists. There's almost nothing worse than an auto response until you get auto responses from someone you didn't even e-mail (you know who you are). If you must use auto-responders, please follow some basic rules: - Don't tell me where you're going, or that you'll be on vacation. Just say that you're out of the office. Honestly, nine times out of 10, I don't care.
- Don't use an auto-responder if you will be checking e-mail in any way at least once every 24 hours. Yes, this may be less than the 5-minute interval you usually work with, but no one knows that. A delay of a day or more is perfectly acceptable--and expected--for most e-mails.
- Set your auto-responder to automatically turn itself off the day you return. That way you can avoid sending a response when you actually are back in the office, which makes you look like a total moron.
- If you regularly provide "immediate assistance", set up an immediate assistance e-mail alias and have your customers e-mail you there. Then you can simply redirect those e-mails when you're not available. Problem solved, and no one needs to delete your auto responses.
- Don't tell me that response will be delayed. I've already figured that much out.
- Get over yourself. You're just not that important. Sorry.
Thanks to those who've read this far. I wish you all a fantastic 2010 (even those of you who use auto-responders).
|
-
Speaking is one of those activities where there is always something to tweak or improve. Whether you've just finished your first talk or your thousandth, after you're done I guarantee you'll look back and find at least a couple of things that you'll wish went better. Changes you can make for the next time will always be on your mind. As a speaker, part of your job is to realize that no talk is going to be perfect, but with work you can make your talks better and better, with fewer modifications required each time. Making things trend upward, at least for me, involves two components: lots of practice, and reading up on how others do a great job. As part of that second component--reading--I've purchased a number of books on the topic of speaking. While some of them contain useful tidbits, the majority of them--even the great ones--are painfully boring. And most of them have nothing very useful to share; some of the advice even borders on counter-productive. With all of that said, I'm happy to report that a new book published by O'Reilly and Associates bucks these trends and delivers what I've been looking for. "Confessions of a Public Speaker", by Scott Bercun, is highly entertaining and packed with great tips that apply equally well to both new and experienced speakers. Bercun does a great job of debunking a number of speaking myths (should you really consider what the audience would look like naked?) and teaches both what to do and what not to do by sharing many humorous anecdotes from his own and others' speaking experiences. At around 200 pages, the book is a breeze to get through, and if you're anything like me you will not want to put it down. Upon receiving the book in the mail I ripped open the package planning to skim the first few pages... and ended up reading half of the book in the first sitting. This has never happened to me with a speaking book before, and that alone is testament to the greatness that has been achieved here. Bercun focuses on conference and other public-venue speaking, so this book is perfect for just about anyone reading this blog post. Whether you're speaking at a user group, SQL Saturday, Code Camp, or major conference, the advice in the book will apply to your situation. Especially interesting is a chapter in which a number of disaster scenarios are outlined, with advice on both how to react in the best possible way and how to avoid them to begin with (note to self: don't go out drinking the night before doing a big talk).
The book is easy to digest, the content is top-notch, and the text is just as long as it needs to be to make its point--just like a great talk, and just like a good blog post, so I'll stop here. If you want to improve your public speaking, pick up a copy of this book right away.
|
-
Here on SQLblog, we take spam seriously. Actually, I should rephrase that: I take spam really, really, really seriously, some other people take spam somewhat seriously, and some don't really seem to care. It all balances out in the end. We've done a pretty good job keeping the site clean and mostly spam-free. We have a few different automated anti-spam tools, but it's not enough. I do manual sweeps though most of the site once or twice a week and pick up anything that managed to slip through the cracks. I'm constantly updating and refining our rules to try to keep the manual work as quick and painless as possible. I take it as a point of pride that we have one of the lowest spam rates I've ever seen on a large blog site with anonymous comments enabled.
And all of that work, and my pride, is still not enough. A couple of days ago I discovered, amidst a mild rush of panic, that spammers were getting their links on the site by creating new user accounts and putting the links in their profiles. Many of these accounts had obvious names that immediately signaled that something was up. A user account called "69 Sexy" on a site like SQLblog is a dead giveaway. Some of them were obvious but in stranger ways; we had over 80 accounts created by spammers advertising various fake Christmas tree sites. Christmas tree spammers? Seriously?
Being the kind of guy I am, and given the amount of work I put in fighting spam on the site, there was no way I was going to let these accounts live. So I hit the database and started hacking around trying to figure out how to stop them. "Hacking around" is not generally a good thing when you're trying to do database work, but the design that underlies Community Server is, shall we say, not the greatest I've ever seen. After several minutes of prodding and false starts, I found the table where the profile information is stored and thought about how to do the cleanup. I looked at a few of the spam accounts and found that they shared some common qualities: They all had "interesting" keywords in their profiles, but none of them had associated posts in blog comments or the forums--these, if they ever did exist, would have been taken care of by one of our other anti-spam measures. I did a few queries and discovered that there are five or six core tables in which most user data is stored, and about 30 satellite tables. My fear was that I might catch a real, valid account using one of the spam keywords (some valid word could have an embedded fragment of a spam keyword), and I decided to let referential integrity protect me. I wrote a script that started a transaction, did the deletes from the core tables in a TRY block, and if any exception occurred a rollback would fire in the CATCH block. I tested this logic using my own account (safely inside of a nested transaction to make absolutely certain I didn't delete myself!), and was pleased to see that the transaction was rolled back as expected.
And from that point on, it was kind of fun. I thought of all sorts of keywords and ran my script against them, removing scores of user accounts from the site. Every few keywords I would check the users list and find a few more. Goodbye, "forex traders". Goodbye, "payday loans". And see you later, "wire frame Christmas trees". (What the hell is a wire frame Christmas tree? I may never know.)
Of course, I wasn't using these keywords in their full, natural form. If I had a user with "forex trader" in his profile, I might delete all users where I found the string "forex", and maybe all users where the string "trade". The work was done using code along the lines of: Profile LIKE '%' + @string + '%'. I figured that it didn't matter if I caught some non-spam users, thanks to my diligent work with TRY-CATCH and the database's referential integrity constraints. The next day, I received an e-mail from one of the site's bloggers. He was locked out of his account, unable to access his blog. And so I jumped back into the database and made a startling discovery: The database, as it turns out, has a few referential integrity constraints. And it also lacks a few referential integrity constraints in places where a normal person might expect to see them. Oops. Turns out that I deleted the user accounts for about a quarter of our bloggers, and
about 50 users who had left perfectly valid comments in blog posts. Luckily, we're almost as serious about backups as we are about spam, and a quick database restore and an hour and a half of scripting
later all of the deleted user data was restored.
The morals of this story, for me: First and foremost, assumptions aren't worth much. This is something we all know, yet it's a trap I fall into over and over and over. Some lessons are more difficult to learn than others. Second, and more technical, don't trust a database design until you fully understand its ins and outs. Otherwise, surprises can and will pop up. This is another lesson I've learned countless times and seem to keep hitting. Something in me really wants to believe that most database designers do the right thing. But the reality is, they don't. And the final lesson of the day is not to get too caught up in pride or determination. I wanted those spammers out, and I wanted them out that moment. I wasn't willing to wait a bit longer or think a bit more carefully through my solution. And I paid the price for that rash decision making. Those spam links weren't actually hurting anything, they were merely causing me irritation. In retrospect, I could have handled the situation in a much more thoughtful manner.
In the end, no real harm was done. A valuable lesson or two was learned. And I got a topic for a blog post out of the deal. Not bad. And one day those spammers will feel my wrath... In a cool, level-headed way.
|
-
Wow! The response to the first T-SQL Tuesday was truly amazing. We ended up with 20 great posts, from all over the world. If you didn’t participate this time, fear not—we’ll be doing this every month from now on so there is plenty of time to jump in. And don’t forget our Twitter hashtag, #TSQL2sDay, which you can follow to keep up with T-SQL Tuesday even when it’s not the 2nd Tuesday of a month. Here are the posts for this month, in the order in which trackbacks/comments were received on the invitation blog post: Kicking things off was Dave Ballantyne, who shared “Age calculation with SQL Server”. If you’ve ever tried to calculate someone’s age and run into leap year issues, read this post. Rob Farley shared a post entitled “A date dimension table with computed columns”. Rob’s offering should help those of you who work with date dimensions in Analysis Services. Next up was Mike Walsh, with an intro post called “Dates and Time”. Mike covers many of the basics such as ISO date formats and whether you should cluster a date column. Definitely a good refresher! Bluedog67’s “Introduction to Effective Dating” has nothing to do with getting a girl’s phone number. Instead, he discusses the ins an outs of “effective dates”—in other words, the date that a piece of data became reality. A very interesting post! Alan Wood brings us “Return a range of dates between 2 days”, a post in which he describes a function that does exactly what the title says it will. I often use calendar tables for this, but there are definitely times that a function makes more sense. Good stuff. SummitCloud was not happy with some of the built-in SSRS features for Gantt charts, and posted an elaborate workaround in “Project Time Reporting Hack in SSRS”. A must-read if you need to create pretty reports based on time intervals. Brad Schulz is feeling rather paranoid… Various dates are giving him night sweats and other forms of panic. In “Friday the 13th 2009: Cluster of Terror” he uses a calendar table an T-SQL to evaluate one popular day of doom. Scary! Speaking of scary, next up is the Scary DBA himself, Grant Fritchey, with a cautionary tale—how not to do date and time queries. Read Grant’s post and then check your code base. Are you doing something that could cause problems? Jack Corbett is not sure whether he’s a wise man or a wise guy, but he does know how to retrieve date rages. He shares his insights in his example-filled T-SQL Tuesday post. Andy Leonard loves SSIS more than any man should love a computer program, and it shows. In “A Couple SSIS Date Expressions” he reveals the true nature of how to manipulate dates in our favorite Microsoft ETL program. Mladen Prajdic thought that there was already plenty of date/time information out there—why reinvent the wheel? He shares a large number of links to previously-written posts in “SQL Server Date and Time fun from all around”. Ever the internals geek, Kalen Delaney uses her knowledge to answer a number of pressing date and time-related questions in “My Datetime FAQ”. If you’ve ever wondered about the significance of the year 1753, read this post. Sean and Jen McCown are the Midnight DBA team, a couple that apparently prefers work to sleep. In “Remix! Optimized: Query by Hour, Day, Week, or Month” Jen shares the secrets of querying your data using a variety of granularities. Cool! Jamie Thompson may be the SSIS Junkie, but his post is all about “Unambiguous date formats” in T-SQL. If you work with applications that are used by people in different countries, read this. I’m the guy writing this blog post, and like the Midnight DBAs I apparently also prefer work to sleep—it’s currently almost 1:00 a.m. and I’m starting to wonder about just how fuzzy my brain is going to be in the morning. Luckily, I shared “Exploring ‘Fuzzy’ Interval Islands Using SQLCLR”, so it may be a non-issue. Allen Kinsel would like to thank the PASS program committee. And what does this have to do with date and time tricks? Not much, but he sneaks through the door by providing a link to a video about time zones. Okay, Allen, that’ll do. Jonathan Kehayias is interested in better tracking data across date and time ranges. He doesn’t ramble much (at all) in his post “Splitting Date/Time Ranges and Intersections”. Useful stuff, for sure. Stuart Ainsworth surprised me with his post. Why? Because it’s the only one that covered data modeling, one of the more interesting aspects of the date and time question. His very interesting post is titled “Date/Time Issues and Data Modeling”. I’m not quite certain that John Sterrett’s post is on topic for this month’s T-SQL Tuesday, but we’ll let it slide. His title is “Disk usage monitoring with Data Collector”, and he does at least mention time intervals. Good enough? Aaron Nelson kept things quite true to theme, posting a grab-bag of “Date, Time, tricks with the DateTime Data Type”. His post includes a number of snippets—perhaps you’ll find something that will help you solve a problem you’re currently facing? … And that’s it for this month’s T-SQL Tuesday! Next month will once again be hosted on this blog. Watch for the invitation post around January 4th. And please leave a comment here if you have any topic suggestions—I could definitely use some help coming up with a great list so that we can keep things rolling. Enjoy! Update: There are a few posts by people who didn’t properly link back to the original. I’ll give them a one-time pass, because it’s the holiday season and I’m a nice guy. Click through to read posts by Alexander Kuznetsov, Peter Larsson, and Rajib Bahar.
|
-
When working with time intervals, we often want to ask a couple of basic questions:
- Which time periods are not covered by our intervals? These are known as "gaps".
- What are the time ranges that we are fully covering? These are known as "islands".
If you're unfamiliar with "gaps" and "islands" I highly recommend reading some of Itzik Ben-Gan's recent work in SQL Server Magazine. He's had a great series going on the topic. But one thing that he hasn't found a good T-SQL solution for is a problem that I call "fuzzy islands."
When is an island fuzzy? When it doesn't necessarily have a fixed end time. For an example of this, consider a store, selling a number of products. Management might want to see a report showing when each product was selling. This is, effectively, an island question. The goal is to find all of the covered ranges during which sales occurred. But running such a report, you might find that way too much data is returned. A given product may have sold units on Monday, Tuesday, and Thursday, but for some reason no one bought one on Wednesday. Creating a new island every time there is a small gap will create a 300-page report where a 1-page dashboard might suffice--not a good user experience, nor a good way of representing the data. The solution? Introduce a bit of fuzziness--a rule that says, for instance, that a gap is only a gap if it's longer than 7 days.
Answering the fuzzy islands question is not a very difficult thing to do in T-SQL. The basic algorithm follows:
- Find all of the "start" dates or times. These are simply those dates or times for which a previous date or time in the fuzzy interval does not exist. So if a row is dated 2009-12-08 and our fuzzy granularity is 7 days, we know we have a start date if there is no other covered data from the end of November.
- For each start date identified, find the minimum date greater than the start date. This is done by looking ahead rather than behind, so if our date is 2009-12-08 and we have a granularity of 7 days, we'll look forward until December 15th.
- Optionally, add the fuzzy factor to the end date. This is something that I think is a good idea, as it introduces the concept of an "active" interval--a period over which, for example, a product is considered to have been selling. The interval shouldn't necessarily terminate the day that the last sale occurred. But of course this depends on the situation in question.
The following query produces a fuzzy islands report for each product in the AdventureWorks (or AdventureWorks2008) Production.TransactionHistory table. You can modify the @active_interval variable to tweak the fuzziness and change the output.
--Find all "active" product time ranges, meaning that the product --has sold within the previous 7 days DECLARE @active_interval INT = 7
SELECT DISTINCT t_s.ProductID, t_s.TransactionDate AS StartDate, DATEADD ( dd, @active_interval, ( SELECT MIN(t_e.TransactionDate) FROM Production.TransactionHistory AS t_e WHERE t_e.ProductID = t_s.ProductID AND t_e.TransactionDate >= t_s.TransactionDate AND NOT EXISTS ( SELECT * FROM Production.TransactionHistory AS t_ae WHERE t_ae.ProductID = t_s.ProductID AND t_ae.TransactionDate BETWEEN DATEADD(dd, 1, t_e.TransactionDate) AND DATEADD(dd, @active_interval, t_e.TransactionDate) ) ) ) AS EndDate FROM ( SELECT DISTINCT ProductID, TransactionDate FROM Production.TransactionHistory ) AS t_s WHERE NOT EXISTS ( SELECT * FROM Production.TransactionHistory AS t_ps WHERE t_ps.ProductID = t_s.ProductID AND t_ps.TransactionDate BETWEEN DATEADD(dd, -@active_interval, t_s.TransactionDate) AND DATEADD(dd, -1, t_s.TransactionDate) ) ORDER BY ProductID, StartDate GO
Running this query you'll find that it works... But the results are returned a bit more slowly than we might desire--15 to 16 seconds on my end. Looking at the query plan, the reason for this becomes quite obvious: Lots and lots of table scans. How can we eliminate all of the overhead?
SQLCLR to the rescue. The best way to solve this problem--at least until the SQL Server team adds proper OVER clause support (LAG and LEAD, specifically)--is to use a cursor algorithm. We could do this in a T-SQL cursor, but why bother? Cursor logic in SQLCLR is much, much faster.
To solve the problem, I implemented an enumerator, called active_products_enumerator. The enumerator is initialized using a SqlDataReader and an "active interval" -- the number of days we're allowing for fuzziness. The DataReader is expected to return rows ordered by ProductID and TransactionDate. The enumeration process uses the following algorithm:
- If the current ProductID is not the same as the previous ProductID, return an end date for the previous interval and start a new one
- If the current period date is greater than the previous period date plus the active interval, return an end date for the previous interval and start a new one
- Otherwise, continue
Following is the MoveNext method for the enumerator (the complete code is attached to this post so that you can run it on your end without my bombarding you with a gigantic code-filled post):
public bool MoveNext() { try { current_results = null;
while (r.Read()) { new_ProductID = r.GetSqlInt32(0); new_period_date = r.GetDateTime(1);
if (new_ProductID != ProductID) { if (ProductID != 0) { current_results = new results( ProductID, StartDate, period_plus_interval); }
ProductID = new_ProductID; StartDate = new_period_date; period_plus_interval = new_period_date.AddDays(activeInterval);
if (current_results != null) return (true); } else { if (period_plus_interval < new_period_date) { current_results = new results( ProductID, StartDate, period_plus_interval);
StartDate = new_period_date; }
period_plus_interval = new_period_date.AddDays(activeInterval);
if (current_results != null) return (true); } }
//return the last row of data if (ProductID != 0) { current_results = new results( ProductID, StartDate, period_plus_interval);
//set this to 0 so we don't return another row ProductID = 0; }
if (current_results != null) return (true); else { r.Dispose(); return (false); } } catch { r.Dispose(); throw; } }
Put into a table-valued function, as the attached code does, this algorithm will return the same data as the T-SQL query in under a third of a second on my end--around 45 times faster than the T-SQL version. Note that I've played some games with a loopback connection to get this whole thing to work. That's a topic for a future blog post, so stay tuned. In the meantime, please realize that you'll have to catalog the assembly with EXTERNAL_ACCESS permission to make this happen. This post was created for T-SQL Tuesday, the revolving SQL Server blog party, hosted this month by... me. Enjoy!
|
-
Last week I very quietly announced, via Twitter, that we've officially launched the site for New England Data Camp v2.0, the second go at a full-day event jointly hosted by both the New England SQL Server Users Group and the Southern New England SQL Server Users Group. The event will take place on January 30, 2010, at Microsoft's Waltham, Massachusetts office.
If you missed last year's v1.0, you can check out the fantastic session list on my blog post for the event. We had just over 200 attendees and feedback was quite positive. This year's event promises to be even bigger and better, as we've partnered with SQL Saturday, which should smooth some of our rough edges.
We are planning to have up to 25 sessions in four or five tracks, and our public call for speakers is open and ready for your submissions. Never presented before? No problem--shake off your stage fright and give it a try. I promise, you will have a good time. The Data Camp audience is laid back and happy to learn from you, whether or not you're a seasoned presenter. Just want to attend? That's fine, too. Our registration page is ready and waiting to reserve your spot at the event. We're also looking for a few sponsors to help defer the cost of the event. If your company would like to increase awareness amongst the best and brightest data professionals in New England, you can do no better than to sponsor this event. Information on sponsorship plans is available on our Sponsors page.
Looking forward to seeing many SQLblog readers there! Please let me know if you have questions, concerns, comments, etc.
|
-
Happy December, SQLblog readers! My gift to you, just in time for the holidays: The newest "official" release of your favorite SQL Server activity monitoring stored procedure. Click here to download Who is Active? v9.57 Since the last release--v8.82, from August of this year--I've made a number of modifications to the script, resulting in six interim "beta" releases. The uniting theme of all of these changes is more, better quality data, faster. More data. - Two new core options were added at the behest of users: @show_sleeping_spids and @show_system_spids. These options cause the procedure to return information that wasn't previously available via Who is Active, about--you guessed it--sleeping and system sessions, respectively.
- Another major change was adding a feature so that the script now shows blocking sessions whether or not they're included in the default filter criteria. This is not something you turn on or off--it just happens--and will ensure that if you're debugging a blocking scenario you'll automatically have all the information you need.
- And for you query plan geeks out there, the full wait stats collection mode (see the "Faster data" section below) now returns node identifier information with CXPACKET waits--helping you to track progress of tasks as a plan is executed.
- Finally, a small modification. The online help system (@help=1) now returns information about both the available input parameters as well as all of the available output columns.
Better data. - A few minor bugs were fixed, mostly having to do with the evils of MARS and the fact that the DMVs don't properly deal with MARS sessions in many cases.
- Workarounds were also added for inconsistencies in how the DMVs report SQL handles, even without MARS.
- Two features were added to help you get only the data you need when you need it, and not the data you don't:
- Dynamic sort ordering, via a parameter called @sort_order, lets you pass in a list of columns and column directions by which to sort the output.
- And "not" filters, implemented using parameters @not_filter and @not_filter_type, work exactly the opposite of regular filters. These are useful in those cases where you have a bunch of sessions that aren't of interest, and you don't want them cluttering your output.
- Finally, I've changed the default output column order to something I think is a bit more useful. Don't like my selections? No problem--override me using the @output_column_list parameter.
Faster data. - This is the area in which I made the most modifications. A monitoring tool borders on useless when it takes a minute or more to return key metrics when your server is on fire, and alas, previous versions of Who is Active were doing just that for some users.
- I've made major changes to the core queries in this version of Who is Active, bringing down query times from minutes to a few seconds in many cases.
- In conjunction with these changes I added a new lightweight wait stats collection mode, which is the new default. This mode collects only the top non-CXPACKET wait, giving preference to blocked waits, so that you can see the worst problems without having to sort through a lot of output that may or may not apply.
- If you miss the complete stats collection mode, it's still there--simply set @get_task_info=2 when calling the procedure.
I'm quite happy with this release, and I hope that it will help people quickly solve a number of tricky SQL Server problems. As always, your feedback is very much appreciated! Leave me a comment here, e-mail me (my address is in the script), or track me down at a conference. Most of the features in the past few versions are a direct result of requests I've gotten from users. A huge thank you to those who tested and gave me feedback since the last version! Aaron Bertrand, Rajiv Jain, Michelle Ufford, Uri Dimant, and everyone else, I really am thankful for your efforts. To these people and all of my readers, I wish you a happy December and a prosperous 2010. Until next time, enjoy! Click here to download Who is Active? v9.57 Please ignore the text below. Putting in it for search purposes. sp_whoisactive whoisactive sp_who sp_who2 sp_who3 sp_who4 sp_who5
|
-
T-SQL Tuesday is the SQL Server blogosphere's first recurring, revolving blog party.
The idea is simple: Each month a blog will host the party, and about a
week before the second Tuesday of the month a theme will be posted. Any
blogger that wishes to participate is invited to write a post on the
chosen topic. The event is called "T-SQL Tuesday", but any post that is related to both SQL Server and the theme is fair game.
So feel free to post about SSIS, SSRS, Java integration, or whatever
other technologies you're working with in conjunction with SQL Server.
Even if your post includes no T-SQL we still want to see it. The
posts must go live on the second Tuesday of the month, by (or before)
23:59 PST. Each post must link back to the host blog's post, and the
blogger should make sure that a trackback has been generated and
successfully posted to the host blog; if not, a comment should be
posted on the host blog with a link, so that the host blogger can find
the post. The host blogger will then take all of the posts and
within a day or two will compile a "roundup"--a brief summary with links--so
that readers can find all of the relevant posts. Why do this?
Several reasons: It's fun, it's going to generate a lot of interesting
content, and it's going to bring lots of bloggers into the fold and
improve our overall community by creating a lot of links between
blogs. Any blogger is invited to participate, whether the blog has been
live for six days or six years. If you can write, you should join in
the fun. Rules of engagement for hosts: Although anyone
can join in the party, hosting has a slightly higher bar: Your blog
must have been active in the last six months ("active" is defined as at
least one post per month), and you must have participated in T-SQL
Tuesday events on at least two prior occasions. Since that's impossible
until we've had two events, I'll host the first two myself. After that
it will move around to as many blogs as we can get involved--and I hope
that a number of you out there decide to go for it. The host blogger is not required to write a participating post to be included in
the roundup, but is free to do so if he or she chooses. Want to host? Contact me through my blog--I'll
keep a waiting list and control it centrally so that it doesn't devolve
into anarchy too quickly. So that's that. If you bothered to read the title of this post you're aware that the topic for this month is Date/Time Tricks.
Write a blog post that talks about dates and times--this can be based
around T-SQL programming, data modeling, ETL, reporting, or whatever
else you're using dates and times for (and who isn't?). Make sure that
your post goes live on Tuesday, December 8, between 00:00 and 23:59 PST. Make it clear in your post that it's for T-SQL Tuesday, and make sure it links back here. Finally, make absolutely certain that a comment shows up here so that I can find the post. Questions? Comments? Post here. Enjoy! I'm looking forward to the start of a great blog tradition in the SQL Server community.
|
-
Thanks to everyone who attended my sessions this week at SQL Server Magazine Connections in Las Vegas. Attached to this post you will find the demos for the following sessions: SDV301: Best Practices for Exception Handling and Defensive Programming in SQL Server 2005 and 2008
As developers, we sometimes become lax about dealing with error and
exception conditions by the time our code gets down to the data level.
Exceptions can feel like something that only application code needs to
worry about, until you realize that in SQL Server they can have a
tremendous effect on your transactions and your data integrity.
Learning to properly handle them is, therefore, of paramount importance
to those of us who write data-centric applications. SQL Server 2005
greatly improved exception handling options by adding support for the
structured TRY/CATCH syntax, but there is a lot more to the story than
just that feature. In this session, we'll delve into the ins and outs
of exceptions in both SQL Server 2005 and SQL Server 2008, starting
with the database engine itself: types of exceptions, when and why
they're thrown, and how the server treats them. Next, learn how to
configure and throw your own custom exceptions, as well as how to
leverage the SQL Server exceptions infrastructure with a variety of
exception handling and defensive programming techniques both with and
without the TRY/CATCH syntax. Most importantly, we review the effect of
exceptions on transactions, and how to take programmatic control over
the outcome of your transactions in the face of an exception. SDV303: T-SQL Power! Learning to Harness the Under-Used OVER Clause
First introduced in SQL Server 2005, the OVER clause is an ANSI SQL
enhancement that gives you tremendous control when dealing with
aggregations. By using the OVER clause, query writers can
simultaneously aggregate columns based on multiple groups. The feature
also enables the query engine to provide windowing mechanisms for
ranking and row numbering. Leveraging these powerful language
enhancements allows you to solve a surprisingly large number of
difficult query problems—including custom paging schemes, data
de-duplication, "top-N" problems, and complex statistical calculations.
Even better, this feature can be creatively applied to help with
performance optimization of certain tough queries. In this session, you
will learn all of these techniques and see why, after applying the OVER
clause in dozens of projects since the release of SQL Server 2005, I
consider it to be one of the most powerful T-SQL features available. SDV210: What Happened? Auditing, Tracking, and Change Monitoring Technologies in SQL Server 2008
Regulatory bodies...end-users...your boss. They all want answers. Many
questions are easy enough to deal with: "Did someone drop my view?"
Others are a bit trickier: "What was the previous value of this row?"
And some are seemingly impossible: "Who selected the data from this
table over the past week?" For many DBAs, the answer to some or all of
these questions is often "Umm..." But don't blame yourself; getting
this information in SQL Server has never been especially easy—until
now. SQL Server 2008 ships with several new technologies designed to
help you track and report on exactly what happened, who did it, and
when. In this session, you will learn about SQL Server 2008's Change
Tracking, Change Data Capture, and SQL Server Audit features, each of
which provides a distinct set of capabilities and has specific
strengths and weaknesses. Looking at each of these technologies in
turn, you will see how they work and where you might want to leverage
them in your SQL Server infrastructure. If you're used to saying
"Umm..." get ready to say "I'll be right back with the answer." Enjoy!
|
|
|
|
|
|