The SQL Server Blog Spot on the Web


Louis Davidson

  • Why We Write #3 - An Interview With Rob Farley

    In the third entry in this series, we take a turn south, not in quality, but in the geography of our next entrant: Rob Farley, who is from (well, lives in) Australia.

    Rob Farley is a SQL Server MVP, and is quite a busy fellow. He is the owner of a consulting company named LobsterPot Solutions, located in Adelaide, and is a current member of the PASS Board of Directors. His blog is located at SQL Blog, where I also blog, and he tweets under the extremely unobvious handle of @rob_farley. He speaks regularly at the SQL PASS conference, and sang during the keynote with the one and only Buck Woody (http://blog.datainspirations.com/2011/10/14/pass-summit-2011-day-3-keynote/).

    Rob has been a friend for quite a few years now, starting (in my mind) when I introduced myself to him at a conference thinking he was Arnie Rowland (yet another wonderful member of the SQL community, whom you might mistake for Rob from a hard-working standpoint, though Arnie doesn't wear glasses regularly :). As I remember the story, neither was offended, much like when people mistake me for Orson Welles; I am honored because he was such a great writer. I will note, too, that Rob remembered it differently, but my version makes me sound far more intelligent. Rob is a tough one, too, as I once worked with him on a Microsoft Learning contract in Seattle shortly after he had his appendix out (remember where he is from… and I get kind of sore from a four-hour flight!)

    So, now that we have gotten past the silliest parts of the interview from my contributions, on to the interview questions.

    ------------------------------------------

    1. Think back to the moment you hit the first key, starting to write a blog, an article, a book, or whatever. What made you do it? Or perhaps, what were you expecting to achieve that was better than your previous use of free time? Have you gotten the benefit you were shooting for back then?

    It was April 2005. I had been getting more and more involved in the community, attending the occasional user group, both .Net and SQL Server, and there was an all-afternoon event about SQL Server 2005 that I went along to. I’d already been thinking about getting into blogging, and a conversation with one of the evangelists from Microsoft who was at this event meant that I wrote my first blog post the next day. At the time, I just figured that it might be helpful for someone, but didn’t know who that might be. At the time, I was feeling like I needed to be stretched, and blogging gave me the chance to write about the things that I knew, and to go a little further with things than I had before. When you write things down, you find yourself wanting to make sure that it’s right – blogging gave me that then, and still does. Unfortunately, I cringe at old blog posts, as I think we all do [ed; I know I certainly do!], but I still enjoy the experience of getting content into a blog-worthy condition and publishing it for other people to read.

    2. We all have influencers that have advanced our careers as writers. It may be a teacher who told you that you had great potential? Another writer who impressed you that you wanted to be like? Or perhaps on the other end of the spectrum it was a teacher who told you that you were too stupid to write well enough to spell your own name, much less have people one day impressed with your writing? Who were your influences that stand out as essential parts of your journey to the level of writer you have become?

    In school I had teachers that liked my writing, and teachers that thought it was awful. From that, I managed to discover that my writing had a particular style, a ‘voice’ (accent?) that could be heard. Today I value that in my writing, and try not to let it go. The times I’ve written book chapters I’ve worried a lot about losing that style, as editors often try to avoid having that kind of thing come through. It’s probably like how I’ve a tendency to use contractions. If I couldn’t’ve ever used them, I think I’d’ve struggled to write much, as people wouldn’t hear me in what I wrote. Perhaps JD Salinger had an effect on me, as Holden Caulfield’s voice came through so much in that book.

    3. Can you describe the process you go through to write (including any tools you find indispensable), from inception of an idea until it gets put out for consumption?

    This depends on what kind of thing I’m writing. :)

    I’m very big on just opening Live Writer and pouring text onto the screen. Of course I’ll need to spend time in SSMS, getting the queries right to demonstrate the technical aspects, but I like to just get the text flowing. I’m very self-critical, which means that I don’t try to think how to phrase every sentence, or use just the right simile, but rather, I try to bring the reader into what I’m writing and explain things to them. I’m currently trying to get online training sorted which I call “Train-the-Explainer”, because I want to be able to teach people things in a way that helps them really understand the concepts of what’s going on, and I try to have that same idea come out in my writing. I’m likely to end up using phrases like “You know how…, well it’s like that,” in my writing, but when it comes down to it, I want people to read it as if I’m sitting next to them, explaining things to them.

    What I find really hard is writing songs or jokes. I realise that I tell jokes, even during presentations, and I wrote a comedy set for the PASS Summit in 2010 and a song for 2011, but as much as I’d like to do much more of that, I really struggle. I really want to write both, but find myself crossing things out, or finding that things really aren’t as funny (or poignant or whatever) as I’d like. A co-writer would be good – someone I could bounce things off so that they can tell me when an idea is worth pursuing, and when my ranting should be converted into an actual joke.

    4. Assume a time machine has been created, and you are scheduled to speak to a group of potential writers, in which you and I are in attendance. Without concern for the temporal physics that might cause the universe to implode making the answer moot, what would you tell "past us", and do you think that your advice would change where you and I are in our careers now? (like would you tell yourself to get excited for the day you will be sitting here for a rather long period of time answering interview questions and not getting paid for it, instead of feeling the warm sun on your forehead?)

    Don’t get me started on the time travel thing. I have conversations with my kids about that kind of thing, like ideas around how paradoxes could get resolved… but back to the question!

    If I could choose when to go back to, I’d go back much further… to a time when I thought I wasn’t any good at writing (ok, that’s typically still the case – did I mention I’m self-deprecating?), and was choosing to pursue a computer-focused degree. I’d tell myself to really explore the things that I enjoyed, including writing, and to just get started. I could put so many things in that bracket – comedy, writing and music are some that I’ve already mentioned – but I keep finding myself way more interested in people than in technology. I enjoy teaching (adults, not kids), I enjoy ministry, I enjoy community, but my career has largely been focused on technology. I’d tell those potential writers to start doing those things which define them. Solving puzzles can be fun, but unless those puzzles are allowing you to be creative, then they may not be completely satisfying. Of course, I doubt it would make a difference. Someone who’s good at maths will see the creativity in that and still end up in IT.

    5. Finally, beyond the "how" questions, now the big one. There are no doubt tremendous pulls on your time. Why do you write?

    This comes down to that last question. I write because it’s something which lets me be slightly closer to what I want to spend my time on. I’d like to be completely financially independent, and be able to spend my days helping other people with things. My career as a consultant lets me do some of that, but not in the way that I really want. Those people who ask me for help with things probably know that I quite willingly invest myself into their particular problem, and I honestly do it completely for them, because I enjoy it. Writing lets me do that in a way that means they don’t have to ask – for those people that go looking for something and stumble across it.

    Finally, a bonus question I provide to let the person stretch the topic and talk about anything they want to:

    1. Is there any project you would like to tell people about that we haven't yet mentioned?

    I should be better at marketing, but I’m really bad at it. I should write some stuff about how LobsterPot is a great company that you should all use to improve your data story. We can help you write better T-SQL, tune your system, get your data into a data warehouse, even present it in the amazing PivotViewer platform that we ported to HTML5 so that it runs on iPads. I should write about the Train-the-Explainer thing that I want to do, where I’ll charge people a small amount to attend an online classroom (limited sizes) to have me explain SQL stuff to them in a way that hopefully means that they can not only implement the ideas, but can actually explain it to other people. I should write about how I’m available to teach Advanced T-SQL courses, and will happily come to just about anywhere in the world to do so (although it’ll be at your expense, and it’ll have to fit into my course schedule). But I’m not good at self-promotion, so if your readers want to ask me about these things, they should probably just drop me a line and start a conversation.

    …I’m always happy to talk. [ed. @rob_farley is his twitter if all else fails]

    -------------------------------------------------------

    I definitely want to thank Rob Farley for taking the time to answer my interview today. I got a bit more insight into how yet a third person thinks about the process and value of writing. His "why" answer reminds me of some of the reasons I got started answering forum posts. I don't always love helping individuals directly, because usually when you have gone out and asked a question, you are lost and just need a straightforward "how do I get out of this jam?" answer. It's kind of like when you go to the gas (petrol?) station and ask for directions. If they start telling you how you should have planned ahead, while it is good advice, it can tick you off. The only person who is apt to learn a lesson there is a bystander who hears the answer; since they aren't lost, hearing how to avoid ever being lost may be useful. When writing, I always try to help with the immediate need first: "the bakery is a block that way," and then "the app you have on that phone I see will tell you how to get there if you are ever lost again." They might not care, but the next reader might.

    Still not the answer I would give to the time machine question (other than the paradox stuff!), but I love Rob’s answer.

    The next entry will be Doug Lane, who works in BI. He will be speaking at the PASS BA Conference this week (4/10-12, 2013; so don’t go there if this is 2020 when you are reading this and blame me), so feel free to suggest answers for him if you see him there!

  • Why We Write #2 - An Interview With Mark Vaillancourt

    My second guest is Mark Vaillancourt (whose last name makes me very happy for the copy and paste feature), who is an Information Management consultant working for Digineer, and is a Regional Mentor for PASS in Canada. Mark is also a speaker at SQL Saturday events, as well as the SQL PASS Business Analytics conference in 2013.

    Mark has been blogging regularly since early 2009 on his website (http://markvsql.com/), and interestingly, has degrees in English and Theater, two degrees that almost always lead one into a career in technology.  His twitter account, @markvsql, is also quite active with over 6400 tweets to date. To be honest, I don't enter into this interview knowing nearly as much about Mark Vaillancourt as I did about Thomas LaRock in entry #1, as Mark is more involved in Business Intelligence while I spend most of my conference and blogging time in the OLTP/Relational Engine side of things. I am looking forward to learning more about his writing process and his career in his answers to the following five questions.

    Mark is currently working on his first white paper to be published via Digineer’s website. He wouldn’t reveal the topic, but describes it as a topic that he feels is under-served and will help a lot of people. I hope he will give me the link to include in this interview when he gets it finished.

    -------------------------------------------

    1.  Every superhero has an origin story, and in many cases it wasn't because they specifically were planning to go into the field of superhero-ness. I mean, clearly Peter Parker didn't really want to get bitten by a radioactive spider. So what is your story that led you to spend part of your free time writing about SQL?

    In my early days at Digineer, Lara Rubbelke, who actually hired me during her tenure there, encouraged me to blog about my experiences learning SQL Server. Since I was hired there having never worked with SQL Server before, there were sure to be many learning opportunities. Whenever we would talk about the obstacles I was encountering and what I was doing to overcome them, she would always end the conversation with, “Blog about it.” I finally took her advice and got a blog connected to the old Digineer website. It was some time later that I ended up heading out on my own for my blog, including getting my own domain, with a lot of great advice from Jason Strate. Jason pointed me to, among other things, Brent Ozar’s series about blogging. That was really helpful in getting going. (Editor note: here is a link to his advice on his blog a few years back http://www.brentozar.com/archive/2008/12/how-start-blog/)

    2. We all have influencers that have advanced our careers as writers. It may be a teacher who told you that you had great potential? Another writer who impressed you that you wanted to be like? Or perhaps on the other end of the spectrum it was a teacher who told you that you were too stupid to write well enough to spell your own name, much less have people one day impressed with your writing? Who were your influences that stand out as essential parts of your journey to the level of writer you have become? 

    I had a teacher in high school for English, and also for Creative Writing, named Richard White. He taught me the power of verbs, the importance of dialog, and reinforced the old writing axiom, “Show; don’t tell.” While these three lessons were aimed at fiction, I try to keep them in mind in my technical writing, as well. I try to make my writing sound like I am just speaking. To me, in essence, a blog post is a presentation I only have to give once; a presentation that keeps on presenting, if you will.

    I have also been fortunate enough to have an unofficial blogging mentor: Jason Strate. While he was working to take his own blogging to the next level, he was constantly sharing lessons learned with me. Whether it be a new tool he had tried or just even a bit of blogging philosophy, he set a great example. Many thanks to Jason.

    3.  Can you describe the process you go through to write (including any tools you find indispensable), from inception of an idea until it gets put out for consumption? 

    As far as tools go, Snag-It is the best thing ever. I love that application, and not just because my laptop lacks a “Prt Scn” key. It is so easy to capture screen shots of just about anything and apply highlighting, arrows, shapes, just about anything. I rely on it heavily for images I use in my posts, and sometimes presentations as well.

    Although it is not really a tool, I have to say that Flickr is an AWESOME place to get images for blogs and presentations. Jason Strate showed me that one several years ago. Just filter your search for Creative Commons content, provide links to the source for the images you use, and you are off to the races. I have found so many great images out there.

    As far as process, I don’t know that I have one. But I think I can nail down some actions that I tend to take during the creation of many of my posts.

    • Find/Create a fun dataset.

      • The people that created the AdventureWorks database worked very hard to do so and provided examples of a lot of different things in the process. They deserve our gratitude. However, I try very hard to avoid having my blog posts and presentations be about selling bikes and accessories. If you look over my posts, you will see data examples relating to Super Heroes, The Smurfs, Romeo and Juliet… Keeping the datasets fun is part of what makes it fun for me.
    • If I am demonstrating how to perform some set of actions, I make sure to number the steps as well as the Figures (screenshots, etc) used. Then, I truly document every step along with the expected outcome of each. That takes time. And I am OK with that. When one considers how long a blog post will be “out there” after it is posted, taking the time to make it solid and clear is well worth it. It drives me nuts when documentation skips some steps in the middle of a process while assuming you just “know” to do them.
    • When screenshots are not appropriate to the topic, I make sure to find some fun pictures from Flickr to use. I make sure to choose images that are loosely related in some way to what I am writing about, but a bit entertaining as well. A picture of Devil's Tower makes perfect sense in a post about one's experiences at a SQL Saturday in Chicago. And a 1960s-era Ladies restroom sign is the ultimate homage to the Women in Technology Luncheon at the PASS Summit.

    4. Assume a time machine has been created, and you are allowed to go back in time to speak to a group of potential writers, in which you and I are in attendance. What would you tell "past us", and do you think that your advice would change where you and I are in our careers now? (like would you tell yourself that one day you would be sitting here for a rather long period of time answering interview questions and not getting paid for it, instead of…?)

    The best advice I could give “past us” is the same advice I give people who tell me they want to blog but are apprehensive.

    • Don’t be afraid to blog because you feel you don’t know enough. If everyone waited until they knew everything before blogging or presenting, we would have ZERO bloggers and presenters. Blog now; learn while you do it.
    • Don’t be afraid to blog about topics that others have already covered. People learn in different ways. While the topic may be the same, YOUR way of explaining may be exactly what someone needs for that AHA moment that has been eluding them.
    • Don’t be afraid of making a factually incorrect statement and getting called out on it. Mistakes happen. Do your best to verify what you are writing (you will learn a lot during this activity) and you will be fine. If you think a particular statement is true, but are unsure, say so and indicate why. Be honest about what you are writing and people will respect that.

    5. Finally, beyond the "how" questions, now the big one. There are no doubt tremendous pulls on your time. Why do you write?

    I have a few different answers to this one.

    Before joining Digineer, I worked in general IT. Our department had a purple binder entitled, “Learned the Hard Way – or I don’t EVER want to have to figure this out again.” In that binder we placed really obscure problems we encountered along with their solutions. These were issues that happened so infrequently that remembering the details between occurrences was just not going to happen. Sometimes I use my blog as my purple binder.

    I went to college with the intention of becoming a high school English Teacher. I got as far as student-teaching in a local middle school and even substituted a few times. When I discovered Theater, I ended up putting my main focus into acting. Even so, I am still a teacher at heart and LOVE sharing knowledge with other people. Blogging and presenting are an extension of teaching, as far as I am concerned.

    There is a poem commonly attributed to Ralph Waldo Emerson (although there is a bit of controversy about that) that really sums up why I do most things. I have loved this poem since high school and try to keep true to its meaning.

    Success 

    To laugh often and much; 
    to win the respect of intelligent people  
       and the affection of children; 
    to earn the appreciation of honest critics  
       and endure the betrayal of false friends; 
    to appreciate beauty; to find the best in others; 
    to leave the world a bit better, 
       whether by a healthy child, 
       a garden patch 
       or a redeemed social condition; 
    to know even one life has breathed easier 
       because you have lived. 
    This is to have succeeded.

    Blogging is one of the ways that I work toward achieving Success.

    -------------------------------------------

    Wow, that was quite an interview, chock-full of good advice, and something most blogs about technical writing will not have…controversially attributed poetry. Some of my favorite bits include noting that blogging/writing is a great way to learn, and that you don’t need to be perfect to start. I find that the research I do to try to avoid being wrong often makes spending hours on a seemingly simple topic well worth it when I am working during the day (during the getting-paid part of the day!). And you don’t have to be perfect, as long as you try to get it right, are somewhat interesting, and provide something for readers to learn (and remember, there are many levels of readers out there). When you are wrong, a reader or two will tell you… I promise. (Thick skin is very helpful for public writing!)

    So far, my biggest surprise has been that I haven’t gotten a particular answer to the time machine question. Stay tuned, someone soon is bound to answer what I have expected (and then I will add a supplementary entry to admit to the answers to the questions I would give myself!)

    To the focus of the series, I now have three reasons why my first two interviewees write:

    1. Because there are words that have to be written

    2. Keep up with stuff I know

    3. Working towards success

    The second answer is definitely high on my list, but it certainly isn’t quite enough to keep me typing on this keyboard week in and week out in my free time (when minimally I could be building something with my Legos and preserving the springiness of the keys on my keyboard.)  So the quest continues, with my next subject Rob Farley, who will hopefully get us one step closer to the answer to the question of why we write.

  • Why We Write #1 - An Interview With Thomas LaRock

    I've been a writer of trade-level technical materials for over 13 years now, writing books, articles, blogs, and even tweets for a variety of outlets, almost exclusively about Microsoft SQL Server. While I won't claim to be the best writer in the world, I feel like I have the process of writing down fairly well; yet, for the life of me, there is still the question of "why do I do this?" stuck in the back of my mind that I have yet to appease.

    Note that my quest specifically deals with written (rather than spoken) communication, because it seems to me that presentations are a completely different sort of "why" altogether.

    So I have decided to survey as many of my technical writing colleagues as I can and find out their answers to the "why" question. The only criteria for being included in this set are that you write about a subject like programming, gadgets, computer administration, etc., and that you don't make most of your living from writing (in other words, if you stopped writing today, tomorrow you would not be in fear of sleeping in the gutter.)

    To get the process started, I have asked Thomas LaRock to be my first survey participant. Tom is a SQL Server MVP, has written a very popular book called DBA Survivor for Apress, frequently tweets as @sqlrockstar, and blogs at www.thomaslarock.com, where he maintains a popular ranked list of SQL bloggers (on which I am listed in the tempdb category). He is a member of the executive committee of SQL PASS, and is very active in the SQL community as a speaker. He currently works for Confio as a Technical Evangelist. Tom is also quite well known in our SQL community as a lover of the delightful cured porcine meat known as bacon.

    If you want to see Tom in person, he will be doing a pre-conference seminar with Grant Fritchey and Dandy Weyn this year at Tech-Ed North America in early June in New Orleans entitled How to Be a Successful DBA in the Changing World of Cloud and On-Premise Data.

    -------------------------------------

    1. Every good superhero (or in your case, SQL Rockstar) has an origin story. What got you involved in writing?

    Tom: The birth of my daughter. I wanted to record as many details as possible and since I had 10MB of available space for a website as part of my cable package (yeah...10 MEGABYTES BABY!) it was easy enough to get a website up quickly and easily. The writing came easily, too, since I was writing about something so close to my heart, something I remain passionate about to this day.

    2. We all have influencers that have advanced our careers as writers. It may be a teacher who told you that you had great potential? Another writer who impressed you that you wanted to be like? Or perhaps on the other end of the spectrum it was a teacher who told you that you were too stupid to write well enough to spell your own name, much less have people one day impressed with your writing? Who were your influences that stand out as essential parts of your journey to the level of writer you have become?

    Tom: I never try to be exactly like someone else. If I did, then I would always be second best. Instead I've learned to take bits and pieces of different people and shape them into who I am today. The writer I admire most these days is Bill Simmons, followed by Gregg Easterbrook. Both are known more for their sports writing, but their style of writing is one that I try my best to emulate: it's human. I do not enjoy the dryness of technical writing; I prefer to write from my heart about things that I enjoy. That makes it less of a chore.

    3. My writing process is pretty drawn out, often starting on my phone in OneNote, sometimes finishing in 10 minutes, but often taking a year (or years) to finish an idea. Can you describe the process you go through to write, from inception of an idea until it gets put out for consumption?

    Tom: I used to start a draft inside of WordPress, but lately I have been using EverNote to track my ideas and take notes. From there I just decide to go and get it done. I do my best to follow a very loose format: describe a problem, explain why it's an issue, help readers understand any and all tradeoffs (costs, benefits, risks), and a few action items for them to use as a takeaway. Once I have that framework in my head, it doesn't take long to get to a finished product. I think I may spend more time on finding a decent image to use with my post than on the actual writing itself.

    4. Assume a time machine has been created, and you are allowed to go back in time to speak to a group of potential writers, in which you and I are in attendance. What would you tell "past us", and do you think that your advice would change where you and I are in our careers now?

    Tom: Write for yourself first. Feed your own soul. Don't worry about what your readers want. You can't write for others, they will never be happy with what you have done. The only person that needs to be happy with your words is you. When you write and share yourself, then your readership will grow with people who are naturally drawn to you, and it makes it easier for you to keep sharing your words with people that want to hear them. And no, this advice wouldn't change. Ever.

    5. Finally, beyond the "how" questions, now the big one. There are only 24 hours in a day, and there are no doubt tremendous pulls on your time from family, friends, and pork products, yet, even considering just your blog output, you obviously sit down at a keyboard very often to write. Why?

    Tom: Most of the time I just feel that I have words that need to be written. Doing so helps to feed my soul. I'm at a keyboard a lot because my job requires it, and I am able to spend a lot of my day just writing as a way to communicate with others. Sometimes it's an email, sometimes it's a support ticket, other times it's a blog post.

    -------------------------------------

    I want to thank Tom for being my first participant in my experiment. I find his answer to the “why” question very similar to mine, in that he doesn’t so much offer a tangible reason as a feeling of being compelled to do so. I have to say that his answer to how he got started was quite unexpected and very interesting, and it is going to affect the questions I ask in the future, because beyond just the origin story, it will be interesting to see whether people started writing technically first, or for some other reason. I know that before I wrote my first book, I had never written 2 pages of material that wasn’t graded rather harshly by someone with PhD behind their name (or at least one of their low-paid minions.)

    Unfortunately (or fortunately, if you enjoyed this first entry), Tom certainly did not resolve any of my questions to any level of satisfaction, so I am going to have to continue to ask more of my technical writer colleagues for their opinions as well.

    To that end, my next interviewee will be Mark Vaillancourt, whose website is http://markvsql.com/ and whom has a degree in English and Theatre (so he will know if it should have been whom or who earlier in this probably run-on sentence), so that could make for quite an interesting interview. Perhaps he may resolve my curiosity about how one can go from the seemingly non-technical to spending his time working on SQL Server Business Intelligence. I don’t know, but I look forward to finding out.

  • My New Year's Goals 2013

    So, I have completely given up on my new year starting on Jan 1 where blogging/writing/community is concerned. I love the holidays WAY too much, and I love football (of the American variety with the oblong ball, of course) WAY too much, and so that is why this year I waited until after the Super Bowl to get to this point (this was supposed to be posted last week, but #1 on the list got in the way!).

    Last year, I promised to do some things in my resolutions. I failed on one (1. get the book finished quickly…it took forever due to unforeseen circumstances), and overly succeeded (time-wise) on another (4. I did two pre-cons last year), and both of these took a lot out of me! This year, there is no book riding my back (yet), but I do have several projects on the way that I will announce later in a more grand way (very exciting stuff to me, for sure).

    This year, my goal is to get my community involvement right! I love the community, but sometimes I feel like it is crushing me. I don't want to be one of those people who quits the community, because I do love it, and with my daughter grown up now, I have a lot of time to work on it. I enjoy the community so much that I consider the PASS Summit part of my holiday season, and the people at the Summit a bit like family. So my goals for the year are to stay involved, but to make sure it is reasonable. So I present my 10 goals for this next year.

                  1. Get healthy. Since I started this blog entry, I had a very minor (yet extremely painful) health issue that was a wake up call (hint: minimally I have to drink a lot more water!). I have spent far too much time writing and speaking (and worrying about writing and speaking) and not enough time working on what matters (and sleeping, need more sleeping too!). I don't want to miss these two growing up, and if I have to drop out of the community to make that happen, I will.
                  2. Do stuff that I want to do, the way I want to do it (but better). While I have always wanted to be an entertaining and educational speaker (Tom LaRock and Karen Lopez are great examples of this kind of presenting), my actual presenting style is far different: more straightforward teaching, heavily scripted and heavily practiced. When I get it right, it works, but I constantly try to do far too much in the time allotted, and my nerves can get the better of me when I go off script. (I used to say "as a speaker, I am a good writer," and this is why. As a writer, I can edit myself multiple times!)
                  3. Get in a writing rhythm. Last year, I finally got my What Counts for a DBA series flowing, and this year I am adding a series on SQL Server metadata to the mix on my Simple-Talk blog. I will also try to put up a few blogs about other SQL Server/Design/Professional Development stuff here on SQLBlog too. All of this is leading up to more books in the future…so getting back into a rhythm and trying out new material is very important to my process of writing books.
                  4. Stay working in the MSDN forums. I have recently gotten active again in the Database Design forum, and I plan to keep answering at least all of the questions I can in that forum and perhaps others.
                  5. Always put in Database Design sessions when I submit to speak. It is what I really love to talk about, and even the SQL Server metadata series is based somewhat on the idea of being able to figure out a design that has been implemented.
                  6. Get a good development session written/practiced. Last year I wrote a session on sequences that I think was pretty good (the worst feedback was that I tried to do way too much!) but that hasn't really resonated with anyone yet. I also did a trigger session at PASS that I have heard was good content, but needs some organization. I am reworking that one into a session I am going to call "How to Write a DML Trigger" (in slight homage to the series of Goofy shorts where you are taught how to do something, but with less dog carnage), that will start at the beginning and work up to a trigger that does some realistic task.
                  7. Speak online more. Speaking online is actually quite comforting to me for some reason. I don't like not being able to get crowd feedback, but at the same time, not expecting feedback keeps that one frown (or two or twenty) from getting in my head and keeps me on my practiced script. I am going to finally start hosting my own practice sessions sometime this year as well. So before I debut a session, I am going to do an online practice session; and before a big conference (SQL PASS or Devlink are my two typical examples) I am going to review the material on my own LiveMeeting connection with an hour or two of warning via twitter.
                  8. Actually blog about devices. Yeah, I love devices and I am contemplating my first tablet purchase this year. I really need to talk about them more than just in tweets, as devices/gadgets are what allow me to have the lifestyle I have as a highly mobile telecommuter.
                  9. Volunteer with PASS in some capacity. I feel like I need to do more for the community than I have been; as such, I am going to volunteer for one or two committees. Last year I volunteered for the selection committee, and was the #1 vote getter who lost :). I may try again, and I have volunteered for another committee too. If I don't get in either one, so be it. I do have plenty to do, but I want to get a bit more involved, perhaps to some day run for the board again, once I feel like I am ready.
                  10. Something I am not yet ready to announce. (and if you are my employer, I am not leaving you, relax :)) But it is something exciting to me, and hopefully you too.

                Most of all, on average, have fun doing what I am doing. This is my hobby, and not currently my career. If I stopped blogging/writing/speaking my life would not change tremendously except that my Lego collection wouldn't stare at me longingly as I walk to my writing chair. At the same time, I would truly miss sitting here at my TableMate II destroying the keyboard on yet another laptop, and even more I would miss the people and experience of all of the conferences I get to attend.

                [Photo: my laptop keyboard at the TableMate II]

                I won't lie, often this keyboard is a drag to look at, but just as often it is a true joy. As long as I feel like the focus of Pete Townshend's Guitar and Pen:

                  "When you take up a pencil and sharpen it up
                  When you're kicking the fence and still nothing will budge
                  When the words are immobile until you sit down
                  Never feel they're worth keeping, they're not easily found
                  Then you know in some strange, unexplainable way
                  You must really have something
                  Jumping, thumping, fighting, hiding away
                  Important to say"

                I am going to keep writing and speaking… I really just have to.

              2. One more reason to understand query plans, not directly performance related

                One of the things that separates a good programmer from a great one is a firm understanding about what is going on inside the computer. For some programming languages, it is very obvious what is going on inside the computer because you are working at a very low level. For example, if you are a C/C++ programmer writing an OS, you will know a lot about the hardware as you will interact with it directly. As a .NET programmer you are more encapsulated from the hardware experience, making use of the .NET framework.

                None of the aforementioned programming languages comes anywhere close to the level of encapsulation that we SQL programmers work with.  When you execute a statement like:

                SELECT *
                FROM    Tablename

                A firestorm of code is executed to optimize your query, find the data on disk, fetch that data, format it for presentation, and then send it to the client. And this is the super dumbed down version. SQL is a declarative language, where basically we format a question or task for the system to execute without telling it how. It is my favorite type of language because all of that pushing of bits around gets tedious. However, what is important for the professional SQL programmer is to have some understanding of what is going on under the covers: understanding query plans, disk IO, CPU, etc. Not necessarily to the depth that Glenn Alan Berry (http://sqlserverperformance.wordpress.com/) does, but certainly a working knowledge.
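                If you want to start peeking under those covers, SQL Server will show you its plan without executing anything. A minimal sketch (Tablename stands in for any table you have handy):

                ```sql
                -- Ask for the estimated plan as text instead of executing the query
                SET SHOWPLAN_TEXT ON;
                GO
                SELECT *
                FROM   Tablename;
                GO
                SET SHOWPLAN_TEXT OFF;
                GO

                -- Or execute the query and report the physical work it caused
                SET STATISTICS IO ON;
                SET STATISTICS TIME ON;
                SELECT *
                FROM   Tablename;
                ```

                In Management Studio, Ctrl+L (estimated plan) and Ctrl+M (include actual plan) give you the graphical equivalents.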

                Performance is the obvious reason, since it is clearly valuable to be able to optimize a query, but sometimes it can come in handy to debug an issue you are having with a query. Today, I ran across an optimizer condition that, while perfectly understandable in functional terms, would have driven me closer to nuts if I hadn’t been able to read a query plan. The problem came in based on the number of rows returned: either the query worked perfectly or it failed with an overflow condition. Each query seemingly touched the exact same rows in the table where the overflow data exists…or did it?

                The setup: the real query the problem was discovered in ran in our data warehouse, a star schema configuration with 20+ joins. In the reproduction, I will use a simple table of numbers to serve as the primary table of the query.

                create table ExampleTable  -- It really doesn’t matter what this table has. The dateValue column
                                           -- will be used to join to the date table, which I will load from the
                (                          -- values I put in this table to make sure all data does exist
                    i int constraint PKExampleTable primary key,
                    dateValue date
                )

                ;with digits (i) as( --(The code for this comes from my snippet page: http://www.drsql.org/Pages/Snippets.aspx).
                                    select 1 as i union all select 2 as i union all select 3 union all 
                                    select 4 union all select 5 union all select 6 union all select 7 union all
                                    select 8 union all select 9 union all select 0)
                ,sequence (i) as ( 
                                    select D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i) + (100000*D6.i) 
                                    from digits as D1, digits AS D2, digits AS D3 ,digits AS D4, digits as D5, digits As D6
                )
                insert into ExampleTable(i, dateValue)
                select i, dateadd(day, i % 10,getdate()) -- Puts in 10 different date values
                from sequence
                where i > 0 and i < 1000
                order by i

                Next I will load the date table with all of the distinct dateValue values that we loaded into ExampleTable, plus one extra row containing the max date value for the datatype. In the “real world” case, this is one of our surrogate null values, used to indicate that it is the end date. (Yes, we are ignoring the Y10K problem.)

                create table date
                (
                    datevalue date constraint PKDate primary key
                )

                insert into date
                select distinct dateValue
                from   ExampleTable
                union all
                select '99991231'
                go

                In the typical usage, the number of rows is quite small.  In our queries, we are adding 1 to the dateValue to establish a range of a day (in the real query it was actually a month). Executing the following query that returns 99 rows is successful:

                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                where  i < 100

                However, remove the where clause (causing the query to return 999 rows):

                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                          join date
                             on date.dateValue = ExampleTable.dateValue

                And you will see that this results in an overflow condition...

                Msg 517, Level 16, State 3, Line 2
                Adding a value to a 'date' column caused an overflow.

                Hmmm, this could be one of those days where I don’t get a lot of sleep :).  Next up, I check the max date value that can be returned.

                --show that the top value that could be returned is < maxdate
                select max(date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue

                At this point, I start feeling like I am going nuts. The value returned is 2013-01-30. So no data is actually returned that should be too large for our date column… So then I think, well, let's add one to that value and take the max:

                select max(date.dateValue), max(dateadd(day,1,date.dateValue))
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue

                This returns, mockingly:

                Msg 517, Level 16, State 3, Line 2
                Adding a value to a 'date' column caused an overflow.

                So, since it worked with fewer rows earlier, I decide to try lowering the number of rows again, this time using a derived table, and it DOESN’T error out, even though it is obvious (because I stacked the deck…data) that the same data is just repeated for the dateValue, particularly since we get the same max dateValue as we did earlier.

                select max(date.dateValue), max(dateadd(day,1,date.dateValue))
                from   (select top 100 * from ExampleTable order by i) as ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                 
                       
                Well, you are possibly thinking, this just doesn't make sense. It is how I felt too after trying to do the logic in my head. I will admit that if I didn’t know about query plans I would have been completely lost. But alas, the answer was fairly easily located in the plan. Taking a look at the plan for the query version that returns 99 rows:

                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                where  i < 100

                We get the following estimated plan:

                [Estimated plan: Nested Loops join]

                In this plan, it uses a nested loops operator, which basically does 99 seeks from the top input (the ExampleTable), fetching the date value for each row, and then calculates the scalar value (dateadd(day,1,date.dateValue)) on the values that match in the plan. Since the 9999-12-31 date is never used, there is no overflow.

                However, when the size of the output reaches a certain tolerance (in this case 999 rows instead of 99), as in the following query:

                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue

                We get a different plan, one that is causing us issues:

                [Estimated plan: Hash Match join]

                Instead of nested loops, it uses a Hash Match join, which takes the entirety of the smaller table and builds an internal hash index (basically setting up buckets that can be scanned much faster than an entire table…in our case, probably just a single bucket), and then scans the other set, checking to see if each row exists in the hash index.

                It is in the process of building the hash index that our query runs into trouble. Since the date table is so much smaller, it plans to build the hash index on that table, and pre-creates the scalar values as it is doing the scan, since there are 11 rows in the date table, rather than having to calculate the value 999 times if it did it after the join. When it adds a day to the 9999-12-31 date, it fails.
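                One heavy-handed way to confirm that the plan shape, and not the data, is the culprit (my own diagnostic sketch, not a step from the original troubleshooting) is to force the join type with a hint:

                ```sql
                -- Forcing nested loops should let even the 999-row version succeed,
                -- since the scalar is only computed for rows that match the join
                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                option (loop join)

                -- Forcing a hash join will likely make even the filtered version fail,
                -- because the 9999-12-31 row is touched while the hash table is built
                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                where  i < 100
                option (hash join)
                ```

                Hints like these are great for diagnosis; shipping them in production code is a different discussion entirely.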

                I know, the question of how practical this scenario is will surely arise. I won’t lie to you and suggest that it is likely to happen to you as it did to me. However, the point of this blog isn’t that this one scenario is bound to happen to you, but rather that understanding how SQL Server executes queries will give you insight to fix problems with your system: mostly performance, but sometimes very esoteric issues that won't just leap out as being based on the query plan that was chosen. (For more reading on query plans, check out Grant Fritchey’s Simple-Talk book on query plans: http://www.amazon.com/Server-Execution-Plans-Grant-Fritchey/dp/1906434026).

                In the end, the fix to my problem was simple: make sure that the value that has meaning in the table, but not in the query, is filtered out:

                select *, dateadd(day,1,date.dateValue)
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                               and date.dateValue < '9999-12-31'

                Note: a commenter noted that in some cases, excluding the offending data using the ON criteria/WHERE clause may not solve the issue. This is very true, and it will be evident in the plan. I would expect the value to be more reliably excluded when filtered in the JOIN clause, but you really can't guarantee anything the optimizer might do without changing the source data (or representing the source data using a derived table, as here):

                select *, dateadd(day,1,date.dateValue)
                from ExampleTable
                       join (select * from date where date.dateValue < '9999-12-31') as date
                            on date.dateValue = ExampleTable.dateValue      
                             

                Looking at the different variations in the plan, you should be able to diagnose a "hidden" problem such as I have described by finding the flow of data and making sure that the filtering operation happens before the calculation of the scalar that causes the overflow error. This may harm performance in my query even for the more "ideal" case where it could have used indexes, so you may yet have more work to do...But this is what makes data programming fun, now isn't it?
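                As a plan-independent variation (my own sketch, not the fix from the post), you can make the scalar expression itself safe, so that no plan shape can apply dateadd to the sentinel value:

                ```sql
                -- The CASE returns NULL for the surrogate end date instead of overflowing.
                -- Verify the behavior in the plan, as there are documented situations
                -- (mostly involving aggregates) where CASE branches can be evaluated eagerly
                select *, case when date.dateValue < '9999-12-31'
                               then dateadd(day,1,date.dateValue)
                          end as nextDateValue
                from   ExampleTable
                         join date
                            on date.dateValue = ExampleTable.dateValue
                ```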

              3. Pro SQL Server 2012 Practices Chapter 8: Release Management Review

                This past year, I contributed a chapter to an anthology book of best practices for working with SQL Server 2012 entitled Pro SQL Server 2012 Practices (http://www.apress.com/9781430247708). As authors, for publicity we decided to do summary reviews of one another's chapters. There are lots of great technical sounding chapters, but when I picked, I picked a chapter that I hoped would help me learn more about a process that is not in my favorite normal design or coding techniques area. Of the parts of the software development process I despise, release management is definitely one of them. As an architect, my primary love in software development starts with design and starts to really drop off during testing. And I certainly did learn more about the process… TJay Belt (https://twitter.com/tjaybelt) wrote his chapter on release management. (I should also divulge that I have been friends with TJay through SQL PASS for quite some time, along with many of the authors of the book too.)

                TJay does a great job of describing the process of release management, talking about the process he uses and even admitting mistakes he and his teams have made along the way. The focus of the chapter is very much from the point of view of the releasing DBA role in the process (most of the book covers very DBA centric topics) and contains a lot of tremendously good advice about getting release management right, starting with having great documentation and a rollback plan to be able to restart or put a release on hold if things go awry. In addition, he covers many of the topics around the entire process of coding/releasing software, including version control, proper (and quite reasonable) documentation, coding standards, and most of all a set of well-defined processes that all of the varied players in the process have agreed to and work on as a team.

                My favorite part of the chapter was the approximately four pages of thought provoking questions that should be covered when doing a release, ranging from understanding the databases that will be affected, capacity planning, tuning, standards, code, jobs, etc. etc. Great food for thought for defining or refining your own release process.

                Of course, one of the concerns of a book with lots of different topics is that you don't get tremendously deep coverage of any subject (and this is also true in my chapter on database design.) However, in some ways this liberates the writer from having to cover every detail and instead provide a thoughtful discussion of the overall release management process. This is very much a blessing, because every organization is different and already has some process in place. Maybe your defined process is awesome or awful, but this chapter can help you think of ways to refine it. You are left to find your own tools and processes to meet your company's needs, but what you get is quite thought provoking and will likely be useful whether this is your first time doing a release or your hundredth.

              4. A wee bit exhausted… time to reenergize

                I admit it. I am tired and I have not blogged nearly enough. This has been a crazy year, with the book I finished writing, the pre-cons I have done (teaching is NOT my primary profession so I do a lot more prep than some others probably do), lots of training on Data Warehousing topics (from Ralph Kimball, Bob Becker, and Stacia Misner, to name three of the great teachers I have had), SQL Rally, SQL PASS, SQL Saturdays and I have gotten a lot more regular with my simple-talk blog as well… Add to this the fact that my daughter added a new grandchild to the family, and my mother has started to get so weak she is starting to fall down quite often (I am writing this blog entry from a spare bedroom at my mother-in-law’s house while my mom is in rehab!) and I am getting exhausted.

                Am I whining? Probably, but it is my blog! No, seriously I figure that occasionally you have to poke your head out from under the covers and write something and this is my something until after the New Year (other than posting a few already written and edited simple-talk blogs). I am on vacation from work for 2.5 weeks, and I don’t plan to do much with this laptop of mine for those two weeks unless the spirit hits me with an idea for a blog that I just have to write, but usually most of my blogs that have any technical or artistic merit take weeks to complete.  On the second of January, I hope to be back at it, analyzing my resolutions from last year, and making good on a few of them, particularly “Blog about my other (computer) love occasionally” and review some of the gadgets I have acquired as they pertain to doing my job as a writer/data architect. (Hint: My mother-in-law does not have Internet access, so some of the devices I have here are instrumental in my ability to work untethered for weeks on end.)

                So until next year, Merry Christmas, Happy Holidays, Happy New Year!  I hope your holidays are restful and fun.  I know part of mine will be because I intend to replicate this picture at least one or two more times next week, hopefully with a Turkey Leg in the hand that isn’t holding the camera taking the picture (all with my Windows Phone set on Battery Saver Mode, which delightfully turns off all syncing :)

                [Photo: relaxing over the holidays]

              5. PASS Precon Countdown… See some of you Monday, and others on Tuesday Night

                As I finish up the plans for Monday’s database design precon, I am getting pretty excited for the day. This is the third time I have done this precon, and while the base slides are very similar, I have a few new twists in mind. One of my big ideas for my Database Design Workshop precon has always been to give people a chance to do some design. So I am even now trying to whittle down the slides and make sure that we have the time for design.

                If you are attending, be prepared to be a team player. I have 3 team exercises that you will do in teams. When we reach the first exercise, we will break up into 8 individual teams. Don’t try to figure out who to sit by, because I am going to randomly choose how to split up into teams when I see how the tables are (and I know that there will be at least one person there that I would want on my team :). The teams will be slightly important because the most enthusiastic teams will get the first crack at the pile of swag, of which I have a lot: 20 physical and 15 ebook copies of my new database design book, 15 8GB SD cards with the PowerPoint and code on them, 3 Joe Celko books, the Apress Applied Mathematics for Database Professionals book, and a very nice Lego set. And if this blog entices more people to show up than I have giveaways, well, then I will pick up some gift cards to even out the swag.

                While the lecture will take up a lot of time, the exercises will be the most fun part of the day. The exercises I have planned are of the following genre:

                1.  Modeling from requirements: Taking a set of written requirements and producing the initial database design (20 minutes)

                2.  Normalizing a set: Taking a flat file and normalizing it into a set (~20 minutes)

                3.  Applying what we have discussed: Taking a set of requirements and producing a normalized database design (45 minutes)

                For the first two exercises, every team will have the same requirements, but for the third I will hand out 4 separate sets of requirements. So we will have 4 different designs to discuss and review. I am bringing my camera along to display each team’s work on the screen. After I print the requirement packs for the teams, I plan to go through and do my own design for comparison. It will be interesting to see how different each team’s design is, and to see what I might miss when I do the design. I am going to encourage people to go beyond the specific requirements and build the system they think will be awesome while meeting the requirements.

                If all works out, my hope is to do a series of blogs next year using the requirements and designs that we produce as a result. I also (finally remembered to) put a request on the slide that students could do one of a couple of design ideas and I would review them (yes, with plans to turn that into a blog someday too.)

                So hope to see you Monday… And if I don’t see you in the class Monday, see you Tuesday night when we do our annual Quiz Bowl. Tim has come up with a slew of interesting questions including another round of Before and After questions to blow the mind of several SQL Server professionals…

              6. 24 Hours of PASS next week, pre-con preview style

                I will be doing my Characteristics of a Great Relational Database, which is a session that I haven’t done since last PASS. When I was asked about doing this Summit Preview version of 24 hours of PASS, I decided that I would do this session, largely because it is kind of light and fun, but also because it is either going to be the basis of the end section of my pre-con at the summit or it is going to be the section of the pre-con we don’t get to because we are so involved in working out designs that we forget the time and the next day’s precon people start arriving and kick us out.

                The session is basically a discussion about the finishing touches that make a database better than average, something you can rave about, something you can brag to your significant other about, something your company will run a Super Bowl ad just thanking you for… Well, ok, seriously, a database that won’t cause you and your coworkers to ralph each time you use it is a solid step towards the way you will want to develop your databases. 

                The goal is to keep it reasonably light and a little bit fun, since I am doing the presentation at 11:00 PM Central Time in the US, and well, that isn’t exactly prime SQL time for most people. In Europe it will be the middle of the night, and in half of the US I will be competing with the national news and the end of the football game between the New York Giants and Carolina Panthers. If the game is close, I will be happy to share your attention, and heck, if my internet connection would support streaming video and the sharing client I would probably be watching the game myself (as it is, I will probably TiVo it and watch it on my phone via SlingBox when we are done…yes, I have a little bit of a football problem.)

                If you want to attend my session, click here and register. Even if database design isn’t your thing, 24 hours of PASS has (hold on to your hat) 24 different sessions in a 24 hour period to choose from. So click on over to the 24HOP Speaker/Session list and pick your sessions and register for them. I look forward to seeing you (well your name in the list) at the event.

                But if db design is your thing (or you want it to be!) and you want a full day dose on the Monday before PASS, try my pre-con on Relational Database Design. It is going to be a great day: there will be plenty of learning, lots of swag (including at least 30 copies of my book to give away), and some practical experience doing a bit of team based design. In any case it will be better than a normal Monday at the office.

              7. Utility Queries–Structure of Tables with Identity Column

                Edit: At the suggestion of a most knowledgeable commenter who shall remain named Aaron, I changed from using the schema_name() function to joining to sys.schemas. When writing code that is expected to be reused, it can be safer to use the catalog views rather than the functions, because the views resolve in the context of the database named in the FROM clause. If you changed the code to database1.sys.tables because you wanted the tables from database1, but you were executing the code in database2, the columns of the views would give you the answer you expected, while the functions would resolve in the context of database2.
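                A sketch of the difference (database1 and database2 are hypothetical names, and you would run this while connected to database2):

                ```sql
                -- Joining to database1's own catalog views resolves names in database1:
                SELECT schemas.name + '.' + tables.name AS tableName
                FROM   database1.sys.tables
                         JOIN database1.sys.schemas
                            ON tables.schema_id = schemas.schema_id;

                -- schema_name() resolves the id in the current database (database2), so the
                -- same schema_id can map to a different schema name, or to no name at all:
                SELECT schema_name(tables.schema_id) + '.' + tables.name AS tableName
                FROM   database1.sys.tables;
                ```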

                I have been doing a presentation on sequences of late (the last planned version of that presentation was last week, but you should be able to get the gist of things from the slides and the code posted here on my presentation page), and as part of that process, I started writing some queries to interrogate the structure of tables. I started with tables using an identity column for some purpose because they are considerably easier to do than sequences, specifically because the limitations of identity columns make determining how they are used easier.

                In the future (which will probably be after PASS, since I have a lot of prep and 3 more presentations to do before PASS), I will start trying to discern the different cases where you might want to use a sequence and writing queries to make sure the table structures are as I desire. The queries presented here are really the first step in that direction, as I foresee a mixture of identity and sequence based surrogate keys being the typical setup even once people get to SQL Server 2012. The queries I am presenting here will look for tables that meet certain conditions, including:

                • Tables with no primary key – A very common scenario: either no thought about uniqueness, or the assumption that the identity property alone makes the table adequate.
                • Tables with no identity column – Absolutely nothing wrong with this scenario, as the pattern of using an identity based primary key is just a choice\preference.  However, if you expect all of your tables to have identity columns, running this query can show you where you are wrong.  I usually use this sort of query as part of a release, making sure that the tables I expected to have a surrogate actually do.
                • Tables with identity column and PK, identity column in AK – This query is interesting for looking at other people’s databases sometimes.  Not everyone uses the identity value as a surrogate primary key, and finding cases where it is in a non-key usage can help you find “interesting” cases.
                • Tables with an identity based column in the primary key along with other columns – In this case, the key columns are illogical. The identity value should always be unique and be a sufficient surrogate key on its own.  By putting other columns in the key, you end up with a false sense of uniqueness. Ideally, you want your tables to have at least one key where all of the values are created outside of SQL Server. Sometimes people will use this for an invoice line item and make the pk the invoiceId plus an identity value like invoiceLineItemId.

                  I can’t say that this is “wrong”, but if the only key includes a system-generated value, it means that you can have duplicated data alongside the system-generated value, so you need to monitor the data more carefully.
                • Tables with a single column identity based primary key but no alternate key – This is the classic case of surrogate key abuse. Just drop a surrogate key on the table and voilà, uniqueness! If you can’t see why this isn’t desirable, it is like the previous case, except the only uniqueness criterion is a monotonically increasing value.

                You can download the code directly from here, or you can see all my downloadable queries on my downloadable package page: DownloadablePackages.

                The queries:

                --Tables with no primary key

                SELECT  schemas.name + '.' + tables.name AS tableName
                FROM    sys.tables
                          JOIN sys.schemas
                             ON tables.schema_id = schemas.schema_id
                WHERE   tables.type_desc = 'USER_TABLE'
                        --no PK key constraint exists
                    AND NOT EXISTS ( SELECT *
                                        FROM   sys.key_constraints
                                        WHERE  key_constraints.type = 'PK'
                                            AND key_constraints.parent_object_id = tables.object_id )


                --Tables with no identity column

                SELECT  schemas.name + '.' + tables.name AS tableName
                FROM    sys.tables
                           JOIN sys.schemas
                               ON tables.schema_id = schemas.schema_id
                WHERE   tables.type_desc = 'USER_TABLE'
                --no column in the table has the identity property
                    AND NOT EXISTS ( SELECT *
                                     FROM   sys.columns
                                     WHERE  tables.object_id = columns.object_id
                                       AND is_identity = 1 )

                --Tables with identity column and PK, identity column in AK

                SELECT schemas.name + '.' + tables.name AS tableName
                FROM   sys.tables
                        JOIN sys.schemas
                            ON tables.schema_id = schemas.schema_id
                WHERE tables.type_desc = 'USER_TABLE'
                        -- table does have identity column 
                  AND   EXISTS (    SELECT *
                                    FROM   sys.columns
                                    WHERE  tables.object_id = columns.object_id
                                        AND is_identity = 1 ) 
                        -- table does have primary key 
                  AND   EXISTS (    SELECT *
                                    FROM   sys.key_constraints
                                    WHERE  key_constraints.type = 'PK'
                                      AND key_constraints.parent_object_id = tables.object_id )
                        -- but it is not the PK 
                  AND   EXISTS (    SELECT *
                                    FROM   sys.key_constraints
                                        JOIN sys.index_columns
                                            ON index_columns.object_id = key_constraints.parent_object_id
                                                AND index_columns.index_id = key_constraints.unique_index_id
                                        JOIN sys.columns
                                            ON columns.object_id = index_columns.object_id
                                                AND columns.column_id = index_columns.column_id
                                    WHERE  key_constraints.type = 'UQ'
                                        AND key_constraints.parent_object_id = tables.object_id
                                        AND columns.is_identity = 1 )

                --Tables with an identity based column in the primary key along with other columns

                SELECT schemas.name + '.' + tables.name AS tableName
                FROM   sys.tables
                         JOIN sys.schemas
                            ON tables.schema_id = schemas.schema_id
                WHERE tables.type_desc = 'USER_TABLE'
                        -- table does have identity column
                  AND   EXISTS ( SELECT *
                                 FROM   sys.columns
                                 WHERE  tables.object_id = columns.object_id
                                   AND is_identity = 1 )
                        --any PK has identity column
                  AND   EXISTS( SELECT  *
                                FROM    sys.key_constraints
                                           JOIN sys.index_columns
                                                ON index_columns.object_id = key_constraints.parent_object_id
                                                   AND index_columns.index_id = key_constraints.unique_index_id
                                           JOIN sys.columns
                                                ON columns.object_id = index_columns.object_id
                                                   AND columns.column_id = index_columns.column_id
                                WHERE    key_constraints.type = 'PK'
                                  AND    key_constraints.parent_object_id = tables.object_id
                                  AND    columns.is_identity = 1 )
                    --and there are > 1 columns in the PK constraint
                    AND (  SELECT  COUNT(*)
                           FROM    sys.key_constraints
                                      JOIN sys.index_columns
                                          ON index_columns.object_id = key_constraints.parent_object_id
                                             AND index_columns.index_id = key_constraints.unique_index_id
                            WHERE   key_constraints.type = 'PK'
                              AND   key_constraints.parent_object_id = tables.object_id
                        ) > 1


                --Tables with a single column identity based primary key but no alternate key

                SELECT schemas.name + '.' + tables.name AS tableName
                FROM sys.tables
                         JOIN sys.schemas
                             ON tables.schema_id = schemas.schema_id
                WHERE tables.type_desc = 'USER_TABLE'
                        --a PK key constraint exists 
                  AND   EXISTS ( SELECT * 
                                 FROM   sys.key_constraints 
                                 WHERE  key_constraints.type = 'PK' 
                                   AND key_constraints.parent_object_id = tables.object_id )
                    --any PK only has identity column 
                  AND ( SELECT COUNT(*) 
                        FROM   sys.key_constraints 
                                  JOIN sys.index_columns 
                                      ON index_columns.object_id = key_constraints.parent_object_id 
                                         AND index_columns.index_id = key_constraints.unique_index_id 
                                  JOIN sys.columns 
                                      ON columns.object_id = index_columns.object_id 
                                         AND columns.column_id = index_columns.column_id 
                        WHERE  key_constraints.type = 'PK' 
                          AND  key_constraints.parent_object_id = tables.object_id 
                          AND columns.is_identity = 0
                        ) = 0 --must have > 0 columns in pkey, can only have 1 identity column 

                  --but no Unique Constraint Exists 
                  AND NOT EXISTS ( SELECT * 
                                   FROM   sys.key_constraints 
                                   WHERE  key_constraints.type = 'UQ' 
                                     AND key_constraints.parent_object_id = tables.object_id )
                  

                --Test Cases

                --The following are some sample tables that can be built to test these queries. If you have other ideas
                --for cases (or find errors, email louis@drsql.org)

                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoPrimaryKey'))
                        DROP TABLE dbo.NoPrimaryKey;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoIdentityColumn'))
                        DROP TABLE dbo.NoIdentityColumn;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityButNotInPkey'))
                        DROP TABLE dbo.IdentityButNotInPkey;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.TooManyColumnsInPkey'))
                        DROP TABLE dbo.TooManyColumnsInPkey;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.MultipleColumnsInPkeyOk'))
                        DROP TABLE dbo.MultipleColumnsInPkeyOk;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.NoAlternateKey'))
                        DROP TABLE dbo.NoAlternateKey;
                IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id('dbo.IdentityInAlternateKey'))
                        DROP TABLE dbo.IdentityInAlternateKey;

                --very common scenario, assuming identity makes the table great
                CREATE TABLE NoPrimaryKey
                (
                    NoPrimaryKeyId int not null identity,
                    AnotherColumnId int not null
                )
                go

                --absolutely nothing wrong with this scenario, unless you expect all of your
                --tables to have identity columns, of course...
                CREATE TABLE NoIdentityColumn
                (
                    NoIdentityColumnId int primary key,
                    AnotherColumnId int not null
                )
                go

                --absolutely nothing wrong with this scenario either, as this could be desired.
                --usually it is some form of mistake in a database using surrogate keys though
                CREATE TABLE IdentityButNotInPkey
                (
                    IdentityButNotInPkeyId int primary key,
                    AnotherColumnId int identity not null
                )
                go

                --absolutely nothing wrong with this scenario either, as this could be desired.
                --usually it is some form of mistake in a database using surrogate keys though
                CREATE TABLE IdentityInAlternateKey
                (
                    IdentityInAlternateKeyId int primary key,
                    AnotherColumnId int identity not null unique
                )
                go


                --In this case, the key columns are illogical. The identity value should always be unique and
                --be a sufficient primary surrogate key. I definitely want to know why this is built this
                --way.  Sometimes people will use this for an invoice line item and make the pk the
                --invoiceId and an identity value like invoiceLineItemId. I generally prefer the surrogate key
                --to stand alone and have the multi-part key to be something that makes sense for the user
                CREATE TABLE TooManyColumnsInPkey
                (
                    TooManyColumnsInPkeyId int identity,
                    AnotherColumnId int,
                    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
                )
                go

                CREATE TABLE MultipleColumnsInPkeyOk
                (
                    TooManyColumnsInPkeyId int not null,
                    AnotherColumnId int not null,
                    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
                )
                go

                --this is my pet peeve, and something that should be avoided. You could end up having
                --duplicate rows that are not logical.
                CREATE TABLE NoAlternateKey
                (
                    NoAlternateKeyId int not null identity primary key,
                    AnotherColumnThatShouldBeUnique int not null
                )
                go
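As a quick spot check (my own addition, not part of the original script, with the expectation inferred from the table definitions above), after building the test tables you can confirm that only dbo.NoPrimaryKey lacks a primary key constraint:

```sql
--spot check: which of the test tables have no primary key constraint?
--expected result (per the definitions above): dbo.NoPrimaryKey only
SELECT  schemas.name + '.' + tables.name AS tableName
FROM    sys.tables
          JOIN sys.schemas
             ON tables.schema_id = schemas.schema_id
WHERE   tables.name IN ('NoPrimaryKey','NoIdentityColumn','IdentityButNotInPkey',
                        'IdentityInAlternateKey','TooManyColumnsInPkey',
                        'MultipleColumnsInPkeyOk','NoAlternateKey')
        --no PK key constraint exists on the table
    AND NOT EXISTS ( SELECT *
                     FROM   sys.key_constraints
                     WHERE  key_constraints.type = 'PK'
                       AND  key_constraints.parent_object_id = tables.object_id );
```

Each of the other queries can be checked the same way against the table it was built to flag.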

              8. SQLPASS DB Design Precon Preview

                There are just a few months left before SQLPASS, and I am pushing to get my precon prepped for you. While this will be the second time I have produced this session this year, I listened to the feedback and positive comments I have heard from potential attendees, and I am making a couple of big changes to fit what people really liked.

                1. Lots more design time. We will do more designs in some form, as a group, teams, and individually, depending on the room and people in attendance. (Figure on a lot of databases centered around toys, theme parks, and other situations that are not exactly serious, since they provide a limited enough case where no one will get hung up on how their company does it, but broad enough to mimic real business cases. )
                2. Pattern and code walkthroughs. I have a set of patterns (like uniqueness, hierarchies, and data driven design) that we can walk through and see how to translate from a database design to a physical implementation. It is based on the presentations I have done for the Data Architecture Virtual Chapter, and at Devlink this year, but we will not blast through any of it; we will cover the code and designs at a deliberate pace and then consider designs where the pattern would make sense.

                So if you want an interactive experience where you get a chance to think for yourself (at least part of the time) come join me on November 5 in Seattle, Washington, don't think it, just do it: http://www.sqlpass.org/summit/2012/Registration.aspx (Not responsible if you actually get carried away and your employer won't cover your expenses)

                As a bonus, I have at least 30 books I will give away to attendees, 15 electronic and 15 ex-tree versions to share with you. The precon is largely taken from the book, but it would take me more than 7 hours just to read the book to you, and I am afraid that would not impress anyone at all.

                If you want to see more of my thoughts on the pre-con, check out the interview PASS did for the pre-con here, it is wordy, but if it wasn't it would be mathematics: http://www.sqlpass.org/summit/2012/Sessions/PreConferenceSessions/PreConPreviews/LouisDavidson.aspx

                If you have any questions to ask me about what we will cover, or what you want to cover, or just want to know why I think I am funny, don't hesitate to go ahead and register… or send me an email at drsql@hotmail.com.

              9. An interview, an online session, a long drive and a SQL Saturday… This week!

                Later this week I will be doing an episode of Greg Low’s excellent SQL Down Under podcast (http://www.sqldownunder.com/Resources/Podcast.aspx), something I did once before back in 2006.  If you haven’t listened to any of the previous editions, there are some amazing people who have been on his podcast.

                On Thursday at 12:00 Central Time, I will be doing a presentation entitled Designing for Common Problems in SQL Server for the PASS Data Architecture Virtual Chapter.

                Friday I will be driving up to Cleveland, OH for SQL Saturday 164. I will be doing the Designing for Common Problems in SQL Server session, along with the Sequences session that I have done at several SQL Saturdays so far.  Saturday I will give away two copies of my brand new book, one in each session, so if you want to be the first person I give one to, be there!

                Right now, the biggest issue is that the Designing for Common Problems session is WAY too long. In my prep so far, I have gotten halfway through the patterns and code in one and a half hours. So who knows what I will do to cut down the time: limit the patterns, or perhaps split the session? I will figure something out… at least on Saturday, when I have real people, I can poll the audience to see what they want to see in detail. Online, pretty much all you see are people’s names and the clock ticking away.

                I have a few other things coming up, including picking speakers for Nashville’s SQL Saturday, shipping out books to my SQL Rally attendees, and Devlink at the end of the month (when I will have a bit longer for the Common Problems session, thankfully), but more on that after this weekend.

              10. Louisville SQL Saturday…

                One more week until we get to SQL Saturday 122 in Louisville KY. I have a couple of sessions on the calendar this time. First, the one on Sequences:

                ------------------------------------

                What Sequence objects are (and are not)

                SQL Server 2012 adds a new object to our arsenal called a sequence that will give us the capability to implement automatically incrementing values. However, it cannot replace a lot of the functionality that we have used a numbers table and windowing functions for (though they can be complementary to one another). In this session I will demonstrate the uses and performance characteristics of sequences, including how they complement the use of numbers tables and windowing functions to create surrogate key and sorting values, and more.

                ------------------------------------
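As a small taste of what the abstract describes, here is a minimal sketch (the table and sequence names are mine, not from the session) of a SQL Server 2012 sequence used to generate surrogate key values:

```sql
--create a sequence object to hand out surrogate key values
CREATE SEQUENCE dbo.WidgetSequence
    AS int
    START WITH 1
    INCREMENT BY 1;

--use it as a column default, much like an identity column,
--but the sequence can be shared across tables
CREATE TABLE dbo.Widget
(
    WidgetId int NOT NULL
        CONSTRAINT DFLTWidget_WidgetId DEFAULT (NEXT VALUE FOR dbo.WidgetSequence)
        CONSTRAINT PKWidget PRIMARY KEY,
    WidgetName varchar(30) NOT NULL CONSTRAINT AKWidget UNIQUE
);

INSERT INTO dbo.Widget(WidgetName)
VALUES ('Thingamajig'),('Doohickey');

--unlike identity, you can also fetch the next value directly
SELECT NEXT VALUE FOR dbo.WidgetSequence AS NextWidgetId;
```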

                The second session is my professional development session that goes along with my What Counts for a DBA blog series on SimpleTalk. Come with your ideas about what makes a great DBA so we can all get into the conversation (and not just so I won’t have to create too many slides). I will have my topic spinning wheel with me, so who knows exactly what we will discuss, not even I know.

                ------------------------------------
                What Counts For a DBA

                The world of a DBA can be daunting, whether you are new or old to the job, because not only do you need to keep up with new and emerging technologies, but also with the code and designs of your coworkers. In this highly participation-driven session, we will employ a random topic chooser to pick several of these traits for discussion as a group. Possible topics include past blog topics such as Logic, Curiosity, Failure, Humility, Skill and Passion, as well as any other topics that might be added for that day. So come prepared to participate and voice your opinion about what counts for a DBA.

                ------------------------------------

                Hope to see you there! (And if you can’t make it to Louisville, I will be in Cleveland, OH for SQL Saturday #164 on August 16, and then in Chattanooga, TN for Devlink the last week of August.)

              11. Utility Queries–Database Files, (and Filegroups)

                It has been a while since I last posted a utility query, and today, to avoid other work I am supposed to be doing, I decided to go ahead and work on another post.  Today, I went ahead and worked on a server configuration type query. One query I find I use pretty often is the following one that lists the files in the database. In this blog I will include two queries.  The first will deal with files across all databases, and the second runs in a single database to see the files and their filegroups (if there is an easy way to get the filegroups at a server level, I am not sure of it…let me know).

                Database Files, All Databases – File Level (Download source)

                It is a pretty simple query, and it returns the following columns. (A value of '@TOTAL' indicates that the row is a summary row, and some file_types will not report a file size. ):

                • database_name – The name of the database
                • database_file_name – The file name that was set when the file was added to the database (the logical name, not the physical name)
                • size_in_kb – The size of the file in kilobytes, such that it matches the file size in the Windows Explorer
                • size_in_mb – The size of the file in megabytes, the size people more typically want to see
                • size_in_gb – The size of the file in gigabytes, useful when looking at really large files
                • file_type – How the file is used in the server
                • filesystem_drive_letter – the drive letter where the file is located
                • filesystem_file_name – name of the physical file
                • filesystem_path – the path where the files are located.

                --Get the files and total size of files for all databases

                SELECT  --the name of the database
                        CASE WHEN GROUPING(DB_NAME(database_id)) = 1 THEN '@TOTAL'
                             ELSE DB_NAME(database_id)
                        END AS database_name ,

                        --the logical name of the file
                        CASE WHEN GROUPING(master_files.name) = 1 THEN '@TOTAL'
                             ELSE master_files.name
                        END AS database_file_name ,

                        --the size of the file is stored in # of pages
                        SUM(master_files.size * 8.0) AS size_in_kb,
                        SUM(master_files.size * 8.0) / 1024.0 AS size_in_mb,
                        SUM(master_files.size * 8.0) / 1024.0 / 1024.0 AS size_in_gb,

                        --the type of the file
                        CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                             ELSE MAX(master_files.type_desc)
                        END AS file_type , 
                       
                        --the drive letter only
                        CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                             ELSE MAX(UPPER(SUBSTRING(master_files.physical_name, 1, 1)))
                        END AS filesystem_drive_letter ,              


                       --thanks to Phillip Kelley from http://stackoverflow.com/questions/1024978/find-index-of-last-occurrence-of-a-sub-string-using-t-sql
                       --for the REVERSE code to get the filename and path.

                        --the physical filename only
                        CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                             ELSE MAX(REVERSE(LEFT(REVERSE(master_files.physical_name),
                                     CHARINDEX('\', REVERSE(physical_name)) - 1)))
                        END AS filesystem_file_name ,

                        --the path of the file only
                       CASE WHEN GROUPING(master_files.name) = 1 THEN ''
                             ELSE MAX(REPLACE(master_files.physical_name,
                                REVERSE(LEFT(REVERSE(master_files.physical_name),
                                             CHARINDEX('\', REVERSE(physical_name)) - 1)), ''))
                             END AS filesystem_path

                FROM    sys.master_files
                GROUP BY DB_NAME(database_id) , --the database and filegroup and the file (all of the parts)
                         master_files.name WITH ROLLUP
                ORDER BY database_name, database_file_name

                Single Database By Filegroup (Download source)

                Edited: Added code based on one of the comments here: http://www.sqlblog.lv/2011/05/ka-apskatit-datu-bazes-failu-izmeru-un.html.  His post does all databases with sizing, but I preferred to have this query work on only one database. I added columns for available space and used space, as well as on-disk space.

                The second query will, for one database, list all of the row and log filegroups and their files. Like the previous query, it may list filegroups that have a 0 size for types like full text. It uses sys.database_files for the files. This has one downside: if the database is read only, it is possible that the results will not be correct and will reflect a previous version of the metadata. Use sys.master_files if you want to get current values, but there is no guarantee that it will match the filegroups. 
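One way to decide which catalog view to trust (a sketch of my own, not from the original post) is to check the Updateability property of the current database before relying on sys.database_files:

```sql
--DATABASEPROPERTYEX reports 'READ_ONLY' or 'READ_WRITE' for the Updateability property
IF DATABASEPROPERTYEX(DB_NAME(), 'Updateability') = 'READ_ONLY'
    SELECT name, physical_name
    FROM   sys.master_files          --server-level view has current values
    WHERE  database_id = DB_ID();
ELSE
    SELECT name, physical_name
    FROM   sys.database_files;       --per-database view is safe when read/write
```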

                It will return:

                • filegroup_name – The name of the filegroup in the database
                • database_file_name – The file name that was set when the file was added to the database (the logical name, not the physical name)
                • size_in_kb – The size of the file in kilobytes, such that it matches the file size in the Windows Explorer
                • size_in_mb – The size of the file in megabytes, the size people more typically want to see (Commented Out)
                • size_in_gb – The size of the file in gigabytes, useful when looking at really large files
                • used_size_in_kb – The amount of the file that has data allocated, in kilobytes
                • used_size_in_mb – The amount of the file that has data allocated, in megabytes, the size people more typically want to see (Commented Out)
                • used_size_in_gb – The amount of the file that has data allocated, in gigabytes, useful when looking at really large files
                • available_size_in_kb – The amount of free space in kilobytes, such that it matches the file size in the Windows Explorer
                • available_size_in_mb – The amount of free space in megabytes, the size people more typically want to see (Commented Out)
                • available_size_in_gb – The amount of free space in gigabytes, useful when looking at really large files
                • size_on_disk_kb – The amount of space the file takes in the file system (reported from the DMVs)
                • file_type – How the file is used in the server
                • filesystem_drive_letter – the drive letter where the file is located
                • filesystem_file_name – name of the physical file
                • filesystem_path – the path where the files are located.

                SELECT
                               --the name of the filegroup (or Log for the log file, which doesn't have a filegroup)
                               CASE WHEN GROUPING(filegroups.name) = 1 THEN '@TOTAL'
                                         WHEN filegroups.name IS NULL THEN 'LOGS'
                                         ELSE filegroups.name
                                END AS filegroup_name ,
                       
                               --the logical name of the file
                               CASE WHEN GROUPING(database_files.name) = 1 THEN '@TOTAL'
                                        ELSE database_files.name
                               END AS database_file_name ,

                               --the size of the file is stored in # of pages
                               SUM(database_files.size * 8.0) AS size_in_kb,
                               --SUM(database_files.size * 8.0) / 1024.0 AS size_in_mb,
                               SUM(database_files.size * 8.0) / 1024.0 / 1024.0 AS size_in_gb,
                              
                               SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0) AS used_size_in_kb,
                               --SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0)/ 1024.0  AS used_size_in_mb,
                               SUM(FILEPROPERTY(database_files.NAME,'SpaceUsed') * 8.0) / 1024.0 / 1024.0 AS used_size_in_gb,                             

                               SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0) AS available_size_in_kb,
                               --SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0)/ 1024.0  AS available_size_in_mb,
                               SUM((database_files.size - FILEPROPERTY(database_files.NAME,'SpaceUsed')) * 8.0) / 1024.0 / 1024.0 AS available_size_in_gb,  

                               SUM(DIVFS.size_on_disk_bytes/1024.0) AS size_on_disk_kb,
                              
                              --the type of the file
                              CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                                        ELSE MAX(database_files.type_desc)
                               END AS file_type , 

                               --the drive letter only
                               CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                                        ELSE MAX(UPPER(SUBSTRING(database_files.physical_name, 1, 1)))
                               END AS filesystem_drive_letter ,        
                       
                              --thanks to Phillip Kelley from http://stackoverflow.com/questions/1024978/find-index-of-last-occurrence-of-a-sub-string-using-t-sql

                               --the physical filename only
                               CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                                         ELSE MAX(REVERSE(LEFT(REVERSE(database_files.physical_name), CHARINDEX('\', REVERSE(database_files.physical_name)) - 1)))
                                END AS filesystem_file_name ,

                                --the path of the file only
                               CASE WHEN GROUPING(database_files.name) = 1 THEN ''
                                         ELSE MAX(REPLACE(database_files.physical_name, REVERSE(LEFT(REVERSE(database_files.physical_name),
                                                                                                                                   CHARINDEX('\', REVERSE(database_files.physical_name)) - 1)), ''))
                                END AS filesystem_path
                FROM    sys.database_files --use sys.master_files if the database is read only and you want to see the metadata that is the database
                             --log files do not have a filegroup
                                     LEFT OUTER JOIN sys.filegroups
                                             ON database_files.data_space_id = filegroups.data_space_id
                                    LEFT JOIN sys.dm_io_virtual_file_stats(DB_ID(), DEFAULT) DIVFS
                                            ON database_files.file_id = DIVFS.file_id                        
                GROUP BY  filegroups.name ,
                                 database_files.name WITH ROLLUP
                ORDER BY     --the name of the filegroup (or Log for the log file, which doesn't have a filegroup)
                                 CASE WHEN GROUPING(filegroups.name) = 1 THEN '@TOTAL'
                                          WHEN filegroups.name IS NULL THEN '@TOTAL-SortAfter'
                                          ELSE filegroups.name
                                  END,
                                  database_file_name

                Hope these queries help out sometime.  More on the way as I finish up other projects!

              12. Speaking at PASS 2012… Exciting and Scary… As usual…

                Edit: As I reread this, I felt I should clarify.. As usual refers mostly to the "Scary" part. I have a lot of stage fright that I have to work through. And it is always exciting to be picked.  

                I have been selected this year at the PASS Summit 2012 to do two sessions, and they are both going to be interesting.

                • Pre-Con: Relational Database Design Workshop - Abstract
                • Triggers: Born Evil or Misunderstood? - Abstract

                The pre-con session entitled Relational Database Design Workshop will be (at least) the third time I will have done this pre-con session, and I am pretty excited to take it to a bit larger scale. The one big change that I am forcing this time is a limit on the lecture time. Each of the first two times I have done this session the biggest disappointment has been that we didn't get nearly enough time for the exercises. If people get tired of the exercises, I will certainly have a lot of extra material to do, but the focus will be on getting at least three hours of design time in. Some as a full group on the screen, and some broken up into groups. (Of course, we will adjust the schedule based on the size of the group and whatever they are throwing at me verbally… and physically! I will have material to pad out at least an hour if people start looking bored (or if the group members start screaming at each other…you know, like a real database design session would be like if people weren't concerned with losing their jobs.))

                The triggers session is the one that I have been mulling over for years now, and it is going to be, at a minimum, interesting, and probably a lot of fun (particularly if Hugo Kornelis (@Hugo_Kornelis) and Tom LaRock (@SQLRockstar) (Tom is not a fan of triggers! http://thomaslarock.com/2009/03/sql-database-triggers/) show up to offer their opinions). Triggers are probably the most controversial of SQL Server objects, and for good reason. There are server and database settings that affect how they work, and it is not trivial to write them in a manner that doesn't harm performance. Worse yet, they can cause side effects that (if not written correctly) really harm performance, data integrity, and the sanity of the developers who don't realize they exist. But for all of the negatives, there are some tasks that just fit the trigger to a T. I might be trying to do too much in a 1:15 session, but it would not be the first time!

                So I hope to see a good number of you there, for the pre- con, and certainly for the Trigger session. Just leave the potential projectiles in your hotel room...
