THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Merrill Aldrich

  • Execute T-SQL Across Listed Servers/Databases

    Here’s a handy trick – if you have a SQL script, stored as a text file, and need to run it against an arbitrary list of different databases on different SQL Server instances, try a little PowerShell. There are a lot of ways to accomplish this task, but personally I like this one for its flexibility.

    First, make sure the SQL script does NOT include the common USE <database>. Generally that statement is your friend, but not in cases where a single script should work against multiple databases having different names.

    Next, make a “collection of collections” in PowerShell, which will act as a list of instances and databases on those instances. Example:

    $instances = @(
        @( 'Server1', 'someDatabase' ),
        @( 'Server2', 'AnotherDatabase' ),
        @( 'Server32', 'FooDB' )
        # Repeat
    )

    Each item in $instances is a mini-collection, and each of those collections has two elements: an instance name and a database name, which will be available using indexes [0] and [1].

    If you prefer, it’s also possible to pull these values from a text file using Get-Content, and split each line on some delimiter character.

    Next, load the content of your SQL script file into a variable. There’s an important caveat here: we have to load it as a single string, not an array of strings. The default behavior of Get-Content, though, is to split a file on line terminators and make each line into a separate object in a collection of strings.

    There are a few ways to accomplish this, but I learned a simple one in a forum from Alex K. Angelopoulos – we can direct Get-Content to split on a character that doesn’t actually exist in the file. This makes the whole content of the file “look like” one line to Get-Content. He suggests the Null character, which in PowerShell is `0 (back tick zero). Note that the Null character is not the same or related to $null.

    Special Characters: http://technet.microsoft.com/en-us/library/dd347558.aspx 

    $deployScript = Get-Content 'C:\Path\To\Your\Script.sql' –Delimiter `0

    Finally, loop over the list of ( instances + databases ) to run the script everywhere. Be cautious and test!

    $instances | ForEach {
    
        # This should print the instance and database from your list, which is handy for troubleshooting:
        Write-Host $_[0] $_[1] 
    
        # This sort of thing can be used to validate that you are connecting to the right databases:
        Invoke-SqlCmd –ServerInstance $_[0] –Database $_[1] –Query 'Select @@servername, db_name()'
    
        # Finally, this would execute the script:
        Invoke-SqlCmd –ServerInstance $_[0] –Database $_[1] –Query $deployScript
    
    }

    There is no Undo, so test. Use your powers only for good!

    Edit 5/5/2012 – fixed missing commas in code sample.

  • Poor Man’s PowerShell TFS SSMS Integration

    This is lame. Still, here goes: I need, increasingly, to author both PowerShell and SQL Server scripts, bundle them into a solution and store that in TFS. Usually the PowerShell scripts are very closely related to SQL Server, and have a lot of SQL in them. I am hopeful that 2012 SSDT, or the tighter integration of SSMS and Visual Studio in 2012, might help put all of this in one place, but for now I am stuck in SSMS 2008 R2. So here are my blunt attempts to marry these activities.

    (This post is rather like the scene at the end of MythBusters, when the explosion didn’t really work, so they just put a lot of C4 under the <whatever is being used> and set it off.)

    eyebrow

    Get SSMS Connected to TFS

    First, I downloaded and installed the TFS integration bits for SSMS, known sesquipedalian-ly by the polysyllabic appellation “Visual Studio Team System 2008 Team Foundation Server Microsoft® Source Code Control Interface Provider.” Or TFS MSSCCI Provider, for short. This adds integration between SSMS and Team Foundation Server, so that I can create a SQL script solution and store it in source control, with check in/check out, etc. It’s sort of OK, actually, and miles better than nothing.

    Next I made a new SQL Server script solution in SSMS. I then right-clicked on it in the Solution Explorer and used the context menu to add it to source control.

    Make a PowerShell Script by Force

    Now for the PowerShell “integration.” In SSMS, in a solution, the “Miscellaneous” folder can contain any sort of files you like. So I right-clicked on the project, chose Add New Item. In the Add New Item dialog, I picked a template at random and forced the file name to myPowerShellScript.ps1. This makes the file, in the Miscellaneous folder (admittedly with a few lines of unwanted SQL code in it):

    NewPowerShellScript

    Create an “External Tools” entry for PowerShell ISE

    SSMS has long had the ability to launch an external program from a menu command. I hadn’t used this much in the past (never really saw the point) but in this case it works nicely. I clicked Tools > External Tools… and created a new entry for the PowerShell ISE:

    ExternalToolsDialog

    In Arguments and Initial Directory, it’s possible to put $()-style variables that are sensitive to the context in SSMS, in this case to pass the file selected in the Solution Explorer as an argument to launch the PowerShell ISE – basically allowing me to open a .ps1 file, in the right app, from inside SSMS.

    At this point, it’s possible to create a source-controlled PowerShell script in the SSMS solution, to open and edit it with the PowerShell ISE, and to check it in to TFS. Huzzah!

    Option: TFS Windows Shell Extension

    That’s pretty grand, I think, but as an added option, it can be handy to check out a PowerShell script, make a change, and then check it back in, without using SSMS. That’s possible using a little Team System widget called the Windows Shell Extension, which is bundled in this TFS Power Tools download: http://www.microsoft.com/download/en/details.aspx?id=15836.

    Note: this installer, by default, installs a bunch of stuff you don’t want, and not the thing you do, for this case, so use the Custom option in the installer and select the Windows Shell Extension feature.

    With the Shell Extension, you can see the source control state of files right in Windows Explorer, with a little green icon overlay. There is a right-click context menu to check files in and out.

    Adding this widget means I can check a file out in right in Windows, edit with PowerShell ISE, and then check it back in – great for those PoSh scripts that don’t have a lot of SQL dependencies.

    It’s not pretty, but it works. And, provided you have a legal TFS license, by my count the cost comes to … $0!

  • Microsoft Delivers Full Suite of SQL Server PowerShell Cmdlets

    We’ve all been waiting several years for this, and finally it’s here! Coinciding (approximately) with the release of SQL Server 2012, a new Feature Pack has appeared on the Microsoft web site that adds a full suite of PowerShell cmdlets for DDL and other functions. This means that, at last, we can do things like fully-featured SQL deployment scripts without all the (severe) limitations of T-SQL, such as primitive use of variables, flow control, exception handling.

    Taking a cue, finally, from the community project SQLPSX, the SQL Server team seems to have designed the new library with ease-of-use in mind, thankfully concealing a lot of the complexity of using raw SMO objects through a language like C# or PowerShell.

    Here’s a teaser showing just a few of these new offerings:

    New-Table –Schema dbo –Name Accounts –Columns $myColumns

    Remove-Table –Schema dbo –Name Accounts

    New-Index –Table $myTable –Columns $myIndexedColumns

    Set-Index –Name dbo.Accounts.idxBar –Columns $myRevisedColumns

    New-Procedure –Name Foo –ScriptBody $sqlcmd

    The list goes on! This will make deployment of changes to existing databases so much easier. No more insanely complicated T-SQL scripts with brittle Dynamic SQL and hazardous error handling to manage complicated flow control like “Create this object if it doesn’t exist already.”

    And all you admins aren’t left out in the cold either – check out the fresh cmdlets for managing SQL Agent jobs, and the incorporation of Agent into PBM!:

    New-AgentJob –Name MyJob –Type PowerShell –ScriptBody c:\scripts\foo.ps1 –Schedule $MySchedule

    Set-AgentJobSchedule –Name “Daily at 3:00 am” –StartTime “03:00” –Enable

    Set-ServerProperty –MaxDop 4 –MaxMem 8000GB

    I am SO excited, and this is amazing, but I haven’t had a lot of time to explore all the options. I hope to really dig in after reading everyone I else’s April Fool’s posts. Until then, you can download this feature pack yourself (probably) at:

    http://www.microsoft.com/download/en/details.aspx?id=88254 

    Edit April 2: Please note if you come back to this page later, this was an April Fool's joke. There are a handful of new cmdlets in SQL Server 2012, but nowhere near this level of support.

  • T-SQL Tuesday #028: Whaddya Mean, “Not Your Job?”

    TSQL2sDay150x150_thumb

    This T-SQL Tuesday, hosted by Argenis Fernandez (Blog|Twitter) is devoted to the question, “Are you a Jack-of-all-Trades? Or a specialist?” This question really hits home for me, on a number of levels.

    (Aside: I have huge respect for Argenis – he’s smart, funny, no-nonsense, very accomplished. If you don’t follow him, do.)

    If you have read any of my previous ramblings on this blog, you may know I was originally educated as an architect – the bricks and mortar kind, not the information systems kind. While I didn’t stay in that profession, I’m afraid it had a permanent influence on the way that I am wired. Architecture, especially architectural education, is grounded in a particular way of thinking about the intersection of design and engineering, which sticks with me.

    A building is a problem with an unimaginable number of variables. There are so many that, for one building program (that’s the problem statement about what the building is supposed to be for, and how big, etc.) on one site, there is probably an infinite number of really good solutions. There are variations on structure, light, materials, function, climate, culture, history, on and on. A very good architect can make a building that works, keeps the rain out, and lasts, but at the same time is also inspiring, creative, and enriches the lives of the people who use it.

    Contrary to popular belief, architects don’t do a lot of math. They don’t do the calculations for structural engineering, for example. In fact, in most places they are prohibited from doing it because they aren’t qualified. The skill that I think an insightful architect has is not engineering, but synthesis. From all the intertwined systems that make a building, structural systems, enclosure systems, mechanical and electrical systems, patterns of use, cultural influences, colors, natural light, the landscape, the architect can, through design, form a building into something that reveals and expresses those systems in a way that is not only functional but beautiful at the same time.

    In order to do that, the architect does need to genuinely understand how those systems work, their internal logic, but perhaps in a different way than an engineer understands them. The architect has to “zoom out,” to understand how each system works at a macro level, and how it relates to all the other competing systems, interests and priorities that compose the larger problem of making the building. The architect’s understanding has to be accurate, without necessarily being microscopic in detail. The micro-level answers are available when you need them, by hiring someone or by doing research -- and the world is just too large and complicated to literally know it all.

    Put another way: the hack architect either willfully stuffs a building into some predetermined form, or is ignorant or uncaring about making something, and settles for cheap, expedient corner-cutting. The skilled architect can synthesize the functions of the building, the site and the building’s construction into something elegant and beautiful, by understanding something about how each of those things work.

    So, what has all this got to do with the topic at hand? I would like, in my small way, to work in IT the way the serious architect designs buildings. It’s interesting to think about whether or not working that way is a specialty – and I don’t really have an answer. But I have found that zooming out, taking an interest in related systems, helps me tremendously as a DBA. Sometimes it’s not the micro details of other systems that matter, but spending the time “getting” the logic of other systems, languages, interfaces.

    Example: other computer languages. I am no developer, but I have great respect for the really good ones. I have tried to learn at a macro level how to code in Scheme, Java, JavaScript, VB.net, C# and recently PowerShell. I have put a lot of effort into understanding OO concepts. The same way an architect doesn’t actually calculate structures, you would not want me to code up your complex application. But as I try to formulate IT solutions, it’s been really helpful to know something about how that all works, and how what I do relates to it. I hope it will make me less likely to do the IT equivalent of jamming a square peg in a round hole, and more likely to help find the elegant solution.

    Example: Windows, Storage, Failover Clusters. To me it just seems natural to want to know more about the systems that my SQL Servers are running on. I’m not a very experienced Windows guy, and when I go and see the good ones present, I am humbled and amazed. But I have certainly tried to get some macro-level understanding of these other systems as a way to get better at my own job, and I’ve learned how to build things like failover clusters ground-up.

    That brings me back to the title of this post. “Not my job” is actually an interesting and nuanced concept. On the one hand, there are a lot of things I really don’t know about, and many I know barely enough about to be dangerous. I came late to this profession and I’m still a novice. I think it’s important to recognize that fact, in the same way the architect needs the engineer, and not create havoc imagining that I know more than I do. In short, I need other people who really know things. That’s the positive version of “not my job.”

    Of course, there’s the other interpretation, where “not my job” is really code for “I can’t be bothered,” or, “that’s not interesting.” That attitude I don’t get. There is very little in this world that is not interesting in some way. Architects tend to be interested in everything.

    I will leave you with two of my favorite buildings in the world, and a thought: these are made of the same steel, stone, wood and concrete as other buildings – what is it that makes them so amazing?

    Have a look at Louis Kahn’s iconic Salk Institute and Kimbell Art Museum

  • Unorthodox DBA Kits

    I get some strange looks from people when they see what machines I use in my DBA life. First it was because I was using a personal MacBook Pro. In the past few days, it’s because I’m not using it any longer. I still love the MBP, but so does my wife, and she, rightfully, didn’t love the idea that I was hogging it all the time. Now the MBP has returned to its quiet life at home, and I have a new Tablet with a touch UI for work. That was a huge change for me, because it’s the first PC I’ve ever bought. Yes, ever.

    Whacky Setup 1

    For two years I happily ran SQL things from a unibody MacBook Pro. There are a number of people in the SQL community who have used or switched over to the Mac with one of several Windows virtualization solutions, and the combination works great. The build quality and design of the Mac is excellent, and in my experience they are both reliable and a joy to use. Parallels, Vmware Fusion and even VirtualBox are all working, solid solutions to make Windows run on the Mac alongside OSX. Personally I’ve used the Vmware solution most, and tested with the VirtualBox solution. For work we have remote access using SSL VPN through the browser, so there’s really no need to worry about what operating system is on the client, as long as that works. From a security perspective, all work stays at work – literally inside the datacenter.

    After I had this machine for a while, my friend Aaron Bertrand, through his blog, convinced me to try Apple’s Magic Trackpad. While I was skeptical at first, that thing has completely won me over. The name is corny, but what a fantastic device.

    My Mac would drive a big external monitor, but just one – and with a mini Displayport Adapter to convert to DVI

    Kit 1

    MacBook

    13 Inch MacBook Pro, 8GB RAM, Core 2 Duo (this is a couple years old now)

    DVI Adapter

    Any Big Monitor (with a small adapter, the Mac can drive just about any external monitor – but only one). Mine is a work-supplied 24” HP LCD.

    Apple external Bluetooth keyboard and Trackpad

    Vmware Fusion for Macintosh

    Legit Windows 7 license from Microsoft Store, legit SQL Server installed for learning/test/demo purposes

    Samsung Android phone, with the Mac’s calendar and address book set to sync automatically through Google services

    Various cross-platform utilities including Dropbox, 1Password (together), Xmarks, Evernote, that hook Mac to Windows to Android

    The exchange with many of my co-workers and friends:

    Startled co-worker: “Wha? That works?”

    Me: “Yep.”

    Alas, when it came time to yield my much-loved Mac back to my family, I had to buy a new machine. In the intervening time, I have really become enamored of the touch interface on both my phone and on my kids’ Nook. But most tablets don’t have the horsepower or the OS to do everything I needed, and I really didn’t want another device to lug – having a tablet and a phone and a laptop has no appeal. 8GB of memory and either Mac OSX or Windows 7 was a requirement. It was a random tweet from Buck Woody that turned me. Solution? Lenovo x220 ThinkPad Tablet.

    Whacky Setup 2

    The “convertible tablet” design goes back a ways – when I was designing buildings in the early-mid 2000’s, the firm I worked for did a collaboration with Microsoft to help with the early implementation of XP’s tablet features on early convertible tablets. They had the same basic design as my new Lenovo. But back then it sucked, and there was no market.

    Fast forward to 2012 – now that touch interfaces have matured, tablets have become popular and useful, I think that the this time around this idea may work. This machine is a really fast laptop in its own right, and the touch interface is working for me as a tablet substitute. It seems strange, perhaps already antiquated, but I love the combination of touch screen, stylus, power of a full-on laptop, and good keyboard when I want it. It’s fantastic.

    Kit 2

    X220-tablet-1L

    x220 Convertible Tablet with i7, 8GB RAM, 160GB SSD

    Display Port to the same 24 inch LCD monitor

    Apple (yes!) bluetooth keyboard and trackpad (the Apple trackpad is far, far better than the Lenovo one)

    A ridiculous number of pointing options, including ten fingers, the stylus, the ThinkPad eraser thingy, the built in trackpad and the external one

    VirtualBox for experimenting with SQL Servers on a virtual network

    Same Android phone and features, but without the elegant Google sync (sad face)

    Same toolbox of syncing apps: Dropbox, 1Password (together), Xmarks, Evernote, etc.

    Same conversation with my co-workers.

    I just have a week and a half in with my new Frankenstein of a workstation, but it’s working for me. Next installment here – a comparative review of the two setups, and some tips about the flakier stuff, like how to make the Apple accessories and the ThinkPad talk, and what to expect if you are considering the switch to Mac / OSX as a SQL Server DBA.

  • Leaking Money?

    Does Your Group Spend Too Much or Too Little?

    I haven’t been blogging much this winter, as I’ve been buried under a fairly complicated an ugly upgrade project, which, I’m happy to say, is behind me now. There was one notion that kept coming back to me during that project that I thought I’d share. It seems obvious, but it’s an idea that it seems many IT groups have difficulty discussing or making intentional, informed decisions about: that is the idea of “good enough.” If an IT system is a key part of your business, it’s important to realize, and then make purchasing and staffing decisions on, an accurate notion of what level of quality you really need. Consider this (admittedly crude) diagram:

    Cost leakage 1

    To the left on the bottom axis we have something approaching the perfectly engineered system: no flaws, perfect redundancy, perfect uptime, infinitely expensive. Near here is where I would expect NASA to want to be for manned spaceflight. Perhaps the engineers of an atomic clock, or, sadly, an atomic bomb. This land is spendy, and hopefully the systems created here are near perfect.

    To the right we have junk-ola. Systems that need a huge amount of hand-holding by IT staff, that are prone to break down in both routine and new, unexpected ways. These are the systems that page people all the time (or, worse, don’t page people and should.) These certainly were cheap to put together. Someone probably thought they would save money.

    On the y-axis we have cost, higher at the top. The curve represents the planned, expected and visible cost of a project or a piece of IT infrastructure. Approaching perfection gets to be expensive, and the closer you get to perfect the faster the cost rises. This is why, for example, going from zero-nines to two nines in uptime is expensive, but going from that to three and then to four forces cost up dramatically. As we go to the right, we start to buy or build junk, and things look cheaper. Obviously clever people can bend this curve to get better systems for less, which is great, but as a rule this pattern holds true. Fairly obvious, I think.

    The Hidden Cost of Junk

    Here’s the detail that I find true in many shops, which is much harder for people to talk about:

    Cost leakage 2

    As you move to the left on the graph, attaining high quality generally means things like planning, testing, development, schedules, manpower, fixes, careful deployment – all of it planned into the project with a cost assigned. The cost goes into budgets and estimates and gets scrutiny and conversation. In some places that also means it gets canceled, trimmed, cut back, edited, sent back to “sharpen the pencil.” Such is the nature of “visible cost.”

    As we move to that left boundary, the idea that things will entail a lot of operational handholding and unexpected expense goes down. NASA, for example, would not launch the shuttle (back when NASA did that) and then, once in orbit, have to run out and unexpectedly hire a bunch of expensive consultants to see if they can figure out how to land it. One assumes they had worked that out ahead of time, and spent planned dollars to do so. The dashed line for hidden/unexpected cost should go way down as we build highly-engineered systems.

    Over on the right, though, is a familiar place for a lot of businesses: a company buys some system, service or software on the cheap, hopeful about the original price tag, and then finds themselves trapped in a vortex of unexpected costs: to fix it, to keep it running, to manually execute tedious monkey work required due to missing features. In short, they spend a huge number of either man-hours, opportunity cost while the system is down, customer good will or consulting dollars, trying to repair something with an artificially low up-front cost. Most of the time there is also a strange psychology in play where they keep hoping against hope that they will still save money and be able to stop doing that. Often the idea of changing out the system seems prohibitively expensive and difficult.

    So, where should we be? In the happy place, if possible:

    Cost leakage 3

    The happy place is where we didn’t spend too much for extravagant, gold-plated systems that overreach what the company really needed – like a five-nines, always on system for workers that use it from 8 to 5. The happy place also is where we don’t spend thousands of hours (time is money) trying to force some piece of junk to keep running, where it isn’t up to the task, and in the process leak dollars day and night, paying people to do non-productive work that doesn’t move the organization forward.

    The happy place is in a different spot based on your business. You might really need to have 24x7 always-on redundant systems. Or perhaps not. But it’s key that the leadership in your team:

    1. Knows about this graph
    2. Does not have some strange fantasy about where you really are on this graph
    3. Has the right strategic vision to put you in the right place on this graph

    I am very lucky to have a great job in a place that I think understands this – though every team, I think, has disagreements or times when this is hard to articulate.

    Entropy Pushes to the Right

    Another powerful sense I had in the last two months is that in IT, when you are an FTE inside a business, systems naturally, relentlessly, slide to the right on this graph. As a DBA or a system administrator, I find that one of the most important strategic roles I can play is always to push things to the left, just to counteract their natural slide to the right.

    Here’s what happens: company buys a system that is sort of OK, and kind of meets their needs. But Jim in department X finds this little piece unworkable, and customizes, and Sue in department Y would like a little adjustment over here, then Dave in Ops makes an ad-hoc job for Sheila the VP, and on and on. The intern in HR learns a little Access Reporting Services and makes a bunch of “useful” widgets. Each small band-aid looks great in isolation, but over time they accrete into this Rube Goldberg machinery around the system. This is practically an irresistible force.

    So, what to do? Those people need those changes to be productive. It can really help their jobs and mental state to work around flaws with the software (which are there, really.) The best I can come up with is to try to always make the little tactical moves add up to a strategy that counteracts the inevitable right-ward slide to Junk-dom.

    1. Always attempt to apply best practices, even in apparently small situations.
    2. Document.
    3. Realize that everything we make is probably permanent. People very rarely go back and fix things, really. If something really does end up temporary, then rejoice in the exception.
    4. Try to make the tactical moves fit into some larger architecture that has some staying power, and won’t just become a pain to maintain later on.

    How about you? When budgeting decisions look baffling, or you can’t get approval to purchase something you need, is it really because your team is moving across this chart? Going the right direction or the wrong direction?

  • SQL Saturday 108 Redmond PowerShell Session Material

    I am honored to be presenting my first SQL Saturday session in Redmond on the 25th of this month. The session will be a PowerShell basics class, emphasis on helping people who might be starting out with PowerShell, or feel intimidated by PowerShell’s syntax or object-orientation. I have demos and material focused on PowerShell syntax in general, for any task, and then a few demos with a SQL Server slant.

    The main idea is to get an understanding of how and why PowerShell syntax works, which should make it easier to understand script samples one might find on the web, or get started and get productive quickly.

    If you’d like a preview, or can’t make it to SQL Saturday Redmond, there’s a copy of the presentation and demos available for download on the SQL Saturday site: http://sqlsaturday.com/108/schedule.aspx > “PowerShell 101.”

  • One Database or Ten?

    In my career to date I’ve worked as a DBA for mainly “buy don’t build” scenarios. One advantage - and hair-graying problem, if I am honest - has been that I think I’ve seen upward of one or two hundred different ISV-provided applications’ SQL Server databases. It’s a great learning opportunity, as I can see successes and failures in a large variety of designs, and sort of “fast-forward” to version five or eight of a given design pattern to see things like whether it scales, or has storage problems, has security flaws or DR issues. Today I want to talk about an often repeated blunder that I’ve seen the whole time: designing around the wrong number of databases in SQL Server.

    Implementing an application based on the wrong number of databases, architecturally, can have major consequences. As you can imagine, that type of design issue is very, very hard to untangle in version 2 or 5, after the application is out in the wild. So to me it makes sense to consider very carefully, and get this right if you are lucky enough to be at the beginning of the process, or at a point where a redesign is possible.

    The Most Important Consideration: The Log(s)

    The transaction log doesn’t seem to get a lot of love in the development process – it’s just that workhorse file that becomes a nuisance when it grows to fill its disk. But here’s the thing: whenever anything goes wrong in SQL Server, from a simple failed row insert to a server crash or DR event, the log is the key thing that saves your bacon. It makes rollback possible, and recovery of the data to a consistent state possible. It makes atomic transactions possible. It makes online full backup possible, and point-in-time recovery. I’d like to repeat that: the log is a vital element in taking a full backup while the database is online. We’ll see why that is so important in a minute.

    The most important idea about the log is this: the only way to get a transactionally consistent set of data over time is to have either just one stream of journal (log records) for the data, or to have multiple streams of journal that are robustly tied together, lock step. Because each database in SQL Server has its own, independent log file, the first situation is simple to implement – just use one database – while the second is very, very difficult. Splitting data into more than one database implies multiple log files, which in turn implies an independent log sequence. Having separate transaction log sequences means that, while it’s initially possible to perform a transaction across the two databases consistently and correctly, there’s very little guarantee after that transaction, in the space of backup and restore or disaster recovery, especially out in the real world.

    The converse is also true – there’s obviously a penalty for co-mingling unrelated information together in one database with a single stream of log records. Centralizing all activity, unnecessarily, in a single log file carries with it overhead, reduced concurrency and other problems. So how can one make this design decision? It turns out there are some fairly simple criteria one can apply, and, when it comes right down to it, only two scenarios.

    Two Possible Mistakes

    Fortunately there are just two mistakes that we can make:

    • Separating data that needs to be transactionally consistent into two databases
    • Combining data that has no need to be transactionally consistent into one database

    Here are some questions to ask about a design to see whether it is helpful or dangerous to split it into multiple databases. First, if you are considering multiple databases, are there any places in the design where the databases must be “lock-step,” time-wise. Example: are there any tables in one database that refer by joins and foreign keys to another database, or to a sequence or identity generated in another database? If so, you may be open to profound risk.

    To test this, do the following experiment – even as a thought experiment. Imagine that the system is running full-tilt, the application adding data to both databases. Halt the system. Restore one database from backup to one point in time, let’s say five minutes before the time of the halt. Next, restore the other database to a point in time TEN minutes before the halt. If the design implies that that five minute gap in time will cause problems with your data or application, then you are open to severe risk by storing this data under two separate log sequences, which is implied by using two databases.

    Second, if your application uses three-part-name, cross-database queries, what are they for? Do they imply that the data across databases really should be part of the same set, or are they legitimately separate? Is there some give and take, in time sequence, between the data in one database and data in the other(s)? I’ve had ISV’s say things like “failover at the instance level is a requirement.” That would be OK, if it were possible. It’s not, not really. “But we would just restore both databases at/to the same time,” is something I’ve heard as a remedy. That isn’t anywhere near good enough, and here are some technical examples showing why:

    Backups are Broken

    1. Full backups are not and cannot be time-consistent across multiple databases. That’s because, in order for full backup to work while a database is online, it incorporates some “slice” of the log file, which is replayed when the database is restored from the backup file, in order to make the data consistent in the database relative to changes that were made during the time the full backup was actually taken. The restore will always be as of a point in time at the end of the time it took to execute the full backup.
    2. It’s impossible to run full backups of two databases and make them end at the same time. Imagine: one day database 1 takes 5 minutes to back up and database 2 takes 25 minutes and 5 seconds to back up. Each day that time varies. Starting those at the same time obviously makes no difference. They would have to be made to end at the same time, every time, which is impossible.
    3. Implication: it’s not ever possible to restore multiple databases from full backups to the same point in time.
    4. One might say, then, that it’s just always a requirement that log backups be used, and any restore manually rolled forward to precisely the same time point. I think that would be fine, for the five percent of organizations who could actually make that happen. Few people actually do that out in the real world. Simple restore from full backup files is basically always an expectation, and while it’s technically possible to use log backup files, there’s real risk in the fact that either people won’t expect to have to do that, or they won’t know how to do that. Why even introduce that risk?

    DR is Broken

    1. Suppose your customer (or your organization) has implemented log shipping or off-site async database mirroring as a disaster recovery solution.
    2. Crunch: disaster. The app just crashed, the database server is down, and we need to fail over.
    3. At the DR site, we have some log backup files, or streams of database mirroring log records – different ones for each database. For one database, the log sequence goes up to 5 minutes before the crash. For another database, the last log backup was much bigger, and it didn’t copy over the WAN in time, so that one is 15 minutes behind. Or that database was behind in sending mirror log records due to some large transaction.
    4. The DBA, as would be reasonable to expect, applies whatever log records are available and brings the DR site copies of the data online, and, because it’s a crisis and she’s busy, she moves on to the 300 other databases under her area of responsibility. The databases are now 10 minutes apart in time. How’s the application look?

    The takeaway here: in order for HA/DR, crash recovery and backup features built in to SQL Server to work simply and reliably, your database design must take into account how the log sequence preserves data integrity and consistency in the data, which is easy in the context of one database, and extremely difficult across multiple databases.

    Performance and Common Sense Partitioning

    So, what about the other side of this issue? There are, of course, performance and administrative advantages to partitioning data out where it’s safe to do so. One basic principle of scalability is to avoid global resources (think one huge table) in favor of partitioned resources (several smaller tables). Here are some cases where it can be safe, and advantageous, to separate sets of data into distinct databases:

    1. True “reference” information that is loosely coupled to the primary database. I have seen this successfully implemented for “lookup data” that may be aggregated together from other areas of an organization, such as other regions or other organizational functions, that is not coupled to an application database in real time. I have also seen it used effectively for small data marts, data warehouses or reporting databases, where reporting information is copied out of a primary database, asynchronously. If such a database is “behind” the main application database, it doesn’t really matter, and whatever interface exists between the two is elastic enough to accommodate that.
    2. Audit or application error log information. There are applications where audit or application log data is produced in abundance, i.e. crazy huge tables. It may be safe, if business policies permit, to store this data in its own database with a separate log, enabling different and better administrative and retention policies. Simple example: let’s re-index the transactional database, but maybe let’s not tie up the main application’s database log by re-indexing tens of gigs of log or audit data, where fragmentation of that data isn’t really a concern. Secondarily, perhaps the transaction context for real activity in the main database can be different than the transaction context of audit or error logging so, for example, logging can’t block “real” transactions.
    3. Multiple ‘tenants’ using the same application. I have some experience (not a huge amount) with multi-tenant scenarios, and in every case I have seen, it’s been destructive to combine multiple tenants literally into a single database, and much more successful to host the tenants side-by-side, each in their own database. There are some administrative challenges to managing a whole collection of databases, but that can be solved with semi-clever code, tools, and rigor around implementation. The scalability problem of one massive DB is much worse. There are several reasons for this:
      1. The natural partitioning that results from splitting the DBs out (one tenant cares only about her own transactions) seems always to provide better query performance and scalability than co-mingling data from different audiences into the same tables. There is some penalty in procedure cache, because each different database will have distinct query plans, but having the data partitioned, on balance, tends to make the queries simpler and faster. Table and index scans, when required, are automatically restricted to a small set of data that matches the audience, rather than scanning a huge structure and discarding much of the data because it’s not relevant to the client who is performing a query.
      2. Locking and associated concurrency issues can be much simpler, and the damage contained. One tenant locks a table? No problem outside their own world, if they have their own database, and the lock is likely to be much shorter in duration.
      3. Tenants frequently want to take data with them when they come and go, and there’s a major advantage in terms of portability and/or archiving if the database for that tenant is physically independent.
      4. Scale-out to multiple commodity servers clearly is simpler.
      5. Backup and database maintenance across many small databases can be more asynchronous and less impactful than the same for one massive database.
      6. The log sequence(s) for each database can be safely and legitimately independent. This makes them smaller, easier to manage, and helps the HA/DR situation.

    Conclusion

    If you have the chance to make this choice, if it’s not already too late, choose wisely. Consider that disasters do happen, and be certain to choose a single or multiple database design for the right reasons, and in a way that doesn’t introduce risk for your application.

  • T-SQL Tuesday 24: Ode to Composable Code

    TSQL2sDay150x150I love the T-SQL Tuesday tradition, started by Adam Machanic and hosted this month by Brad Shulz. I am a little pressed for time this month, so today’s post is a short ode to how I love saving time with Composable Code in SQL. Composability is one of the very best features of SQL, but sometimes gets picked on due to both real and imaginary performance worries. I like to pick composable solutions when I can, while keeping the perf issues in mind, because they are just so handy and eliminate so much hassle and complexity.

    What is composability? Here’s a quick example. Stored procedure calls do not compose. That fact creates pretty ugly workarounds like creating temp tables, guessing at their schema requirements, and using insert … exec to get back results from a stored proc -- just to work with their results in any meaningful way:

    SELECT col1, col2, col2
    FROM dbo.mytable t
    INNER JOIN dbo.myview v ON t.keycol = v.fkeycol
    INNER JOIN EXEC dbo.storedProcThatJustReturnsRows p 
        ON oops.thatWill = NEVER.work –- #FAIL

    This is because the results returned from a stored procedure are not really defined in terms of shape: the server cannot tell what to expect pack from a stored procedure, if anything, and therefore cannot allow that result to be composed in the context of an enclosing SQL Statement. That limitation is why I listed insert-exec among my Top 10 T-SQL Code Smells.

    On the other hand, structures that do have a well-defined “result shape,” including views and table-valued functions, are composable:

    SELECT col1, col2, col2
    FROM dbo.mytable t
    INNER JOIN dbo.myview v ON t.keycol = v.fkeycol
    INNER JOIN dbo.myTVF( @someParameter ) ON ...

    This provides a huge amount of flexibility when writing code.

    The very best composable behavior is when the code also allows the SQL Server query optimizer to unpack and optimize the content of the composed structures in the larger context of your query, which is true in most cases for views and inline table-valued functions. There are some specific limitations that can prevent this type of expansion/optimization, and, of course, it’s important to be aware of those issues from a performance point of view. But in general, if you can allow the optimizer greater flexibility in choosing a query plan, it at least has a chance at improving performance.

    Example: suppose I have a stored procedure that returns 10 columns and 100 rows, and I only really need the values from 2 columns in 25 rows. If I have to execute the proc, output all the results to a temp table, then work on them, there’s a decent chance the optimizer could factor out quite a lot of the work involved if it were given more options. Encapsulating the procedure code so fully as to force the optimizer into a specific execution sequence can actually limit the ways the query could be executed, and thereby “defeat” one of the server’s primary and most valuable features.

    I don’t mean to argue that stored procs don’t have a place, because they do. I also would not argue that this type of inline optimization always works, because it doesn’t. But in the main I prefer to take advantage of composability when I can, only optimize back from that where there’s a legitimate reason to do so, and try in those cases to understand why optimization isn’t working.

    (And three cheers to the SQL Server team for following this philosophy in moving to dynamic management views and functions and away from diagnostic stored procs!)

  • PASS Summit: What’s the Shape of the Future?

    Gushing

    First I have to gush a little about the PASS Summit. Skip ahead if you like - this type of stuff doesn’t make for the best reading, and I won’t be offended. The Summit was amazing as ever this year. The thing I love about it, in addition to just the wonderful, supportive atmosphere, is that most of the people involved are real professionals doing real work, on the ground, with the Microsoft SQL Server stack. There’s some marketing spin, of course, but it’s tempered. You’re as likely to hear someone say “that doesn’t work” or “that’s still vaporware” as “this is the marketecture/kool-aid/pitch/roadmap for all things SQL Server.” I loved the content and the chance to meet old friends and new from the Intertubez.

    I generally use the Summit as a chance to pick my head up out of the day-to-day details and pressures of work, and try to look ahead. I went to an advanced driving course here in Seattle at Pacific Raceways a few weeks ago (thank you Kimberly for giving me a push to do that!) and one of the primary exercises they drilled into us students was to look ahead. Look ahead, through the curve, anticipate what’s coming, because at 120 it’ll be here in a second and you’d better be ready.

    The second concept was that you drive a car fast with smooth, fluid motions. You don’t jerk the wheel or jam the brakes; twitchy is fundamentally bad. The way to be fast has everything to do with planning ahead, choosing a line, then executing with the minimum inputs. Don Kitch, the lead instructor said, “lots of people want to see me drive their fancy cars on the track: the key to that is, watch what I don’t do with the car.” What he meant was, it shouldn’t be flashy or exciting to watch him toss the car around. That would be missing the point. I think the same is true, if perhaps at a slower pace, for a DBA. Anticipate what’s coming, steer your organization and your systems with a minimum of drama. Do it quickly. Make it look easy.

    So, what’s coming up around the next turn? I think the last couple of years of PASS Summit give us some clues.

    You’re Gonna Need Peeps

    The first is that all of us are going to need to lean on each other in the SQL Server community. The pace of change and the growth of the SQL Server family of products is astounding. More and more we’ll all be under pressure at work to know, or at least know about, the whole stack, and that will become less and less possible. That leads me to what I call “Peep Consulting:” lately, through social media, online contacts and training opportunities I’ve been able to come out of my shell a bit and tap into the community, to real benefit for my work and for my company. Peep Consulting is a way that a bunch of us in the community can collaborate, informally, to make each other look good, and collectively stay ahead of the massive explosion in all the SQL Server products. It’s not about long engagements or big projects, necessarily, it’s more the #sqlhelp tip, or the hallway conversation, or the email list that can steer you in the right direction, from people with more experience in some specific area of the products. People who aren’t doing this type of networking will be at a great disadvantage in the coming years, I think, as the development of the products accelerates.

    Things will get Simpler, More Complicated, Slower and Faster

    This notion is a bit hard to express, but the trend I see looks like this: the environments we work in, especially where I am in the small/medium business arena, are going to become ever more sprawly. The multiplication of apps and VM’s (Windows is practically the new App, as people spin up a new VM for just about every single service or application) will cause a multiplication of virtual servers, instances and databases. At the same time, the tools for administering that sprawl should improve, so it won’t be quite as daunting as it was two or three years ago. I think we’ll find we are faced with a much more complicated, more heterogeneous environment, with more different products, from the relational engine to SharePoint/PowerPivot and everything else. But we’ll also gain tools to grapple with that more effectively. Sadly, the idea that we’ll ever have a single data source for a small business – one database to store the company’s data in a consistent schema -- seems truly dead. It would be useful, but it doesn’t seem possible. In fact we are headed the other direction, rapidly.

    Second, I think the hybrid of cloud services like Azure, infrastructure as a service, and so on, will certainly change the landscape of small and medium business. In a few years a small company that has it’s own servers in it’s own data center or server closet may be a real anachronism. Smart companies will get out of the business of buying servers and electricity, diesel fuel and air conditioning, and will leave that to larger commercial data center operations. Ironically, though, that will increase the trend above, where things get even more complicated to plan and operate. The types of decisions will be different (more about software architecture and less about wiring) but no less complex. Instead of the care and feeding of a cold room full of machinery, we will face the care and feeding of an unbelievably complicated collection of cloud services and data, with real integration challenges.

    On the slow and fast front, we should see a continuation of what has happened to this point: software gets ever slower and more complicated, and hardware gets faster just fast enough that nobody cares much. We will continue to burden our servers with layer on layer of additional complexity – I’m looking at you, VMs – because it’s expedient, and because there’s no economic advantage to combatting that trend by making more efficient software. It’s always more economical, it seems, to work around a complex problem with a new layer, than to solve the problem. Allan Hirt had 12 VM’s running simultaneously, with simulated shared storage, on a laptop in his precon. Q.E.D.

    Bits vs. Strategy

    The combination of those trends I think means that as a work-a-day DBA, on the ground, things will pivot in the next couple of years. Classic problems like backup strategy, DR, indexing strategy, query performance and setup of complicated machinery like failover clusters will probably become commoditized. The software is just making that stuff easier with each release, and eventually it’s not going to be such a specialty skill. There will still be the need for some high-end tuning of large, high performance systems, but the bulk of stuff out there should just work without quite so much effort. When a $10k server has a TB of RAM and PCI flash storage on board, and DMV’s mature to render visible all the cryptic performance issues of the past, it just gets easier to run a small system.

    On the other hand, the strategic decisions about where to place a company’s data, how to navigate the complexity of a blended system of on-premises data and cloud services or IaaS, which of the multiplying platforms make sense – from Hadoop to SQL Server to DAX or SSAS or PowerPivot – to keep a company agile, these decisions will become the things that are most valuable in small or medium businesses. All businesses will want to move in an agile way as they or their customers demand taking advantage of the multiplying new services out there. Most businesses I imagine will also want out of the datacenter management field as quickly as is practical, and will not want to own hundreds or thousands of servers any longer. More capability, less equipment.

    This is where the SMB IT staff or DBA will differentiate him or herself – I think if we just keep our heads down, tuning query plans, that we’ll miss what’s about to happen. If, instead, we can look strategically ahead, and guide organizations into opportunities for increase capability and lower cost, we’ll stay relevant, and also have more fun.

    But then, who really can predict the future?

  • SCOM, 90 days in, IV. Fixing that SQL Agent Job thing. Yeah, that.

    OK, we’re way over 90 days in with our SCOM implementation, but I picked a title theme, and now we’re kind of stuck with the title. In any case, today is a short and to-the-point post about how to get Agent jobs alerting on failure through Operations Manager. Others, including Thomas LaRock, have covered much of this before so I will try not beat a dead horse here.

    Out of the box, the SQL Server Management Pack doesn’t have SQL Agent job discovery or alerts turned on. And every DBA I know wants to know when SQL Agent jobs fail, unless they work in such a catastrophically awful environment that jobs are made to be ignored, and fail either all the time, or by design. Personally, I want to know about every failure, and then maybe exclude some specific problem jobs or servers using overrides.

    So here’s the skinny, to get to alerting on individual job failures:

    1. You must override the discovery of SQL Agent jobs so that they are picked up by SCOM at all, when you first configure the management pack. See http://skaraaslan.blogspot.com/2011/08/how-to-monitor-sql-agent-jobs-using-sql.html or Google for how to do that. If not, individual jobs are ignored, and only the state of the SQL Agent service itself is discovered and reported.
    2. Once the jobs are discovered, it’s tempting to look at the Last Run Status monitor as your savior for catching failed jobs. The problem is, that thing stinks in real life. We tried it in a large environment, and it works only about half the time in real-world scenarios. It makes a kind of half-hearted check every so often, and seems oblivious to details like jobs being enabled or disabled. Not granular enough.
    3. You’ll want to enable the rule, which is disabled by default, SQL 200x Agent > An SQL Job Failed to Complete Successfully. This will ring a bell on each and every job failure, which is what I want. It might be very noisy at first, so be careful when you switch it on, but you will hear about each and every failed job outcome.

    Lastly, it’s helpful to set all your jobs to write to the event log on failure. I know what you may be thinking, “How am I going to go turn that setting on in all the jobs I have across the enterprise?” Answer: multi-instance query and some dynamic SQL. The basic query we want to run is fairly simple:

    EXEC msdb.dbo.sp_update_job 
        @job_name=N'Some Job', 
        @notify_level_eventlog=2

    How do I know where to find that code? I cheat: I open the associated dialog box in SSMS, against a test server, make the change for one object, and script out the change instead of applying it. I then hit BOL Search with the code from the resulting script, to be sure I know how the script works and understand what’s going on. The quality of SSMS-generated code is uneven, so it’s very important to understand how things actually work, but this is a simple starting point.

    Next, we need a script that will run that stored proc for all SQL Agent jobs. A simple way to do that is just to select from a system table or view that has all the required objects, and use the result to compose a series of statements, one for each object. If this gives us the list of Agent job names:

    SELECT [name]
    FROM msdb.dbo.sysjobs; 

    Then this combination will compose a series of sp calls and execute them:

    DECLARE @sqlcmd nvarchar(MAX);
    SET @sqlcmd = '';
    
    SELECT @sqlcmd = @sqlcmd 
          + 'EXEC msdb.dbo.sp_update_job @job_name=N''' 
          + REPLACE([name], '''', '''''') 
          + ''', @notify_level_eventlog=2; ' 
          + CHAR(13)
    FROM msdb.dbo.sysjobs;
    
    -- BE VERY CAREFUL WITH THIS:
    
    EXEC( @sqlcmd );

    Note the crazy number of escaped single quotation marks – common in Dynamic SQL, some method often is required to handle quotation marks that might be in the names of objects being manipulated. The code is ugly, and would not be happy production code for a real application, but for a basic administrative task it’s quite serviceable.

    With this combination, you should have events in the log for each job failure, and a working rule in SCOM to collect that information and alert as appropriate.

    Happy monitoring!

  • How to Run a Series of T-SQL Scripts in a Specific Order

    Another post in the handy-but-not-bleeding-edge category.

    In the past few months I’ve seen a number of folks struggle with how to reliably/repeatedly execute a heap of T-SQL Script files, in order. One could certainly argue about why there’s the need for piles of scripts in text files, but that’s outside the scope of this post – today I want to focus on how to “get 'er done,” and save that philosophical discussion for another time.

    Problem: Heap o’ Script Files

    You receive a folder full of scripts from (who else) your ISV, together with a list of the files in the order they should be executed. This is obviously fraught with opportunities for error if one were to manually execute them each, one at a time.

    Workaround: SQLCMD and :r

    Thankfully, there’s a very easy way to encapsulate this with a SQLCMD “master” script to call all the individual files, predictably, in the correct order. The :r construct in SQLCMD is a file include, which means you can easily make a script that calls another text file, or many other text files, from disk.

    First, copy your ordered file list and paste it into a query window. Be careful to avoid whitespace before and after each line of text:

    somefile.sql
    another file.sql
    thirdfile.sql

    Set the query to SQLCMD mode.

    Next, use a regex find/replace to

    • Prepend each line with the text :r "yourFilePath\
    • End each line with a double-quotation mark, "
    • Add a GO batch terminator between each line

    That is, find any line containing some text, using this regex:

    ^{.+}

    Then Replace All using something like:

    :r "c:\\some Folder\\\1"\nGO

    The \1 within the replace expression will be populated with the original text from each line, which will, in effect, surround the original text with the quotation marks, file path, and GO. The extra backslashes are required to escape special characters including newline and backslash itself.

    The resulting script should look like this:

    :r "c:\some Folder\somefile.sql"
    GO
    :r "c:\some Folder\another file.sql"
    GO
    :r "c:\some Folder\thirdfile.sql"
    GO

    Be careful with subtleties like whitespace around the file names – regex is tough to read and get correct, so it’s easy to make an error. It might take a couple of tries.

    At the top of the script, again, you may also want this line:

    :ON Error EXIT
    

    This will cause the client to abort on any error from the scripts, instead of continuing.

    Check, triple check, quadruple check your work, and you should have a meta-script that will call all the SQL files in order.

    Cheers, and happy scripting!

  • Dear SQL Tools Team(s): Stop Starting Over. Seriously.

    I have two little boys at home, and my parents were both teachers. It gives me a strange relationship with our public schools – I am passionate about them, and the quality of education I want for my kids, but I have to keep my distance a little, lest they drive me completely around the bend. One reason it’s so frustrating: among all the complex challenges our schools face, administrators and school reformers alike are stuck in an infinite loop of starting over again before anything is finished. Before one curriculum is complete, someone wants a new one. A new educational idea arrives on the scene before the last one even gets a chance to be evaluated. We have to break eggs to make an omelet, but all we do is the egg-breaking part. And our school system is the very apex of the idea, “too many cooks in the kitchen.”

    Sadly, the people working on all the flavors of SQL Server management tools seem to me to be stuck in the same type of self-destructive loop. For now six or eight years, depending how you count, the management tools for SQL Server have gone sideways, or around in a circle, instead of forward.

    Five Versions 2.0 != One Version 10.0

    This post was prompted by a short exchange I had on Twitter this morning. Someone asked a simple question: Why is there, again, another management & dev tool for SQL Server. My off-the-cuff response was that the team at Microsoft keeps on starting over again instead of moving forward. But the more I thought about it, the more I realized that that is literally true: through Management Studio, BIDS, Team System/Data Dude and now Juneau, and even to some degree Crescent, the various tools teams continue to slide sideways, and deliver sort of disjoint, bland, V 2.0 mediocrity. We need version 10 at this point, not version 1 or 2 again. Really.

    This isn’t an attack on any of the developers working hard on those tools and features. Some of the specific features and capabilities are really interesting. The problem is with the process – just as in some fundamental public school issues. In the current vernacular:

    (Smart People + Good Intentions) * Deeply Flawed Coordination = #EPICFAIL

    There are a lot of good ideas. There are a lot of challenges. There are a lot of conflicting requirements and conflicting opinions about what a tool like this should be. That’s what design is for. If the problem were simple, design would not be necessary. This is just a design problem.*

    80% of Good Design is Synthesizing Conflicting Requirements

    So it’s a hard problem. Administrators want administration tools, monitoring, troubleshooting. Developers want development tools, code writing environments. BI people maybe want graphical programming tools like BIDS SSIS projects. We all want the tools to be elegant and simple to use. I get that. But that doesn’t mean you’ll solve the problem by ten years of starting over again, annually, creating different flavors of the same tools. It gets solved by leadership, visionary design, and synthesis. It is possible to make one environment that meets all these needs. But it takes conviction, courage, time and, most of all, design.

    What do we have now, six years or so into the SSMS “era?”

    • Still total inconsistency in the way SSMS treats basic features like scripting objects.
    • One passable query writing environment in the SSMS query window itself, which somehow(?) has not made it into any other place in the whole suite of products. There is no even C- level query editing capability in any MS BI tool, for example. It’s not even in the Table Designer. But the code’s been sitting there in SSMS all this time.
    • B- Intellisense, with better provided by a third party.
    • No code formatting, again with a fine third party solution.
    • The object tree in SSMS has had the same cruel joke of a design, that most people seem to either barely tolerate or hate, this whole time.
    • Activity Monitor. Ugh.
    • A query plan viewer that is so stale, so old fashioned, that a third-party company (and thank you SQL Sentry) could walk in and create a free one that is many times better.
    • Decent diagnostic data from the engine (DMV’s) with NO decent UI associated.

    The reasonable thing would have been to put someone with some design sense on this problem to improve the tool. Version 3, version 4, version 5. Forward progress.

    Instead, people complained from outside MS, and people within perhaps had some new ideas, but instead of having the courage to synthesize those challenges into a better product, we keep going in a circle. So I am begging: would someone please stand up over there in Redmond and say, “enough.” I want SSMS 3.0.

    * This is the thing Steve Jobs, love him or hate him, “gets.”

  • Handy Trick: Move Rows in One Statement

    Today I am posting a wee 200-level trick, taken from some work I am doing with archiving. Here’s the scenario: I have seen a few applications that have performance problems where “active” records in the database are comingled with “inactive” records. For the purposes of this post, imagine a company has a database with customer data, where the customer records are all stored in one large table. Suppose that only a small percentage of the records relate to active customers, and the rest are inactive customers – but the company wants to retain those inactive records. In daily use, it seems reasonable that the active customer records would be accessed frequently, and would want to be in cache, while the inactive records could, in theory, sit out there on disk and only be accessed rarely.

    This presents two problems: first, if the active and inactive rows are mixed up together in the same table, and therefore in the same clustered and non-clustered indexes, then in the buffer cache at any given time, you will have some mix of the two. That wastes some memory, and some resources used to access the data, because with each operation the server has to pick out the active customer records from the inactive ones. Obviously the engine is geared to do that, but there is probably an advantage to horizontally partitioning this data, physically separating the inactive customers out, and concentrating the smaller set of active customers in a nice, compact table, more of which could sit in memory.

    Second, in many implementations I have seen (and this came up yesterday in my own work) there’s just an Active/Inactive “flag” in the large table, marking each customer record. While not incorrect, there are some performance implications of that design – mainly that a two-value column like this, when incorporated into a table of perhaps millions of rows, will not have cardinality that is useful to index. Generally if you issue a query “where active = 1,” unless there has been some very clever index design, you are likely to be scanning the table, picking out the active “needles” from the inactive “haystack.” There are exceptions, but this is a very common pattern.

    So, please take my rather contrived example, and let’s look at the trick I’m really looking to demonstrate here: moving rows from one table to another, transactionally. Assuming horizontal partitioning, it’s certain that we need a procedure to “move” rows from one table to the other. But in T-SQL there isn’t really a “move rows” statement.

    Here’s the setup: in a test database, make two identical tables:

    CREATE TABLE dbo.CustomersActive (
         CustomerID int NOT NULL
                        PRIMARY KEY CLUSTERED,
         FirstName varchar(50) NOT NULL,
         LastName varchar(50) NOT NULL
        )
    GO
     
    CREATE TABLE dbo.CustomersInactive (
         CustomerID int NOT NULL
                        PRIMARY KEY CLUSTERED,
         FirstName varchar(50) NOT NULL,
         LastName varchar(50) NOT NULL
        )
    GO

    Let’s imagine those are, in fact, large, complex and realistic tables. (Work with me here :-)

    Next we populate with some sample data:

    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1000, 'Road', 'Runner' )
             
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1001, 'Quincy', 'MaGoo' )
     
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1002, 'Bugs', 'Bunny' )
     
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1003, 'Daffy', 'Duck' )
     
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1004, 'Wile', 'Coyote' )
     
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1005, 'Elmer', 'Fudd' )
     
    INSERT  dbo.CustomersActive
            ( CustomerID, FirstName, LastName )
    VALUES  ( 1006, 'Captain', 'America' )
    GO

    Problem: we need a clear, transactional way to move rows from the active to the inactive customers table. In older versions of SQL Server, this would have required an explicit transaction, and insert and a delete, plus of course error handling to ensure a customer could not be lost or be duplicated in the middle of the transaction. Today we have TRY/CATCH, which helps on the error handling front, so this two-step process could be written something like this:

    BEGIN TRANSACTION
     
    BEGIN TRY
     
        INSERT  dbo.CustomersInactive (
                        CustomerID,
                  FirstName,
                  LastName
                ) SELECT 
                        CustomerID,
                        FirstName,
                        LastName
                FROM    dbo.CustomersActive
                WHERE   CustomerID IN ( 1002, 1004, 1006 )
           
        DELETE  dbo.CustomersActive
        WHERE   CustomerID IN ( 1002, 1004, 1006 )
           
        COMMIT
       
    END TRY
           
    BEGIN CATCH
           
        ROLLBACK
           
    END CATCH 
    SELECT * FROM dbo.CustomersActive
    SELECT * FROM dbo.CustomersInactive

    This is essentially correct (except, of course, the flow control and error handling would be different and more robust in a real system). But there are some liabilities here:

    1. We must wrap this in an explicit transaction regardless of the context, just to make the two statements commit or roll back together.
    2. We have to state the criteria for the selection of rows to move twice, which is no big deal if done correctly, but invites trouble later. Imagine a developer editing the first “where” clause but forgetting to implement an identical change to the second. Oops.
    3. The server has to do the work to locate the affected rows twice – once for the select and again for the delete. This is no big deal for our simple case, but in many real archiving scenarios, it can be expensive to locate the target rows, and it’s a problem to ask the server to do that work twice.

    Here’s a perfect use case for the OUTPUT clause! If you are unfamiliar with OUTPUT, what it does is essentially cause a SQL statement to emit a set of rows - which can be the data, or a subset of the data, that the statement affected. For example, if we run:

        DELETE  dbo.CustomersActive
        WHERE   CustomerID IN ( 1002, 1004, 1006 )
    

    there isn’t any output from the delete, other than perhaps a count of affected rows. The statement will not capture the deleted rows, it will just act on the table and the rows will be discarded.

    On the other hand, if we add the OUTPUT clause:

    DELETE dbo.CustomersActive
    OUTPUT
        DELETED.CustomerID,
        DELETED.FirstName,
        DELETED.LastName
    WHERE CustomerID  IN ( 1001, 1003, 1005 )

    then the content of the DELETED virtual table is actually emitted when the statement is run, like a select statement, returning the affected rows as a result set:

    CustomerID FirstName LastName
    1001 Quincy MaGoo
    1003 Daffy Duck
    1005 Elmer Fudd

    The real magic with OUTPUT is, not only does it emit rows this way, it also makes SQL statements even more composable than they had been. That is, one SQL statement can be used to generate a derived table, which in turn can act as the input data for another SQL statement, composing the two statements together into one action. That means it’s possible to nest a DELETE, for example, inside of an INSERT statement.

    “Sounds complicated,” is a reaction I see a lot around this.

    But don’t let the language describing this fool you – it’s really simple, and quite handy. Think again about the need to move rows from one table to another. What we really want to do is perform a delete, but grab (i.e. select) those rows that were deleted and then put them into another table. The first bit of that process is to feed the DELETE-ed rows to a SELECT statement. Once we “have” that set, we hand it to an INSERT statement, with INSERT … SELECT.

    INSERT dbo.CustomersInactive (
          CustomerID,
          FirstName,
          LastName
    ) SELECT 
                CustomerID,
                FirstName,
                Lastname
          FROM    (
               DELETE dbo.CustomersActive
               OUTPUT
                       DELETED.CustomerID,
                       DELETED.FirstName,
                       DELETED.LastName
               WHERE CustomerID  IN ( 1001, 1003, 1005 )
          ) AS RowsToMove  
         
    SELECT * FROM dbo.CustomersActive
    SELECT * FROM dbo.CustomersInactive

    The fact that we can take output from DELETE and feed it to INSERT actually models what we are trying to do perfectly. And, we get some advantages:

    1. This is now a single, atomic statement on its own.
    2. The logic about which rows to move is specified only once, which is neater.
    3. The logic about which rows to move is only processed one time by the SQL Server engine.

    I hope this simple example opens the door for you to begin using OUTPUT and composable DML.

    Edit: Be sure to see Ben Thul’s comment below for an even better/more concise way to write this >

  • Chasing the ISV, or, “That code makes my teeth hurt.” T-SQL Tuesday (ish) #21

    Jenga EffectThis month’s T-SQL Tuesday – a blog party dreamed up by sqlblog.com’s Adam Machanic ( blog | @AdamMachanic ) – is about that code we’ve all written that we don’t really like to think about too often. You know the stuff. I can’t help but imagine the next poor guy who comes across some of mine and thinks, “What the … How in … Seriously?”

    I have two gems to share today. They share the theme, “Chasing the ISV,” because they are both SQL Agent jobs that essentially follow ISV code and constantly, 24 x 7, mop up problems with applications.

    The first example I’ll put up here is one I blogged about briefly last year. We had an ISV system that was riddled with pretty horrible triggers, which I was hesitant to touch for fear of the “Jenga effect.” Pull just one line of trigger code out and – who knows?

    Said system had a production issue because a value stored in two places, that was supposed to match (winning system here) no longer did. I discovered that the underlying reason was that the tables in play had triggers that could not handle a set of rows, but just one row at a time – a too-common anti pattern.

    IF Problem GOTO Workaround

    TSQLWednesday_2C948C01So, what to do. Production issue. Deep breath. Here goes:

    Again:
    
    UPDATE TOP (1) dbo.someTable SET field3 = 'NEW'
    WHERE field2 = 'NEW' AND field3 = ''
    
    IF @@ROWCOUNT > 0 GOTO Again

    Into a SQL Agent job with that, executed once every few minutes.

    So. Wrong.

    Also fixed the issue. And it’s still in use, I believe. I try not to think about it.

    IN ( Pain, Suffering )

    The second bit of pure awesome I have to offer today takes a bit more explanation. I think I should anonymize this one to protect the guilty. (That first example has, believe it or not, the real column names.)

    Another ISV provided a system to one of my past employers that, let’s say for argument, provided a web application to manage distributed offices in locations all over the country. The structure of the system required that people at the company be able to view one or a few of these locations’ data, but not the data from other locations. So the system had a row-level security mechanism to filter the data by groups - a fairly standard idea, but theirs was not a typical implementation at all. The way the initial version of this software operated was, roughly:

    1. An administrator for the application would change some aspect of the security groups using the web application.
    2. The system would, upon clicking Submit, create (or recreate) one or several view definitions, inside of which there was a SELECT with a static IN (,,) list of any and all the locations that a given group or individual could see on the system. The view would provide the row-level security. The IN(,,) list could have a few or hundreds of items. A classic data-as-code anti-pattern.
    3. The system was very busy, so altering the view would often fail or cause … “side effects.”

    The company actually instituted a rule that security changes like this could only be made after hours, and made staff stay late to do that work. If this happened during the day, the system would generally grind to a halt, and none of the 1,000 or so concurrent users could do anything. Why? The IN() lists in these views could have hundreds of values, and they worked against millions of rows of data. That meant that with no meaningful stats to work with about these embedded lists, the query optimizer would always pick a terrible query plan for vital processes on the system.

    The ISV almost fixed this problem. Nearly.

    When complaints surfaced, they went in and added a more scalable feature that would use a many-to-many table to store the security relationships. The table could be indexed, and would have stats, which was wonderful, and it solved the performance issue.

    When they used it.

    Here’s the rub: the were so fearful about disrupting their customers’ existing deployments – a noble but ultimately doomed posture – that they kept the old view mechanism too. Any old features of the product used the old system, while new or rewritten features used the new system. Now their code had to maintain both systems, and attempt to be sure they matched. Which they did. Mostly. Except that time we had to have one of their developers spend a week untangling the thing. But I digress.

    If I recall, and this was a few years ago, so I may have this wrong, it seems like their first table-based implementation didn’t exactly work, and they had to create a second one, and then they kept all three systems, with some massive stored procs to keep them all in sync. And the proc worked, except when there was a logic problem like duplicate rows in the tables, and then it didn’t anymore.

    ALTER VIEW dbo.FixMe

    This is where our little DBA group came flying in with our superhero capes and a brilliant, yet horrible, workaround. I can’t quite remember who created this, so my apologies if I cannot credit the specific individual, but I know I, erm, “tuned” it over the years. I also vividly remember the phone calls and the shouting when this workaround was not operating as designed.

    We made a SQL Agent job to run every few minutes. The code for the job was, essentially:

    1. Examine the text for all the views in the database, via syscomments, where the view name matched some text pattern that makes it look like one of these security views.
    2. If the view definition contains one of these IN(,,) clauses then rewrite the view, via dynamic SQL, as a join between the security tables.

    Like the example above, this job would basically sit there and chase the underlying system, and if a security change was made, it would locate the offending view definition and rewire it on the fly to use the more performant and optimizer-friendly tables.

    I think the vendor has since fixed the view definition issue. Happily, I don’t have to work on this system any longer, so I can’t say whether the views themselves persist to this day.

    Here’s the code, unvarnished but anonymized. I found it on an old flash drive:

    IF EXISTS ( SELECT * FROM syscomments sc1    
       INNER JOIN sysobjects so ON so.id = sc1.id   
       WHERE so.type ='V'    
          AND so.name LIKE 'FOO\_%' ESCAPE '\'    
          AND sc1.text LIKE '% IN (%)'   ) 
    BEGIN     
       SELECT     'ALTER VIEW dbo.' + so.name + 
          ' AS SELECT f.* FROM dbo.foo f, bar l ' +
          ' WHERE f.id = l.id ' +
          ' AND l.id = '     +  
           RIGHT(so.name, len(so.name) - charindex('_', so.name))    
                AS alterViewCommand     
       INTO #viewCorrections     
       FROM syscomments sc1     
       INNER JOIN sysobjects so ON so.id = sc1.id   
          WHERE so.type ='V'    
          AND so.name LIKE 'FOO\_%' ESCAPE '\'    
          AND sc1.text like '% IN (%)'     
    
       DECLARE @sql VARCHAR(4000)     
    
       WHILE EXISTS( SELECT 1 FROM #viewCorrections )    
       BEGIN
             SELECT TOP 1 @sql = alterViewCommand FROM #viewCorrections
    
             PRINT @sql
    
             EXEC ( @sql )
    
             DELETE FROM #viewCorrections WHERE alterViewCommand = @sql
    
           WAITFOR DELAY '000:00:05'
        END
    
        DROP TABLE #viewCorrections
    END
More Posts Next page »
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement