THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

  • Looking for sp_whoisactive?

    A while ago I quietly announced the new home for sp_whoisactive, whoisactive.com.

    Today I finished migrating all related content -- including the documentation consisting of a series of 28 articles posted on this blog a while back.

    So if you're looking for sp_whoisactive downloads or information on the proc, head over to the new site. And as always, keep the feedback coming!

    Enjoy! 

  • The SQL Hall of Shame

    SQL Server is a big product, with lots of features. Many of the features—hopefully the vast majority of them—are solid and allow us to build trustworthy, secure, and reasonably usable database applications. But nearly everyone has stumbled into a few, shall we say, “darker areas” of the product now and again. Strange and seemingly nonsensical enhancements have come and gone over the years, and we’re all certainly aware of the Microsoft tendency to occasionally introduce amazing-looking but almost unworkable v1 features that never get another glance from the product team prior to their deprecation years later.

    imageThe other day Klaus Aschenbrenner brought this topic to the forefront of my mind (or at least my Twitter stream) with his bold proclamation that the generally lauded “in-memory OLTP” is the “most useless feature in SQL Server.”

    Personally I have yet to see an especially interesting and non-trivial use case for the feature (and no, user session state management is certainly not it), but most useless? Surely, I thought, that prize has to go instead to English Query? Dejan Sarka quickly chimed in and recommended Data Quality Services. It’s tough to argue with that one, but then Brent Ozar mentioned auto-shrink…

    It was at this point that I realized just how many candidates there are for “most useless” things lying around the product. So I decided to create my own tweet. I asked for the most useless feature, anytime between version 7.0 (which I would call the beginning of SQL Server’s “modern era”) and now. I received quite a few suggestions, and so I have decided to catalog them here—along with a bit of personal commentary on each one.

    The list that follows is mostly unordered and culled straight from what I received on Twitter. Hopefully I haven’t missed anything due to Twitter’s weird threading and “priority” mechanisms. And please let me know in the comments if your favorite useless feature is missing, or you’d like to add a comment and/or argument about one of these. Perhaps we can find some way to turn these dark and ugly corners into things of beauty? Well, we shall see…

    So without further ado, the Most Useless SQL Server Features, per my Twitter followers, as of June 2017!

    In-Memory OLTP. The one that started it all. Useless? I guess it depends on where you draw the line between “OLTP” and “memory optimized.” The latter is, I think, demonstrably useful in certain cases. For example, improving the performance of temporary tables. But as for the full-blown OLTP use case, as well as the various native optimized options? Very few databases suffer from the exact types of latch contention for which this feature was created, and there are much better and easier ways to get even better performance gains than are offered by native optimization. This feature seems to be getting continued attention from Microsoft, so it may well prove to be the foundation for some very useful things in coming releases. I am going to reserve judgement for now, whether or not Mr. Aschenbrenner agrees.

    English Query. My number one vote. This was an attempt at bringing “natural language” capabilities into the product. I believe it was introduced in SQL Server 7.0. I’ve never talked to anyone who used it in a production environment. Personally I never even installed it. I don’t recall hearing that it was deprecated, but it disappeared at some point, maybe in SQL Server 2005? No one cried. I guarantee it.

    Data Quality Services (DQS). This one was so promising on the surface, and something that is absolutely needed in many data warehouse projects. A centralized business rules and data verification platform can really help reduce complexity. But DQS never made it past the initial stage. It was (is?) effectively a collection of valid text strings—and that’s it. No real “rules,” no real workflow. And abysmal performance. Abysmal. An excellent suggestion for this list to be sure.

    Master Data Services (MDS). I remember when Microsoft acquired this tool. I was working on a large data warehouse project. We watched an online demo and we’re blown away. We decided right away to use it the moment Microsoft released its version. But then that didn’t happen. For a long, long time. Two years later, as I recall, it finally hit the street. And while I never personally tried it, from what I heard it was a complete disaster of bugs and faulty workflow. I’ve heard that it has gotten marginally better since then, but I think the ship has sailed. Microsoft lost the trust of its end users and this product is forever doomed for inclusion in lists like this one.

    Auto-Shrink. “Stop my file from growing and taking over the disk!” Sounds great, but is it? If you understand even a little bit about SQL Server architecture you understand why that file is growing. And sometimes you really do need to shrink a file, but if you look at the internals of shrink you know that it can cause fragmentation. So shrink is something you want to avoid, not automate. Auto-shrink is a feature put into the product for users who don’t want to deal with the product. It can only cause you more problems. Useless? Sure, I’ll go there.

    Auto-Close. This option is one that few people know what to do with. It’s there for a reason—it helps large instances, with a tremendous number of databases, make better use of system resources when not all of those databases need to be used at the same time. Most shops shouldn’t turn it on. But useless? I’d say no.

    Auto-Grow. When creating a file in SQL Server you have various options. You can pre-allocate as much space as you think you’ll need and hope to fill it up later. Or you can allocate the minimum amount of space and let the file grow itself, or not over time. If you only pre-allocate, you risk not having enough space there when a spike occurs and you need it. If you let your files grow too often, you’ll introduce fragmentation. If you grow your files by large percentages, you can wind up consuming way too much disk space. If you combine the best of all of these—pre-allocating using metrics and allowing for incremental growth as a safety net—you’ll be in pretty good shape. All of which is a long way to say that I have no idea why this feature was suggested; it is, in my book, anything but useless. I think about it and consider its configuration every single time I create a database file. You should, too.

    Notification Services (SSNS). This feature was a framework designed to help generate emails, reports, and other forms of “notifications” when data changed certain ways. (Example: “Let me know when the stock price hits $40/share.”) Added sometime in the early 00s, it was more or less doomed to failure from the get go: It was driven by a very confusing XML-based interface, the documentation was beyond horrible, and the book authoring community largely ignored it (with the exception of a few pages of high-level coverage here and there, and a book written by the guy who created the feature for Microsoft). If you had the patience to muddle through the mess (as I did!) it was possible to get the thing running and doing some fairly decent things. But the learning curve was so steep, and the general situation so poor, that Microsoft didn’t even bother keeping v1 around for more than a single release. The plug was pulled and Notification Services was deprecated prior to SQL Server 2008. Probably for the best.

    Query Notifications. Entirely unrelated to, but often confused with Notification Services. This is a SQL Server and ADO.NET feature that allows applications to receive events when the results of a SQL Server query might change due to data modifications. This feature was released as part of SQL Server 2005 and would be hugely useful in so many cases except that it has proven itself time and again to be entirely undependable and nearly impossible to debug. And Microsoft has, as far as I can tell, put exactly $0.00 of investment into improving the situation. I am actually getting angry as I type, because there are so very many excellent use cases for this thing that has been left out there in its piece of junk state—but I can’t use it in any but the simplest of scenarios due to the risk factor. Thanks, Microsoft.

    Buffer Pool Extension (BPE). A feature that received a lot of attention in SQL Server 2014. Well there wasn’t much else in SQL Server 2014 to give attention to. What exactly is the use case for this thing? It’s not entirely clear, but the combination of a fast local SSD with really slow primary storage isn’t especially common. On Twitter Joey D’Antoni suggested some Azure use case, but I felt like he was pushing for it. Useless? Probably for most of us.

    Management Data Warehouse (MDW) / Data Collector. Everyone who does anything serious with SQL Server buys or builds a monitoring tool. That’s not cheap. Microsoft could provide huge value to its users by shipping a bunch of monitoring and collection capabilities with the product! At least, that was the idea back in 2008. MDW is a classic example of a great idea that got rushed into the product as version 1.0 and then never touched again. It is famously unable to be uninstalled once you turn it on. The data collector is famously limited in what it can do unless you exercise various arcane hacks that are only documented on long-dead blogs written by people who no longer work for Microsoft. And the reporting end is almost nonexistent. Why the ball was dropped, I’ll never know. But in the meantime, I don’t plan to bother turning this thing on.

    Lightweight Pooling / Fiber Mode. This rather arcane feature was added to help very marginally improve performance of very specific workloads. From Day One it shipped with a warning: Don’t turn this on. It breaks various SQL Server features which, apparently, aren’t needed for that very specific workload. I don’t recall the exact specifics around when or why you’d use this, but I never have. No one I’ve talked to ever has. I’m not even sure it still works, but I won’t be trying it out. I wonder if the handful of customers for whom this was created actually benefitted? If that’s you, let us know!

    JSON. The long-awaited follow-up to XML support has finally arrived in SQL Server 2016, and right away someone votes it as most useless feature ever. Tough crowd. Personally I’m on the fence here. I don’t see much reason to store data in a “relational” database in JSON format (or XML for that matter). I do see some very limited use case around rendering data to JSON, but I would question why the app can’t do that? Same on the shredding side. Looking back at the XML features, I have mainly used them to query Dynamic Management Views; and in that case I only did that because I had to, due to the fact that those views expose so much data in XML format! I also heavily use XML for concatenating strings. Not so much actual XML work, though. I assume the same story will hold true for JSON. A few people here and there will use it. Most of us won’t, until Microsoft starts converting all of the system objects over. (Please, no!)

    SQL Server Management Studio (SSMS). I live in SSMS all day long. Aside from a few crashes and the occasional performance issue I’ve had no problem with it, ever, and I felt that it was a huge upgrade from the Query Analyzer and Enterprise Manager days. Sorry, whomever suggested SSMS for this list, but I am absolutely a fan. I’ve tried a few third-party management tools and they don’t even begin to come close. SSMS, useless? Not at all.

    Connect to SSIS from SQL Server Management Studio. I don’t even know what to say here. I saw this option once or twice, tried it, didn’t see anything that pertained to me, and never really looked into it. Maybe Andy Leonard or some other SSIS specialist can fill us in on the point of this thing. I can’t vote one way or another because I completely lack the context.

    DROP DATABASE IF EXISTS. This one was suggested by Greg Low, who explained that it fails if anyone is connected to the database. And disconnecting them first requires checking if the database exists. Well that makes way too much sense. Useless.

    Columnsets (and Sparse Columns in general). I remember being really excited when these features shipped, and I couldn’t wait for them to get some enhancements. (I asked for columnset data type information; my request was denied.) For those who don’t recall, Sparse Columns is supposed to be a way to model really wide tables in which most of the attributes are NULL for any given row, and columnsets is a way to bring back some of the sparse data in an XML format. And these would have been interesting features except Sparse Columns requires making a DDL change on the target table, and it’s really confusing as to when you should or should not use it and how much space you’ll save and so most people just stick with the tried-and-true Entity-Attribute-Value approach. And then there is the manageability problem of maintaining a table with 20,000 columns. Yeah, no thank you. Columnsets were sort of just tacked on to the whole thing and maybe if you could create more than one set and name them or something like that it would be cool, but no, you could only have one, and it has some sort of weird limitations that I don’t remember right now because I’ve never looked at it again and don’t plan on it. I’ll mark this entire feature set, sadly, as useless. But it could have been so great. Version 1.0 For Life strikes again.

    SQLCLR. If you’ve been following me for any amount of time you know I’m a fan. It has its quirks, it’s still pretty much a v1 thing after 12 years, but I’ve gotten plenty of great use out of it. So has Microsoft. Various SQL Server features leverage it internally. Needs love, but hardly useless from where I sit.

    Utility Control Point (UCP). I remember hearing about this and reading up on it when it was released. (2008 R2?) But I can’t say much beyond that. I just looked at the documentation and I’m not entirely sure what this thing is supposed to be. I’m not going to spend any more time on it. Does anyone have a counter argument for why we shouldn’t call this useless?

    Raw Partitions. I am not a storage guy, but I guess this can help improve disk I/O if you’re willing to take on some risk. That seems like a bad tradeoff for most databases. ‘nuff said.

    Uninstall. I’ve lost count of the number of times I’ve installed SQL Server over the years, but I’ve probably only uninstalled a few. And I can’t say it was ever a simple experience. SQL Server tends to leave fragments all over the place, like many Microsoft products, and uninstall is anything but clean. The best way to do it is to simply pave away. So I have to agree with whomever posted this suggestion on Twitter. Uninstall, you’re pretty useless.

    Service Broker (SSB). Ah, Service Broker. I have a real love-hate relationship with this feature. I’ve used it on lots of projects. It works well, when it works. It’s a pain to debug when it doesn’t work. Its architecture is ridiculous and beyond overbaked. (Conversations and dialogs, really? All we want is a simple queue! And the fact that you can create fake conversations to make it go faster? I can’t even…) But when it comes down to it, having some kind of queue in the database is better than not having any kind of queue in the database. It’s solid, transactional, and after a bit of configuration it usually gets the job done.

    Not Freeing Allocated Memory Except Under Pressure. SQL Server allocates as much memory as it needs, up to however much you tell it to use. (The memory configuration amount is even referred to as a “target.”) It uses all of that memory to store your data and query plans so that it doesn’t have to re-retrieve it off of slow disks or re-compute them, respectively. And that is a good thing. Why should it prematurely free that memory? If you don’t want it using as much memory, tell it to use less memory. Useless? No. This is absolutely a feature and a correctly implemented one at that. Sorry, Twitter person.

    Database Engine Tuning Advisor (née Index Tuning Wizard). Back when I was a SQL Server newbie I needed to make things fast. The only problem is that I had no idea how to make things fast. So I clicked around and there was this wizard that said it would analyze my workload and tell me what to do. How cool is that? I tried it and it suggested all sorts of interesting things, which I did. And then a year later I knew a bit more and was kind of embarrassed that I’d created some of these ridiculous artifacts. And then a few years later Microsoft gave the thing its kind of downgraded new name—it’s no longer a full-on Wizard, now it’s merely an Advisor. But here’s the thing: I want my advisors to give me good advice, not wreck my database. So I’ll go elsewhere for advice, like for example my brain. Thanks.

    DBCC PINTABLE. The guy who doesn’t like SQL Server not freeing allocated memory would lose his mind over this one. A long-gone feature at this point, it specifically marked tables such that their data would be infinitely held in memory. (I can’t recall what was supposed to happen under pressure.) As it turns out this is pretty useless given the way SQL Server’s least-recently used cache aging algorithm works. So yeah, useless.

    Virtual Interface Adaptor (VIA) Network Protocols. I don’t even know what this is, I just wrote commentary on over 20 other things, and I’m too tired to read up. So sorry, Gianluca Sartori. Tell us all about it in the comments, if you would.

    Mirrored Backups. Another suggestion by signore Sartori. I have no feeling on this one way or another, but it never seemed like taking a backup and then copying it was all that much of an issue so I don’t really see the point.

    SQL_VARIANT. Long ago I did some work in Visual Basic and quickly became annoyed with the fact that I never knew what type my variables were, or even whether they had been properly initialized. Welcome to the joys of variable data type systems. I was able to work around this somewhat by using Option Explicit, but now let’s think about the difference between your application—which needs to deliver data to your users—and your database—which is expected to store, master, and protect that data. Using a variable data type in the application is annoying. Using it in the database? That’s downright scary. That means you don’t know your data. That said, this data type has been useful over the years in very special cases, such as functions that need to be able to seamlessly return various different types of data. As a matter of fact, Microsoft used SQL_VARIANT in SQL Server 2016 as part of the context data enhancement. So useless? No. But use with extreme prejudice.

     

    So that’s that. The complete list from Twitter, annotated. Got more? Want to argue, agree, or otherwise? Comment away.

  • The Guru (T-SQL Tuesday #089)

    I became a consultant a bit earlier in my career than was probably wise. The going was rough at first, with periods of feast and periods of famine. Although I had several clients, I didn’t understand how to sell services or schedule my workload, and so I’d wind up either doing huge amounts of work for a few weeks, or absolutely nothing some other weeks. The topic of this month’s T-SQL Tuesday is “Database WTFs” and it was during one of my periods of relative inactivity that I met Jonathan at a local user group meeting, which resulted in one of the more memorable WTF situations of my career.

    Jonathan was a few years older than I was. Most of the user group attendees wore jeans and sneakers. Jonathan had on a nice looking suit and shiny shoes. He had a smooth demeanor and introduced himself as “a consultant and SQL guru.” Literally, with that exact phrasing. We struck up a conversation and he told me about all of the amazing work he was doing with various customers. I was extremely impressed. Jonathan represented the consultant I thought I wanted to be, a bit further down the road.

    I told Jonathan about my struggles with finding steady work, and he immediately offered to introduce me to a company he’d spoken with, CorpX, Inc. He explained that they were a growing company and needed lots of performance tuning help. He was just too busy on other things to give them a hand, but maybe I could. I eagerly agreed, and he did an e-mail introduction the next day.

    I don’t remember the initial conversation with CorpX, nor whether there was any kind of interview, but we signed a three month agreement and I told my customers I wouldn’t be available for a while. Before I knew it I was nervously walking in to a strange office for my first day on the job. Once in the company’s space, the receptionist told me she wasn’t expecting me, but that I could sit down and wait while she sorted things out. She offered me a magazine, but only after 20 minutes of waiting did I actually pick it up and start flipping through the pages. The receptionist told me that she thought I was supposed to meet with the CTO, but he was busy, so I should keep waiting. An hour later I’d read the magazine cover-to-cover, so I picked up a second one.

    At some point, a few magazines in, a guy stopped by to chat with the receptionist. He was a some sort of IT person, and she asked him if he knew when the CTO would be available. “Oh, he went for coffee a while ago.” However, the guy had heard that I was supposed to be there that day, and was able to lead me to my space: A cubicle in a dark corner of the basement. Just like Milton’s. He left me there to stew, but it didn’t last long. I had no login credentials, nothing to do, and was tired of wasting my day. So I tracked him down again and asked him to introduce me to the DBA.

    The DBA was a red-faced guy named Dave, and he seemed to be in a state of panic. He spoke quickly and told me he didn’t have time to deal with me; he was in the middle of something way too important. His large office was perfectly clean and ordered, and I clearly recall the three huge monitors set up on his desk (well, huge for the time). A complex ER diagram covered all available screen real estate. I told Dave that I could possibly help out with his SQL Server problem, and he replied that there was nothing for me to do. He deeply involved in designing the next version of the database. He couldn’t possibly think about anything else. He asked me to leave his office.

    vintage-baseball-batsIt was around noon, so I left for a while to get lunch. When I returned to the office, this time I was marched directly to the CTO. “Where have you been? I’ve been waiting for you.” The CTO was a middle-aged guy with a beer gut and, apparently, an extreme baseball fetish. His office was filled with baseballs, helmets, jerseys, and bats. He absentmindedly played with a bat as he talked to me, lifting it and bringing it down on his desk when he wanted to emphasize a point. “We hired you as a pinch hitter. We retired our SQL Server team a few weeks ago, because I finally came to the realization that SQL Server can’t meet our performance demands. It’s like the database farm team, you know? We’re moving to Oracle in six months. That’s a real database! Until then, though, we need to play hard and score one for our customers, and that’s where you come in. You’ll turn our screwballs into fast balls. Well, as fast as that crappy thing can go, am I right?”

    I asked him if he was sure that SQL Server was at fault. Even though I was still fairly young, I had already worked on some pretty big projects. And I had a number of very experienced friends, including some well known MVPs who talked about their work on gigantic systems. “Yes, we’re sure. We had Jonathan in here, and he is a SQL guru. He told us that SQL Server simply does not scale.”

    Part of me wanted to get up and run, but I was intrigued. I decided to treat it as a challenge. I would prove him wrong and save the company millions on its Oracle transition. Maybe I’d score a steady long-term client in the process. I asked for access to a test system and to the source control repository, and the CTO gave me a blank look. “You’re here to performance tune our production database.” There was no need for a test environment, he told me, and he wasn’t aware of any database code in source control. He handed me a printout of a network login and password and told me that I had plenty of access to “work some magic.” I asked him which part of the application to focus on, and he told me to work on any piece I felt like; they were all equally slow. He concluded our discourse by saying that he was a very busy person and as a consultant I was expected to figure things out for myself and not bother him or his staff with details. I was to come back and talk to him when I had made some progress, and not before then.

    Back in my basement hideout, I spent the next couple of hours exploring the network and figuring out which server to connect to. The CTO was right; I did have enough access. I was sysadmin on the production SQL Server and had full admin access to the app server. I logged in to the app and with the help of a Profiler trace managed to figure out one of the main slow stored procedure calls that occurred any time someone saved a change via the user interface.

    Pasting the procedure call into SSMS, I turned on Actual Execution Plan, hit F5, and got ready to see indications of a few missing indexes. I was ready to walk back upstairs, gloat to the CTO, and ask for a better workspace so I could continue to help. What I didn’t expect was what actually came back: Not one execution plan, or two, or three, but hundreds and hundreds. The scroll bar become progressively smaller as time clicked by and the elapsed counter did not stop running. All I’d done in the application was change the name of a single field. What was going on?

    recursionI opened the stored procedure and it was a simple update against a single table. But after poking around in Object Explorer I discovered that the table had four different insert and update triggers. Each of the triggers created a cursor over the inserted table, and for each row therein called one or two stored procedures. Each of these stored procedures was seemingly more complex than the last, with increasing amounts of dynamic SQL and nested cursors. And each did updates or inserts to at least one, and usually several tables. As I chased down those tables I saw the same pattern again and again: More triggers on more tables, with more cursors and more stored procedure calls, with more dynamic SQL and more cursors and more stored procedure calls. I began recursively searching for the bottom of it all, but gave up once I hit 9 or 10 levels. The tables were named with some sort of scheme involving various abbreviations, but with no documentation or support, I had no clue what was going on.

    Getting concerned about my ability to unwind this mess, I looked around the database and discovered that the same pattern was used on every single table. I ran some counts and found around 400 tables, 1200 triggers, and several thousand stored procedures and user-defined functions. Not to mention a massive number of views, all extraordinarily nested.

    The best part? Every single module in the database had a dated copyright notice, containing my friend Jonathan’s name. He had apparently built the entire thing himself over the prior four years, no doubt billing some truly obscene number of hours. And then he had gone ahead and signed his work. To this day I still wonder about the legal implications of that copyright notice. What does it mean to copyright a work for hire that contains logic specific to someone else’s business? And why did CorpX allow this in its code? Over all of that time did no one else ever look at the database?

    I wandered upstairs to ask the CTO what he expected me to do with such a massive mess, but the office was empty. Sitting in the basement, I’d lost track of time, and had spent four or five hours just researching that one initial bit of code path. It was late in the evening, I’d fixed nothing, and I knew that I wouldn’t be able to work any magic. I realized that the guru had already tricked us all, and I decided to cut my losses. I packed up my things, logged out, and left the building. I emailed the CTO that night and politely told him that the engagement wouldn’t work out. I never billed for my day, and he never replied to my note. I looked up Dave, the DBA, on LinkedIn a few months later. He was now “database architect” for a different company. Years later I met someone else who worked for CorpX and found out that they were still, miraculously, in business. Perhaps in the end they did find a magician; I was told that they were still running SQL Server.

    copyright_jonathanI ran into Jonathan one other time, and he asked me why I never thanked him for getting me into such a sweet gig. I merely shook my head.

    (Yes, this is a true story. Well, mostly. I changed some names and details to obscure things a bit. But this is pretty much how things happened. I learned a lot that day!)

  • Solving the Net Changes Problem with Temporal Tables (T-SQL Tuesday #087)

    SQL Server 2008 was perhaps not the meatiest of SQL Server releases, but it did have one especially promising feature: Data change detection. Or maybe two features, depending on how you look at things. I’m talking, of course, about the almost confusingly similarly named Change Data Capture and Change Tracking.

    Change Data Capture was the big gun that promised to solve everyone’s change tracking woes (well at least everyone running Enterprise Edition). In practice it was a heavy, overwrought hack of a feature. Built on the foundations of replication—a feature designed to move data, not record and store changes to it—the entire system was quirky and problematic. Users had to become familiar with odd APIs, low-level transaction identifiers, and the occasional call to sp_repldone for that wonderful moment when the whole thing crashed and burned. But enough about this feature. If you want to know more about it and its future in SQL Server, look it up in the Azure SQL Database documentation.

    Change Tracking, on the other hand, was designed to solve only one problem. It did so using a totally integrated approach, with changes recorded as part of the write transaction, first-class T-SQL syntax, and a very simple and well thought out API. The Change Tracking problem? Net changes. Simply put: First give me all of the rows. Then let me periodically ask what’s changed since last time I asked. When I ask, tell me how to synchronize my copy, cache, app, or whatever is on the other end, to get it into the same state as the host data set.

    Fast-forward to SQL Server 2016, and now we have Temporal Tables. This feature is, in many ways, the best of both worlds when it comes to change detection. Temporal Tables is as well integrated into the query engine—and the transaction—as Change Tracking, but captures all changes, much like Change Data Capture. It has a clean and simple interface, and while it’s definitely a v1 technology it seems like a much more solid foundation than either of the SQL Server 2008 features ever did.

    This month’s T-SQL Tuesday, hosted by Matt Gordon, asks for solutions to old problems using “shiny new toys.” I wondered—could I solve the Change Tracking net changes problem using Temporal Tables? Unlike its predecessors, net changes is a major use case that Temporal Tables simply doesn’t do right out of the box. Turning on both Temporal Tables and Change Tracking side-by-side is an option, but that seems like overkill. One change detection technology should be enough…or is it?

    Thinking through the problem, it quickly became apparent to me that once you have all changes—as we have with Temporal Tables—getting to a net scenario is not especially difficult. Given a primary key and a time interval over which we’re concerned about changes,  there are of only four possible outcomes:

    • A new row for the key was inserted
    • The row was updated
    • The row was deleted
    • Nothing happened

    For now we can ignore the final outcome and focus on the first three, but of course any proper solution must ensure that the negative case is taken into account so as to not send false changes.

    For now, consider the following set of propositions, given that we’re asking at time Y for all changes since a prior time X.

    • INSERT: The key did not exist at time X but does exist at time Y.
    • DELETE: The key existed at time X but does not exist at time Y.
    • UPDATE: The key existed at both time X and at time Y, and at least one change occurred between time X and time Y.

    Given these assumptions, we can begin work on a temporal queries that return the necessary rows. Solving for these conditions will require all rows that were active as of time X, rows that were (or are) active at time Y, and for the final case, all rows that were active in-between times X and Y. Since this is a range-based scenario, our best Temporal predication option will be either FROM or BETWEEN. The difference between these two is subtle: FROM uses an open interval (non-inclusive at both endpoints), whereas BETWEEN uses a half-open interval, inclusive on the end date. Given the choice in the scenario, BETWEEN makes more sense, as we can take advantage of the inclusive endpoint to avoid dropping a badly-timed row. But more on that in a moment.

    To begin solving for net changes, we must first get all changes over the time interval in question:

    SELECT
        t.pk,
        t.[other columns],
        t.valid_start_time,
        t.valid_end_time
    FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t

    This query will return all rows with a valid_end_time greater than the passed-in @start_time and a valid_start_time less than or equal to the passed-in @end_time. With that in mind, we can start putting together some expressions for each of the operations in question.

    First step in tackling the insert: Did the key only come into existence after @start_time? The following expression finds out by testing the minimum valid_start_time per key; if it’s after the passed-in @start_time, we know it’s a new key.

    CASE
        WHEN 
            @start_time < 
                MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                    THEN 1
        ELSE 0
    END AS is_new_key

    A key could have been both inserted and deleted in the interval between @start_time and @end_time—or simply deleted altogether—so a second expression is necessary to determine whether the row still exists at @end_time. The expression below accomplishes this by checking the @end_time against the maximum valid_end_time per key. If the @end_time is greater than or equal to the maximum valid_end_time then the key must have been deleted. (This is where we’re taking advantage of the inclusive endpoint used by the BETWEEN predicate.)

    CASE
        WHEN
            @end_time >=
                MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                    THEN 1
        ELSE 0
    END AS is_deleted_key

    The final case is that of update operations. A ranged Temporal predicate will return at least one row per key that was active during the range. If the query returns more than one row for a given key, some change must have occurred. A simple COUNT will suffice in this case.

    CASE
        WHEN
            1 <
                COUNT(*) OVER (PARTITION BY t.pk) THEN 1
        ELSE 0
    END AS is_changed_key

    Putting this all together, we can come up with a general template query to answer net changes style questions. To make this work I put the above expressions into a derived table, to which I added a ROW_NUMBER. The numbering is used so that we get only a single version for each row – the most recent one. (Consider cases where a row was updated multiple times during the interval.) The query below uses the derived table [x], and applies a WHERE clause that filters the set based on the expressions, in order to output only rows we’re interested in: new, deleted, or changed rows, except in cases where a row was both inserted and deleted in the input range. The CASE expression in the outer SELECT list is used to determine what type of operation will have to be done on the synchronizing side in order to bring the two sets into parity.

    SELECT
        x.*,
        CASE
            WHEN x.is_new_key = 1 THEN 'INSERT'
            WHEN x.is_deleted_key = 1 THEN 'DELETE'
            ELSE 'UPDATE'
        END AS change_operation
    FROM
    (
        SELECT
            t.pk,
            t.[other columns],
            t.valid_start_time,
            t.valid_end_time,
            CASE
                WHEN 
                    @start_time < 
                        MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                            THEN 1
                ELSE 0
            END AS is_new_key,
            CASE
                WHEN
                    @end_time >=
                        MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                            THEN 1
                ELSE 0
            END AS is_deleted_key,
            CASE
                WHEN
                    1 <
                        COUNT(*) OVER (PARTITION BY t.pk) THEN 1
                ELSE 0
            END AS is_changed_key,
            ROW_NUMBER() OVER (PARTITION BY t.pk ORDER BY t.valid_end_time DESC) AS rn
        FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t
    ) AS x
    WHERE
        x.rn = 1
        AND
        (
            x.is_new_key = 1
            OR x.is_changed_key = 1
            OR x.is_deleted_key = 1
        )
        AND NOT
        (
            x.is_new_key = 1
            AND x.is_deleted_key = 1
        )

    And in theory, that’s that. This query does in fact solve for net changes. But the devil’s in the details, and there is a subtle bug in the above query: What happens if the passed in @end_time happens to fall at exactly the same time that an update occurred on one of my rows? The answer is that the MAX(t.valid_end_time) for that key will be equal to the passed in @end_time, and so the logic above will consider the update to be a delete. Clearly not desirable behavior!

    The solution? In an update operation, Temporal Tables logs the new row with the same valid start time as the old row’s valid end time. Therefore, to figure out whether an update occurred, we must look forward exactly one row from our actual intended end the. To accomplish this, the following modified version of the query creates a new search end time by bumping the input end date up by 100ns (the finest resolution supported by DATETIME2).

    DECLARE @search_end_time DATETIME2 =
        DATEADD(NANOSECOND, 100, @end_time)

    SELECT
        x.*,
        CASE
            WHEN x.is_new_key = 1 THEN 'INSERT'
            WHEN x.is_deleted_key = 1 THEN 'DELETE'
            ELSE 'UPDATE'
        END AS change_operation
    FROM
    (
        SELECT
            y.*,
            CASE
                WHEN
                    1 <
                        COUNT(*) OVER (PARTITION BY y.pk) THEN 1
                ELSE 0
            END AS is_changed_key,
            ROW_NUMBER() OVER (PARTITION BY y.pk ORDER BY y.valid_end_time DESC) AS rn
        FROM
        (
            SELECT
                t.pk,
                t.[other columns]
                t.valid_start_time,
                t.valid_end_time,
                CASE
                    WHEN 
                        @start_time < 
                            MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                                THEN 1
                    ELSE 0
                END AS is_new_key,
                CASE
                    WHEN
                        @end_time >=
                            MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                                THEN 1
                    ELSE 0
                END AS is_deleted_key
            FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @search_end_time AS t
        ) AS y
        WHERE
            y.valid_start_time <= @end_time
    ) AS x
    WHERE
        x.rn = 1
        AND
        (
            x.is_new_key = 1
            OR x.is_changed_key = 1
            OR x.is_deleted_key = 1
        )
        AND NOT
        (
            x.is_new_key = 1
            AND x.is_deleted_key = 1
        )

    This query uses @search_end_time in the Temporal predicate, but then does all of the comparison work using the original @end_time. For the is_new_key check, this doesn’t matter, as we’re only comparing start times. For the is_deleted_key check it makes all the difference in the world, as an update done at exactly @end_time will populate a new row which will increase the value of MAX(t.valid_end_time). Because the extended search predicate can return an extra row, the is_changed_key and rn expressions had to be moved to an outer table expression filtered by the original @end_time. If an update occurs exactly at @end_time, we don’t want it to trigger a delete operation now, but we also don’t want to actually see it until next time we synchronize.

    That bug solved, we now must consider a second, slightly more important unsolved problem: What do you use for the @start_date and @end_date?

    Naïvely speaking we should be able to ask for a base set of rows by using some long-past date as the @start_date—say, 1900-01-01—and the current SYSUTCDATETIME() as the @end_date. Then we should be able to pass back that same @end_date next time as the start date, and so on and so forth. But that approach will open you to a major issue; Temporal Tables simply wasn’t designed for this.

    Backing up a bit: Change Tracking, which was in fact designed for answering net changes questions, works by placing a surrogate internal transaction identifier on each row touched during a transaction, and then exposing an external transaction identifier that corresponds to the commit time of the transaction. It does this by using an external “transactional commit table” in conjunction with the main changes table. But Temporal Tables uses an entirely different system, wherein there is no external table, and each row is populated with its one-and-only, both internal and external identifier, as it is being written. And every row touched by a given transaction gets the same exact time: The start time of the transaction.

    So what’s does this mean when we’re talking about net changes? Pretend for a moment that you start a transaction, jump on a plane, go hang out on the beach in Jamaica for a week (highly recommended), and then return to your office (lightly sunburned), update a table of your choosing, and commit the transaction. The time stamped on the row will correspond to the start of the transaction—a week ago, before you ever boarded the plane. Meanwhile, your synchronization process has been running regularly, let’s say once an hour, and it thinks it’s lagged by only an hour. It’s never going to ask for rows that changed a week ago. This is referred to in change detection parlance as a lost update.

    Does this mean we’re completely out of luck? Of course not—but it does mean that a complete solution will have to consider the transactional state of the system. And you’ll have to monitor to make sure that no one starts a transaction and leaves it hanging around for a week. (You should probably be doing that anyway.)

    The key to my full solution is the sys.dm_tran_active_transactions DMV. This view contains one row per current transaction—either explicit or implicit—and a handy column called transaction_type which categorizes the transactions as read, read/write, system, or distributed. It also includes a transaction start time, which can be used to figure out which times we might not want to touch just yet, if some transaction is still outstanding from then.

    The following expression uses the DMV and is designed to output a safe @end_date, that will avoid lost updates. It starts by asking for the minimum start time for any active read/write transactions (transaction_type 1). If there are none, it uses the current SYSUTCDATETIME instead. Just to be on the extra safe side—in case there was a delay in processing due to sitting on a long runnable queue, for example—the expression pulls five seconds off of the end time. Note that this code also needs to be able to covert the local time as reported by the DMV into UTC as required by our Temporal predicate. There is currently no reasonable way to get the local time zone in T-SQL (there is at least one hack involving a registry read, which I do not recommend), so you’ll have to replace the hardcoded Eastern Standard Time with whatever time zone is appropriate for your server.

    CONVERT
    (
        DATETIME2(0),
        DATEADD
        (
            SECOND,
            -5,
            ISNULL
            (
                (
                    SELECT
                        CONVERT(DATETIMEOFFSET, MIN(transaction_begin_time))
                            AT TIME ZONE 'Eastern Standard Time' AT TIME ZONE 'UTC'
                    FROM sys.dm_tran_active_transactions
                    WHERE
                        transaction_type = 1
                ),
                SYSUTCDATETIME()
            )
        )
    )

    The final operation I’ve added to the end date expression is to truncate the fractional (sub-second) precision off of the DATETIME2. The reason for this is that it’s important that subsequent change requests use the exact same start time as the prior end time. If you’re storing the value from your last synchronization in a system that has slightly different date rules than SQL Server, you may lose some precision and wind up with a rounding error. So better, in my opinion, to delay changes by up to one additional second, by ensuring that any possible date/time data type will suffice.

    Once all of this is in place on your server, pointing at your table, usage is pretty simple:

    • Set a start date of 1900-01-01.
    • Set an end date using the expression above.
    • Ask for “changes.” That’s your input set. (Similar to Charge Tracking’s VERSION.)
    • Record the end date, and use it as the start date next time. The same date functions as both end date one time and start time the next because of the BETWEEN predicate’s half-open interval; if a delete operation occurred at exactly the end date, it will be included in that synchronization due to the end date being inclusive, but it will not be included in the next synchronization because the start time comparison is non-inclusive.
    • Re-populate the end date each time using the expression. It may be worthwhile to log the start and end dates so that you can detect conditions where the new end date is not much greater than the start date. This would indicate either that you might want to increase your polling interval or figure out why user transactions are running for a long time.
    • Repeat as needed. Remember that every cache is effectively stale the moment you’re done synchronizing it, so you must strike a balance between synchronized enough and not putting excessive stress on the host system trying to keep things up to date. And given the safety measures I’ve taken in the above expression, if you use this system and ask for changes more often than once every six seconds, you’ll tend to not get very far.

    That’s it—net changes from Temporal Tables. Not the simplest exercise in the world, and maybe not quite as perfect as we’d like, but completely workable none the less. And of course, it’s always fun to put shiny new toys completely through their paces.

    Enjoy, and happy synchronizations!

  • SQL Server 2016 "Hidden Gems" - Now on a YouTube Near You

    Sad that you missed last week's SQL Server 2016 "Hidden Gems" GroupBy session?

    Unlikely, perhaps...but no worries either way! It's now available for free, on YouTube:

    https://www.youtube.com/watch?v=P4608MNM-QU

    Enjoy!

     

  • More Query Tuning Training: Chicago!

    I'm very happy to announce a third 2017 date for "Tuning Your Biggest Queries!"

    On March 10 I'll deliver the seminar in one of my favorite cities, Chicago.

    This delivery is a pre-con for SQL Saturday Chicago, which will run on March 11. But here's the thing: The SQL Saturday is already at capacity and wait listed. The organizers have decided to allow pre-con attendees to skip the queue. So as a bonus for attending Friday's pre-con, you now get guaranteed admission to Saturday's event. Not a bad prize!

     

    And by the way, there are still seats available for Boston (January 27) and Cleveland (February 3).

     

    See you soon! More dates forthcoming. I'm especially looking for some hosts in Europe and the US (or Canadian) west coast. If either of these describes you, drop me a line!

  • SQL Server 2016 “Hidden Gems” Resources

    Today I was honored to present my SQL Server 2016 “Hidden Gems” talk for the GroupBy online conference.

     

    The slide deck and demos are attached to this post.

     

    Following are some links to help you in further exploration of the various features I discussed:

     

    STRING_SPLIT

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Aaron Bertrand’s performance tests: https://sqlperformance.com/2016/03/sql-server-2016/string-split

     

    Session Context

    https://msdn.microsoft.com/en-us/library/mt605113.aspx

     

    Time Zone Support

    sys.time_zone_info: https://msdn.microsoft.com/en-us/library/mt612790.aspx

    AT TIME ZONE: https://msdn.microsoft.com/en-us/library/mt612795.aspx

     

    HASHBYTES

    https://msdn.microsoft.com/en-us/library/ms174415.aspx

     

    Session Wait Stats

    https://msdn.microsoft.com/en-us/library/mt282433.aspx

     

    Live Query Statistics

    https://msdn.microsoft.com/en-us/library/dn831878.aspx

    Ability to attach to other sessions' Live Queries!  https://blogs.msdn.microsoft.com/sql_server_team/query-progress-anytime-anywhere/

     

    Actual Query Plan Data Enhancements

    https://blogs.msdn.microsoft.com/sql_server_team/added-per-operator-level-performance-stats-for-query-processing/

     

    INPUTBUFFER DMV

    https://msdn.microsoft.com/en-us/library/mt652096.aspx

     

    Database Scoped Configuration

    https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/05/02/database-scoped-configuration/

     

    Increased Number of Index Keys

    https://msdn.microsoft.com/en-us/library/ms143432.aspx#Engine

     

    Increased Index Key Size

    https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/04/26/increased-nonclustered-index-key-size-with-sql-server-2016/

     

    COMPRESS

    https://msdn.microsoft.com/en-us/library/mt622775.aspx

     

    Columnstore

    Too much stuff to list! Go google it.

  • The SQL Genie (T-SQL Tuesday #086)

    At some point a long time ago, when I was first getting serious about SQL Server, I became aware of an open secret.

    You see, there was this special Microsoft alias called “sqlwish.” Those in the know could send an e-mail and pass along suggestions for improving the product. And they wouldn’t just vanish into thin air; some actual member of the product team on the other end would not only read the suggestions, but also add them to the internal Official List of Stuff to Get Done.

    How exciting! A chance to make my mark on the product I worked with day in and day out. I immediately put together a list of my wishes and sent an e-mail. And over time, I sent a few more. I never heard back from anyone, so I have no clue if my missives were actually read. And while I can’t remember all of the things I asked for, I’m pretty sure none of them were ever implemented. (I can clearly remember only one request from that very first e-mail. I’ve asked for it again numerous times over the years and still want it today. No luck.)

    Fast-forward a few years and Microsoft introduced the Product Feedback Lab, a web-based solution for user feedback collection. The sqlwish alias was retired and now we could submit feedback in an open, transparent way. We were told by Microsoft that not only would someone read these items, but now we would actually get status updates and replies! An amazing idea. And it went well for a while. Users loved the Lab, populated it with lots of items, and Microsoft soon promoted it to a permanent home in a place called Connect.

    Alas, the powers that be at Microsoft seem to have severely under-estimated the amount of work and attention this endeavor would require. Members of the product group got busy or moved on to other things and the replies stopped coming as often. Users lost enthusiasm. And time simply passed, as it does. After a while Connect fell into a state of abandonment. The SQL Server items have occasionally seen periods of promising forward movement, as well as at least one horrible stage during which the product group indiscriminately closed virtually every open item. But generally speaking the site is mostly considered to be a joke by the SQL Server professionals I’m friendly with. It’s insanely slow, has an unbelievably awful UI, and most of the submitted items are eventually “resolved”—if they ever are—with the infamous “Won’t Fix” status. (Said resolutions are almost never accompanied by a comment from Microsoft or anyone else.)

    All of which doesn’t mean that Connect is completely useless. As desperate as we often are to see change in certain areas, Connect has become an amazing catalog of user stories. It’s one of the only mechanisms for most people to voice concerns about the direction the product is taking, and it turns out that DBAs have quite loud voices. Some tickets actually do go through the system and end up in that exalted state of having been “Fixed.” One such case that I will never forget occurred early on, when Hugo Kornelis demanded the release of SQL Server 2005 SP3. Over 800 people joined him and the service pack was released. A true win for and by the SQL Server community.

    But today’s post isn’t about victory. Today’s post is about a dream.

    This month’s T-SQL Tuesday challenge, issued by Brent Ozar, is simple, vague, and frustrating: find an interesting item on Connect and write about it. The problem for me is that I can’t find just one interesting item; there are simply too many from which to choose. But sometimes when I’m both especially bored and feeling a bit depressed about SQL Server I’ll peruse Connect by using one of its only good features, the ability to sort the items by “most voted.” (Note that even this feature doesn’t even work properly; the sort order is only kinda-sorta accurate.)

    What if, I wonder, Microsoft took a step back and actually listened to the voice of the masses? What if Microsoft actually committed to implementing the top voted items on the list?

    That’s exactly what would happen if Microsoft put me in charge. (Imagine that!) Looking over the most voted—but still open—items, and sorting the wheat from the chaff just a bit (I only want legit developer features), I could come up with a pretty amazing release.  Here’s the session I would present at PASS just before Microsoft fired me for not filling the bits with enough marketing fluff:

    SQL Server v.Next.Next++: Developer Enhancements That Will Completely Change Your Life

    Hyperbole? Maybe just a little bit. But check out some of these amazing features…

    First and foremost, I would eliminate the biggest debugging time waster in the history of the product. (Bad error messages!) That sounds nice. Nine years and 1300 votes, for those keeping score.

    I would so, so love to see this virtual table of errors that would completely revolutionize relational ETL. I’ve even written about it before. (Three and a half years ago. Sigh.) 600 votes over four years.

    Regular expressions. I have to admit that I absolutely hate them and even when I work in C# I often find some other way to solve my problems. But the built in LIKE/PATINDEX patterns just don’t cut it. And yeah, you can implement some SQLCLR solution for this, but it should just be built in by default. Only 475 votes in the past 10 years, but I guarantee this would be an amazingly popular feature among the hardcore developer set.

    Scalar functions are awful. It bears repeating. Scalar functions are awful. Just for good measure, maybe once more. Scalar functions are awful. But they’re so incredibly tempting! Encapsulation and re-use simply make development better, more scalable, more manageable. If only scalar functions weren’t so very awful. Fix them? Yes. Please. Now! 560 people over the past 10 years have agreed with this sentiment. I bet that number would be much higher if the item was not somewhat strangely worded.

    I use a lot of derived tables in my T-SQL. I bet you do, too. I spend so much time writing so very many derived tables. And they bloat my statements to ginormous proportions. But I have no choice! I need to be able to use expressions as filter qualifications. Oh wait, there’s an amazing solution for this problem, suggested by Itzik Ben-Gan. 340 people over the past seven years have ticked the upvote button. Maybe Microsoft would have implemented it were it suggested by someone who actually knows T-SQL pretty well. The world may never know.

    Table-valued parameters are one of those things that I was so excited about when they were released. And I told myself that typical Microsoft lie. “It’s a v1 thing. It will be improved over time.” Well, that hasn’t happened. And this is a feature so very ripe for improvement. Removing that READONLY restriction would mean the difference between today’s “meh” and an absolute developer productivity masterpiece. Nine years. 320 votes.

    How many times have you been asked by end users to change the order of columns in a table? “But tables are sets, and sets have no explicit order.” Guess what? Users don’t care. We waste so much time on this and SQL Server could so easily support changing the order. And this item asking for a feature to do it has been in the system so long (almost 12 years) that when you click on it you don’t even see the item, you get an error. (When Microsoft migrated the items from Product Feedback Lab to Connect, something got messed up and a large number of items went into this limbo state. Perhaps someone got a column out of order? It’s been 10 years since that transition and I’m not expecting a fix anytime soon. But you can still see the number of votes on the main list: 270. I’m sure making these items almost completely unavailable did them no favors in terms of votes.)

    NULLs. What’s the first thing you think of? 3VL? You’re a geek. Annoyance over subtle bugs? You’re quite correct. And while there are some pretty brilliant workarounds to the various NULL issues, the fact is that the ANSI standard includes a clause specifically dedicated to eliminating a huge subset of the problems. Why, after nine years and almost 300 votes, do we not have this?

    Magic numbers! That sounds cool. Except it’s not. It’s a horrible drain on readability and a pox on the manageability of your code. But we need them, because every database out there has “lookup tables” with specific values that we need to predicate on throughout the code base. SQL Server could solve this problem once and for all by allowing users to promote these values to named enumerators. How cool would that be?!? 220 votes, nine years.

    And finally, one more from Mr. Ben-Gan, a clever idea for improving TOP with the OVER clause. I love the OVER clause. I love TOP. Their offspring would be beautiful indeed… 180 voters over the past nine years have shared in this sentiment.

    Quite a list, isn’t it? I think it would in fact change the entire developer experience for the better. But these items are old, and these numbers aren’t too impressive. 180 votes, really? There are probably a hundred thousand SQL Server DBAs and developers out there. The community, it seems, has no faith in Connect. People have neither time nor inclination to navigate its painful interface just to click the vote button on an item that will sit untouched for a decade. It’s a sad state of affairs.

    Eternal optimist that I am, I still hit Connect from time to time. I still vote, and I still submit my own items. I remain ever hopeful. Microsoft, if you need a new product head and fall guy, just for one release, I’m waiting by the phone...

    And just for the record, the number one item in my very first sqlwish e-mail, and something I’d still love to see today? Implementation of DOMAINs, an ANSI-standard feature that would bring the type system out of the Dark Ages and into the 21st Century. Here it is on Connect. “Won’t Fix.” The genie clearly slept through that one.

  • T-SQL Tuesday Rules of Engagement

    Just over seven years ago (wow, does time fly!) I launched T-SQL Tuesday. The core idea is simple: Each month, one blogger acts as “host.” He or she is responsible for choosing an interesting topic in or related to the SQL Server realm. Then a bunch of other bloggers all write articles on that topic. Finally, the hosting blogger puts together a round up of all of the participating posts so that anyone who’s interested in reading can find all of the content in one convenient place.

    We’ve now gone through 85 months of this experiment (#86 was announced just today, by Brent Ozar). And I’m happy to report that at least from my perspective, it has mostly been a huge success. We’re nearing a big milestone, just a bit more than a year away, of hitting triple digits. And I don’t see this slowing down anytime in the foreseeable future. There is still plenty of sharing and learning to be done.

    In speaking with Brent, he mentioned that he wasn’t sure exactly what the correct rules are or where to find them, and he asked me to put things together in one clear place. It kind of makes sense that there is no central place: Although I control the hosts and help dictate the general rules, T-SQL Tuesday is an entirely community-based thing and has grown and spread quite organically. Each month, each host is free to bend the rules just a bit to fit his or her unique blog style. But there are some central guidelines and best practices, and I’ll document and describe each of them below.

    Before I present the rules, I’d like to take a moment to describe the purpose of all of this: T-SQL Tuesday is designed to strengthen the SQL Server blog community in two different ways. First and most obvious, it generates blog posts, and depending on the topic these tend to be educational in nature. More education means a stronger community, and that’s what we’re all about. Second, and a bit more subtly, T-SQL Tuesday was conceived to strengthen search engine reputation by creating lots of links between blogs. Search engines like pages that get linked to a lot. So we generate lots of links to lots of great content, and then that great content floats up to the top as people search for stuff. That’s a win-win in my book and something to keep in mind if you either host or participate in a T-SQL Tuesday.

    (Aside: Does driving up your blog traffic sound good to you? Write to me about hosting! See below for the rules. I am literally always looking for hosts, especially new blogger hosts.)

    Now that that’s out of the way, without further ado, I present…

    The Official T-SQL Tuesday Rules, as of January 2017

    1. Each month, a host blogger will be assigned to create a T-SQL Tuesday event on his or her blog.
    2. Creating an event entails several steps:
      1. First the host blogger must be assigned to a month.
        1. Getting a month assigned means writing to me (amachanic [at] gmail [dot] com) and asking to host.
        2. Prior to writing to me you must have participated in at least two prior T-SQL Tuesday events as a blogger.
        3. Prior to writing to me your blog must show decent on-going activity. This means, for established blogs, at least 6-8 technical posts a year. For newer blogs, I expect to see at least one technical post a month sustained for at least six months.
        4. Assuming that everything checks out, I will give you the next available month. Usually things are booked out four to six months in advance.
        5. I try not to repeat hosts more often than once every 24 months.
      2. The host blogger must first choose a topic.
        1. Despite the name of the event, this topic does not need to be about T-SQL, and in fact ideally should not be. Any topic in or related to the SQL Server realm can be covered.
        2. It is strongly suggested that topics be technical in nature, and also strongly suggested that topics be broad enough to cover a variety of different interests and job functions. A T-SQL specific topic would therefore not be recommended, as it would eliminate people who want to write about MDX, DAX, R, PowerShell, C#, or some other related toolset. Remember that not everyone is you, and that’s what makes this fun; we want to hear from all sorts of different perspectives.
        3. A list of all prior topics is available here: https://voiceofthedba.com/t-sql-tuesday-topic-list/. This list is maintained by Steve Jones, to whom I owe a huge debt of gratitude.
      3. The host blogger must create an invitation blog post. The invitation post is the means by which the host blogger will advertise that month’s event.
        1. The invitation post must include T-SQL Tuesday and the event number in its title.
        2. The invitation post must include the T-SQL Tuesday logo.
        3. The invitation post should usually go live either on the Tuesday one week prior to the second Tuesday of the event month, or on the Monday before that Tuesday. Note that some hosts have posted a bit earlier than that, in order to give participants more time. That’s allowed, but it usually backfires in the form of low participation as participants forget to write their posts. Posting later than a week out can backfire as well, due to people not having enough time.
          1. If your post does not go live by close of business on the US East Coast on the Tuesday one week prior to the second Tuesday of the event month, and you haven’t been in contact with me, you will lose your month and you may be banned from ever hosting again. I luckily have not had to exercise this clause very often! Thank you, hosts, for fulfilling your duty.
        4. The invitation post should describe the topic and outline what kinds of blog posts participants should write in order to participate in the event.
        5. The invitation post must mention that participant posts should go live on the second Tuesday of the event month. The recommendation is to limit this to 0:00 UTC to 23:59 UTC on that day, but hosts are free to change the rule to local time zones or however they see fit to enforce it.
        6. The invitation post should describe how participants should make the host aware of participating posts – via trackbacks, comments on the invitation blog post, etc.
        7. I appreciate it if invitation posts tell people to contact me so that they can host. But that’s not required.
      4. The host blogger should advertise his or her invitation through social media channels. Advertising can be done on the official T-SQL Tuesday hashtag, #tsql2sday.
      5. The host is not required to participate in the month’s event by creating his or her own on-topic post, but it is recommended.
      6. After the second Tuesday of the month, ideally by the end of the week, the host is expected to post a “round up.” The round up is a blog post that reiterates the month’s topic and links to all of the participant blog posts. Ideally this round up post should include some overall commentary on the content that was produced, and maybe some individual comments on each blog post.
    3. Participating in a T-SQL Tuesday simply means writing a blog post
      1. Any blogger can (and should!) participate. It does not matter how long your blog has been active or where you blog or even what language you post in. WE WANT YOU.
      2. The blog post should be on topic for the month as described in the invitation blog.
      3. The blog post should go live at the appropriate time as specified in the invitation post.
      4. Participant blog posts need not mention the text “T-SQL Tuesday” in either the subject or the body.
      5. Participant blog posts must include the T-SQL Tuesday logo. The logo must link back to the invitation post. This is the primary hard and fast rule that drives T-SQL Tuesday and the host reserves the right to not link back to you if you don’t follow it. If you don’t link, you’re not acting in the spirit of the event and you are in fact being very rude.
      6. That’s it! Have fun, write great content, and submit it.
    4. All other questions or concerns
      1. Figure it out for yourself and do what you think is best
      2. Failing number one, write to me and I’ll tell you what I think is best

    Amazing how 85 months turned my initial single paragraph description into this big set of rules. But it’s really just a simple exercise in spreading our collective knowledge. So go forth, write, learn, and enjoy. See you next Tuesday!

  • Query Tuning Training in Early 2017: Boston and Cleveland!

    A couple of days ago in Washington DC I presented the final 2016 delivery of my Tuning Your Biggest Queries seminar. The class went really well (at least from my perspective!) and it was a great end to a busy 2016 training schedule.

    The seminar has been an absolute joy to teach and I think it is some of the best work I've ever put together. Advanced query tuning is my favorite topic, by far, and there are so many interesting routes through the material that it never seems to get old or boring. At every class attendees have been asking unique and challenging questions that really make things feel dynamic. Plus, thanks to Microsoft's increased release cadence and focus on performance, there are a lot of new and cool things on the horizon that I plan to add to the content in the coming months. Exciting times to be a query tuner!

    So all that said, now that this year is closing down, I'm happy to report that I'm already working on more dates for next year! Here are the first two:

     

    Boston area, January 27, 2017 - Information Here

    This will be a standalone delivery, which I'm working on in conjunction with (and in support of) the local New England SQL Server users group.

     

    Cleveland, OH, February 3, 2017 - Information Here

    Just a week after the Boston class I'll do it again, this time as a pre-con for SQL Saturday 595. And hopefully I can squeeze in a visit to the Rock and Roll Hall of Fame!

     

    Even more dates forthcoming - stay tuned, and I hope to see you in 2017! 

  • Learn Advanced Query Tuning in Washington DC

    Just about every database professional on Earth has the same goal: Make things faster.

    There are numerous ways to achieve that goal, but for many people it comes down to a series of hunches, guesses, and trying whatever seemed to work last time.

    What if there was a more efficient way to get the job done? What if you could understand what's going on under the covers, know your options, and make the right moves for your actual situation?

    That's what my full-day Tuning Your Biggest Queries seminar is all about. In this unique class you'll learn about how the query processor works, how the query optimizer thinks, and how you can leverage an understanding of these internals to make your queries faster. What really sets this content apart is that there is almost no focus on indexing -- only a very brief discussion at the end of the day, and only after you've already learned how to improve query performance by orders of magnitude through rewrites.

    The idea is not to make you an index champion. The idea is to set you on the path to advanced query tuning mastery.

    I am pleased to announce the final 2016 public delivery of the seminar -- Friday, December 2, in Washington DC. And you can get early bird pricing through the end of this week!

    This delivery will act as a pre-conference seminar for SQL Saturday DC, but you are free to attend either or both.

    An updated course outline follows. Looking forward to seeing you there!


    Tuning Your Biggest Queries - December 2 - Washington DC 

    Module 1: Query Plans

    • How Query Plans Really Work
    • Data Acquisition Iterators
    • Join Iterators
    • Aggregation and Analysis Iterators
    • Parallel Processing 

    Module 2: Plan Shaping

    • T-SQL Rewrites That Waste Your Time
    • T-SQL Rewrites That Work
    • Query Hints
    • Leveraging Row Goals For Plan Control
    • Understanding Ideal Plan Shape
    • Performance Debugging Complex Plans 

    Module 3: Index Optimization 

    • The Levels of Index Understanding 
    • How B*Tree Indexes Really Work
    • How to Design Optimal B*Tree Indexes
    • How to Choose Among Indexing Options
    • Columnstores vs. B*Trees 

     

  • sp_whoisactive: The Big Fixes!

    Four and a half years have flown by since I released sp_whoisactive version 11.11.

    It's been a pretty solid and stable release, but a few bug reports and requests have trickled in. I've been thinking about sp_whoisactive v.Next -- a version that will take advantage of some newer SQL Server DMVs and maybe programmability features, but in the meantime I decided to clear out the backlog on the current version.

    And so, with that, I present sp_whoisactive version 11.16. Hosted on the brand new home for sp_whoisactive. (Yes, it's a bit minimalist for the moment.)

    Enjoy! 

  • Temporal Tables: Connect Item Round Up

    I've been thinking a lot about SQL Server 2016 temporal tables of late. I think it's possibly the most compelling feature in the release, with broad applications across a number of different use cases. However, just like any v.1 feature, it's not without its faults.

    I created a couple of new Connect items and decided to see what other things people had submitted. I combed the list and came up with a bunch of interesting items, all of which I think have great merit. Following is a summary of what I found. I hope you'll consider voting these items up and hopefully we can push Microsoft to improve the feature in forthcoming releases.

     

    Better, More Automatic Logging, Especially to Support Audit Scenarios

    A big theme I saw across many items, and something I've also heard from attendees of my sessions when I've presented on temporal tables, is the question of audit support. History is never just about the the data itself. Metadata around who made the change, how the change was made, and sometimes even why the change was made, can be vital for a number of scenarios. Temporal tables do not improve this experience today, but I think they very easily could.

    Permit additional hidden columns in temporal tables - by DBAIntrepid - https://connect.microsoft.com/SQLServer/Feedback/Details/1707429

    Storing audit columns and don't want your users to see them all the time? Wouldn't it be nice if they could be hidden by default? I certainly think so...


    Provide a mechanism for columns to automatically update themselves - by Adam Machanic - https://connect.microsoft.com/sql/Feedback/Details/3105516 

    UpdatedBy, UpdatedFrom, UpdatedSessionContext, and many other versions on the same theme. Every single database I see has these columns. Why can't SQL Server help us by automatically populating them on our behalf?

     

    Temporal Tables: Improve History Retention of Dropped Columns - by Adam Machanic - https://connect.microsoft.com/sql/Feedback/Details/3105517 

    One of the really annoying things about temporal tables is that the idea of what is and is not history can change. If I drop a column from my base table, the history of the column disappears. That, it seems to me, is unnecessary. Why not leave it (optionally?) and make it nullable?

     

    Temporal Tables (system versioned) Enhancment - by Guy Twena - https://connect.microsoft.com/SQLServer/Feedback/Details/1691517

    Not the best named item; this one is requests a column update mask, similar to that provided by CDC. I agree that this would be a very useful feature.

     

    Easier Querying 

    The first time you use SQL Server temporal tables, the query experience seems amazing. It's so much better and more integrated than any other form of "history" table we've ever had in SQL Server. But the experience very quickly breaks down as soon as you try to do something a bit more complex. The issue? The temporal predicates support only literals and variables. This means that, for example, you can't encapsulate a significantly complex temporal query in a view or inline UDF. I found lots of items around these theme but I decided to include only the following one in this post, as I thought it had the best phrasing and customer impact story.

    Temporal for FOR SYSTEM_TIME AS OF cannot use datetime functions. - by Eric A. Peterson - https://connect.microsoft.com/SQLServer/Feedback/Details/2582201 

     

    Better Administrative Support 

    Sometimes you need to change history. Maybe you need to trim some rows to keep table sizes in check. Maybe you need to backfill some history when merging data sets. Or maybe you need to fix an error. In any case, you can do this with temporal tables by briefly turning off the feature, making your changes, and then turning it back on. This isn't something end users should be doing, but is absolutely a valid administrative concern. Unfortunately, it's tougher than it should be to encapsulate in a stored procedure.

    Cannot catalog stored procedures for updating temporal columns in advance - by Greg Low - Australia - https://connect.microsoft.com/SQLServer/Feedback/Details/2500716

     

    That's it for this round up. Any other interesting items I missed? Please leave a note in the comments section if so! 

  • Upcoming Performance Seminars: Charlotte and Vegas!

    I am happy to report two more scheduled public deliveries of my new-for-2016 seminar, Tuning Your Biggest Queries! This seminar is all about how to approach difficult query tuning tasks by understanding exactly what the query optimizer is (and is not) doing with your data, how it internally works and thinks, and how you can take complete control. Thanks to great audience feedback I've made some tweaks and I think the content is getting better and better every time. 

    The upcoming dates are:

    Charlotte, NC - September 16 - Pre-conference seminar for SQL Saturday Charlotte. Note that this is almost sold out.

    Las Vegas, NV - October 10 - Pre-conference seminar for IT/Dev Connections. An early bird rate is still in effect for this one.

     

    Hope to see you there!

  • Performance Tuning Seminars in Philadelphia and Indianapolis

    Pretty much every environment I've ever seen follows the same patterns with regard to query size distribution: lots of typical run-of-the-mill queries (whatever is typical in that environment), followed by a diminishing number of larger queries (say, an order of magnitude bigger than typical), and a handful of gigantic queries that are several orders of magnitude beyond that.

    When tuning a slow SQL Server, what's the main thing I look at improving? I focus, first and foremost, on that final set, containing the biggest, most complex, most long-running things I can find. I believe this makes a lot of sense on several levels: Not only are the biggest queries the "lowest hanging fruit" -- i.e. things I tend to be able to markedly improve quickly, but they're also usually the ones dragging down the rest of the system. Bring down the monsters and all of the little guys can breathe easier.

    Unfortunately -- or fortunately if you enjoy these kinds of things as much as I do -- tuning the biggest and most complex queries can be quite a challenge. I often need to try to figure out what the query optimizer is "thinking," figure out which query hints I can or cannot leverage, and come up with interesting ways to re-write things to improve speed while maintaining the same logic. Dealing with big queries tends to be equal parts art (creativity) and science (raw application of logic). And the really interesting thing I've noted after tuning dozens of systems is that it doesn't matter whether my "big" query is bigger than your "big" query: It's the relative size of these beasts, as compared to whatever is standard, that makes them equally challenging and rewarding to tackle.

    Recently I was looking for a new topic on which to write a seminar, and I decided that this topic would be perfect. Query tuning is one of my favorite pastimes in the SQL Server world, virtually every environment can use some tuning, and virtually everyone has to deal with these kinds of big queries.

    So all that said, I'm proud to announce the first two deliveries of Tuning Your Biggest Queries, a full-day seminar I've put together for SQL Saturdays and conferences.

    The two dates are:


    Click through for abstract, pricing, and registration details. 
     
    If you can't make those dates, don't worry; more are forthcoming. Stay tuned. 

    Questions? Comments? Leave me a message below if you need clarification regarding the content, deliveries, or future dates.
     
    Hope to see you there!
More Posts Next page »

This Blog

Syndication

Privacy Statement