Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.
There I was. A freshly printed bachelor's degree in Computer Science tucked under my arm, I walked into my First Real Job. I'd never touched the technologies I was going to work with--ASP 3.0 and SQL Server--but my employer knew that, and I figured I'd be able to pick things up relatively quickly. After all, I'd been programming since the 2nd grade, knew a number of languages (more or less), and had all of the academic background I'd ever need.
I was given a quick tour of the office, oriented by HR, and met the rest of my group. As a member of the small IT team, my coworkers included an Exchange administrator and couple of help desk support/networking guys. No other developers. I was given a cubicle (shared with a finance person) and told to start creating the company's intranet. So, I did. And with no one to learn from or bounce ideas off of, I made up my own rules.
One of the first things I was asked to create was a threaded message system so that the remote marketing and sales teams could converse with one another. I got to work on a basic (and very ugly) user interface, and then tackled the database side of the equation.
I'd never encountered a real live database before, and scarcely even knew what one was, but I took a day or two to read "Teach Yourself SQL in 24 Hours" and figured I was good to go. Somehow I recognized the need for a self-referencing table. I created it, populated it with some test data, and quickly discovered that the simple test queries I'd written to date weren't going to work. How would I navigate the hierarchy and display the messages in a threaded fashion?
After messing around for quite some time and applying all of my Computer Science prowess, I came up with an algorithm:
- Using a cursor, start iterating the rows in the table.
- Insert "parent" rows into a temp table.
- For each "parent" row, recurse, calling the same stored procedure again and starting at step 1.
- Finally, return all collected results.
With its nested cursors and various work being done in temp tables, this algorithm scaled so very well that shortly after rolling the message board out to production I got to work on implementing a cache so that users wouldn't have to wait several seconds when making a web request.
Clearly, I'd done something very wrong. And I knew it. I just didn't know enough to know how to find the right answer.
Much later I learned the correct term for what I'd created--an adjacency list hierarchy--and I learned about other methods of modeling hierarchies, including materialized paths and nested sets. I learned that many of the lessons I'd been taught in school--where the curriculum was heavily biased toward procedural languages--didn't apply well to SQL databases.
And most importantly I learned how to ask (and answer!) questions on online forums. Being a team of one doesn't mean that you need to work in isolation. There is a huge community of people online who want to help you succeed. Finding these forums (and newsgroups. RIP, NNTP!) was a revelation. The ability to talk shop with people who understood what I was trying to do and how best to do it was invaluable to my learning how to be a better developer and not just a student of Computer Science.
Many years later and I still get stuck on difficult problems, but these days I don't try to do everything in isolation. I know better than that. I reach out to my network and take advantage of some of the great minds I'm lucky enough to have access to. And you can, too. Next time you find yourself with a less than ideal solution, swallow your pride and ask for help.
And don't forget to help someone else in return. As much as you'll learn from the people answering your questions, you'll get even more value from puzzling over the numerous problems that other people face on a day to day basis. Solving as many problems as you can--your own and those of others--is in my opinion the fastest way to truly master a given technology.
Enjoy your journey--and always remember that you don't have to go it alone.
“This ugly hack is only temporary,” you think. Six months later, a coworker curses your name, sacrificing a chicken to any deity that will help expedite your getting struck down by lightning, a school bus, or both.
Crap code. We’ve all seen it. We’ve all created it. We’re all guilty. Yes, even you. Sometimes our crap is purposeful—the ugly, “temporary” hack. Sometimes we produce crap because we simply don’t know any better. But there is no excuse good enough. As professionals, we must strive to rid ourselves of bad habits. And the only way to learn the difference is to see lots, and lots, and lots of examples.
That’s where you come in. This month’s T-SQL Tuesday challenge: reveal your crap to the world. Why is (or was) it crap? Why did you do it? And how did you learn from your mistake?
I’m posting a day late, and the subject matter may require a bit of thought, so I’m slightly changing the rules—temporarily, of course.
Here’s what you need to do.
- Write a blog post talking about your crap. This doesn’t mean that your post should be crap. Two wrongs, in this case, do not make a right!
- Your post must go live some time between 00:00:00.00 GMT and 23:59:59.99 GMT on Wednesday August 10, 2011
- Your post must contain the T-SQL
TuesdayWednesday logo from above and the image should link back to this blog post.
- Make sure that a link appears in the comments section below, either via a trackback or a manual comment. I’ll take all of the posts that are reported and write a round-up.
- Include a reference to T-SQL Tuesday (or Wednesday) in the title of your post
- Tweet about your post using the hash tag #TSQL2sDay
- Consider hosting one of these. Ask me how.
Enjoy! Here’s hoping for a truly cathartic blog event.
A couple of weeks ago I announced a two-day advanced performance seminar in New York City, which will be delivered in July. This seminar will cover SQLCLR and parallelism techniques to help you take performance well beyond the levels that typical tuning exercises yield. Check out the links for more details, including a full outline.
Thanks to the great response so far, we have decided to extend the early registration discount for a few more days. You have until the end of the day tomorrow, June 3, to take advantage of the $100 savings off of the $1050 course fee. To get the discount, use the code EARLYBIRD on the registration page.
Finally, I would like to point out that one of the main reasons that I chose New York City for this course is that it is by far my favorite city to visit. I just noticed yesterday that a bunch of hotels are offering a free third night this summer--so I hope that you'll be able to come for the course and then stay an extra night or two to enjoy the city. Note that we also have a hotel discount available to course attendees, unrelated to the third night offer. Let me know if you would like more information on that.
Hope to see you in New York next month!
I am pleased to announce that I will be delivering two days of training in New York City, July 14 and 15.
This seminar focuses on achieving "next-level" performance--going beyond that which you can gain via normal tuning methodologies. The vehicles for this performance improvement are two technologies that I've been pushing on this blog and in other venues for a long time: SQLCLR and parallelism. The seminar will be based on the in-depth materials that I used for my full-day sessions PASS conference in 2009 and 2010. These seminars were both quite well-received, but I have tweaked and tuned the content to make it even better and more focused on the bottom line goal of achieving maximum performance.
Full information on the seminar is available on the Data Education web site. There is also an early bird discount currently in effect. Use the discount code "EARLYBIRD" to save $100 on the $1050 registration fee.
I would like to take this opportunity to mention that Data Education is a new training venture that I've recently launched. This will be the company's second public training event (our first featured Kalen Delaney in the Boston area). The company is an evolution of Boston SQL Training, a company that I started a couple of years ago with the goal of bringing extremely high-quality SQL Server training events to the Boston area. The new name, Data Education, reflects our desire to focus beyond Boston and on a broader technology spectrum. We plan to eventually move into training on data-related programming (Entity Framework and similar), other DBMS platforms, NoSQL technologies, and wherever else the database industry moves.
Currently, aside from my course in New York we've announced an Analysis Services and PowerPivot course featuring Teo Lachev, which will take place in the Boston area September 19-23. Several other courses will be announced shortly, so stay tuned and consider following us on Twitter (@DataEducation).
If you've read this far, I would greatly appreciate your taking part in a quick and informal poll: in the comments section below, please let me know what geographic location would be interesting to you for an advanced SQL Server course, and what topic areas you're not seeing enough of.
Thanks, everyone, and I'm looking forward to seeing you in New York!
Thanks to everyone who took the time out of their conference experience to join Mike Wachal and me for yesterday's session on SQL Server performance tuning!
For those who weren't there, we focused in on troubleshooting techniques, highlighting some of the key DMVs and new Extended Events features that will help with proactive diagnosis of problems. My section, in particular, was a bit of a taste of some of the sessions I've submitted for this fall's PASS summit: I showed a brief demo of troubleshooting using my No More Guessing methodology, and applied it to the problem of workspace memory contention. The sessions for PASS cover these topics in detail, so if you find this area interesting please vote here.
The demo script I showed yesterday is attached, along with a copy of the latest versions of my Who is Active stored procedure and the SQLQueryStress tool. Feel free to leave a comment below or drop me a line via e-mail if you have any questions.
This post is part 30 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Pop quiz: What happens when you promise to write 30 posts on a topic, but you don’t plan properly and lose a bit of steam toward the end?
Answer: A recap post, a day or two early! Woo hoo!
The month started with a few introductory posts. A history of monitoring (from my point of view), some information on Who is Active’s design philosophy, and a few words on the Who is Active license.
The key takeaways from these first posts? The monitoring story has been pretty poor for most of SQL Server’s history. It’s better now, but it’s still a lot of work. Who is Active was created to make things more accessible. It’s free, with a few restrictions designed to keep people who make money from monitoring from making money on Who is Active. Anyone else can and should use and enjoy.
Next I got new users up to speed. My post on installation covered security considerations, followed by a post on why you may not see as much data as you're used to when you first start using Who is Active. I continued with a post on the various options supported by the procedure (plus a nod to the online help mode) and another post describing the default output columns.
One of the more important posts of the month described how Who is Active treats active requests and sleeping sessions. This can be a confusing area for new users, since it's not always obvious what's going on unless you're looking in the right place at the output (the [status] column).
Once the refresher was finished I began covering some of the basic options exposed by the procedure. Filters were one of the first things I implemented, so it was a logical place to start (the "not" filters came much later). Seeing the query text is a key part of the procedure's functionality, and that was next on my list. If you're not sure what the [sql_text] column means, this post will set the record straight.
The ability to see the query text is nice, but so is the ability to get a query plan--and of course Who is Active supports that too. And since everything a query does is transactional, the procedure allows users to collect information about what those transactions are up to.
The next section of the month was all about query processing. I started with a couple of background posts: One on how the query processor works (at a somewhat high level), and another on what blocking really means.
Who is Active exposes two different modes that help with analysis of real-time waits. My 15th post of the month covered the default--lightweight--collection mode. My followup post covered the more extensive full waits collection mode.
Sometimes a query plan and wait information isn't quite enough to diagnose a performance issue. What if the query plan usually works well, but isn't performing properly only in this specific case? Who is Active has a feature to help you figure that out. And what if you need a bit more information on some of the settings that the request is using?
Mining wait information yields some amazing returns. One of the additional pieces of information that you can get is the actual name of the object that's causing a block. Another thing you can see is the exact node within a query plan where a task is currently doing work. You can (and should!) also use waits to figure out whether tempdb is properly configured. There are still more tricks you can play with waits, but they'll have to wait for a future post.
Like wait information, lock data can also be mined. However, there is so much locks information that the real challenge is rendering it in a human-readable manner. Who is Active does that using a special custom XML format. The procedure also helps with another type of blocker analysis, in the form of finding the "lead blocker."
Once you've figured out what information you're interested in, why not set up the output so that you can see the information the way you want to see it? Who is Active helps with this by allowing users to dynamically customize output columns and sort order in a few different ways. And while you could always run Who is Active interactively, that might get a bit dull. Automated data collection is a much nicer methodology in many cases.
Most of the numbers shown by Who is Active are cumulative. But oftentimes it's more interesting to compare data between two snapshots. The procedure can help you do that, using its delta mode.
What fun would a month of Who is Active be without an official release? v11.00 has numerous new features, several of which were perfected this month thanks to feedback I received on the various posts in the series. It's great to have such enthusiastic users! Some of these new features didn't make it into prior posts, and other existing features are a bit hidden. So I did a writeup covering the more important things that you may not have noticed while working with Who is Active.
I finished the month with a discussion on security for slightly tougher situations. I hope that the module signing technique will allow Who is Active to be used in a number of places where security auditing requirements have made things difficult.
And that's that. A month of activity monitoring with Who is Active. Thank you for reading! Next, this text is going to be edited, expanded in places, and put into a much more comprehensive form. Watch your RSS reader for more information.
This post is part 29 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Early in the month I discussed basic security requirements for Who is Active. I mentioned the VIEW SERVER STATE permission and the fact that I consider it to be “a relatively low barrier to entry.”
But what if, in your organization, it’s not? Auditing requirements being what they are, you might be required to lock things down. And granting someone full and unrestricted VIEW SERVER STATE may simply not be an option.
Enter module signing. By securing Who is Active (or any other securable, for that matter) via inherited permissions, it’s often possible to get around auditing requirements, as long as the module itself has been reviewed. This is not at all a difficult thing to do, but in my experience most DBAs haven’t played much with signed modules. Today I’ll show you how quick and easy it can be to set things up.
Start by creating a certificate.
CREATE CERTIFICATE WhoIsActive_Permissions
ENCRYPTION BY PASSWORD = '1bigHUGEpwd4WhoIsActive!'
WITH SUBJECT = 'Who is Active',
EXPIRY_DATE = '9999-12-31'
Once you have a certificate in place, you can create a login from the certificate. The goal is to grant permissions, and to do that you need a principal with which to work; a certificate does not count. A login based on the certificate uses the certificate’s cryptographic thumbprint as its identifier. These logins are sometimes referred to as “loginless logins,” but I refer to them as “proxy logins” since that’s what they’re used for: proxies for the sake of granting permissions.
CREATE LOGIN WhoIsActive_Login
FROM CERTIFICATE WhoIsActive_Permissions
The login can be granted any permission that can be granted to a normal login. For example, VIEW SERVER STATE:
GRANT VIEW SERVER STATE
Once the permission has been granted, the certificate can be used to sign the module—in this case, Who is Active. When the procedure is executed, a check will be made to find associated signatures. The thumbprint of the certificates and/or keys used to sign the module will be checked for associated logins, and any permissions granted to the logins will be available within the scope of the module—meaning that the caller will temporarily gain access.
ADD SIGNATURE TO sp_WhoIsActive
BY CERTIFICATE WhoIsActive_Permissions
WITH PASSWORD = '1bigHUGEpwd4WhoIsActive!'
Getting to this step will be enough to allow anyone with EXECUTE permission on Who is Active to exercise most of its functionality. There are a couple of notes and caveats: First of all, every time you ALTER the procedure (such as when upgrading to a new version), the signature will be dropped and the procedure will have to be re-signed. You won’t have to create the certificate or the login again; you’ll just have to re-run that final statement. Second, you’ll only be able to use most of the functionality. Certain features, such as blocked object resolution mode, won’t operate properly, depending on whether the caller has access to the database in which the block is occurring. This may or may not be a problem—it depends on your environment and what users need to see—and Who is Active itself won’t throw an exception. An error message will be returned somewhere in the results, depending on what the user has tried to do.
If you would like to grant database-level permissions based on the certificate login so as to avoid these errors, that’s doable to. Just do something like:
CREATE USER WhoIsActive_User
FOR LOGIN WhoIsActive_Login
This will allow Who is Active to figure out what the various blocked or locked object names are. Since the login is just a proxy no one can actually log in and get direct access to read the data, so this isn’t something I consider to be a security risk. However, keep in mind that if anyone has the password for the certificate and sufficient privileges in master, a new module could be created and signed. Keep the password secure, and make sure to carefully audit to catch any infractions before they become a risk.
Security policy should never be a reason to limit your monitoring choices. Module signing is a powerful tool for Who is Active in addition to many other applications. I highly recommend studying it in detail in order to enhance your ability to provide high-quality, totally flexible, and completely secure solutions.
This post is part 28 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Over the past month this series has covered a lot of ground. After writing almost 30 blog posts on the stored procedure it’s interesting to look back and see the fact that it’s gotten much bigger and more complex than I realized. Kind of like watching a plant grow, you don’t notice the day-to-day changes until one day you go to water it and see that it’s taken over your entire garden.
A few things that I’ve added along the way have been especially useful, but there hasn’t been a good place to mention them thus far in the series. With just three posts left for the month, now is the time.
Always Show Blockers
In a post earlier this month I talked about filters. Filters let you decide exactly what you want to see, and what you not don’t want to see (“not” filters). But sometimes you have no choice: if you’re filtering so that you only see session 123, and it’s being blocked by session 456, you’ll also see information about session 456. The idea is that you should always get enough context to fully evaluate the problem at hand. Even if it means that you see more information than you were intending to see the first time around. You probably would have asked for 456 next anyway.
In @get_task_info = 2 mode, you may see waits called “RUNNABLE.” This could strike you as an oddity, given that there is no such wait type in SQL Server. I wanted to show tasks on the runnable queue, and making up a fake wait type seemed like a reasonable way of accomplishing the task. In practice, it has worked extremely well—I’ve used this feature countless times to help understand scheduler contention on a SQL Server instance.
Accurate CPU Time Deltas
CPU time is a tricky metric. It gets handled by Who is Active’s delta mode, and has for several versions. But historically, never very well. The data simply isn’t represented in an easily-obtainable fashion in the core DMVs. Recently I decided to dig deeper into this problem and discovered that I could get better numbers from some of the thread-specific DMVs. They’re cumulative numbers, based on the lifetime of the entire thread—not too good for the usual Who is Active output. But for snapshot and delta purposes, just about perfect. Meaning that in v11.00 of Who is Active, you can see the [CPU] column show a value of 0, while the [CPU_delta] column shows a value in the thousands. It’s not a bug. It’s a feature. (It really is!)
Text Query Plans and the XML Demon
SQL Server 2005 introduced query plans as XML. Management Studio knew how to render these plans graphically. And we were able to pull the plans from DMVs. Life was great. Except, perhaps, when you actually wanted to view one of these plans, and you had to save the thing out to a .SQLPLAN file, close the file, then re-open it. That’s about three steps too many for my taste, so I was overjoyed when the Management Studio team decided to wire things up the right way in SQL Server 2008. Click on a showplan XML document, see a graphical plan. Simple as that.
Unfortunately, the XML data type has its own issues, including one particularly nasty arbitrary limitation that has to do with nesting depth. The idea is to make sure that XML indexes don’t crash and burn too often (not a big concern for me, given that I’ve never seen one used in a production environment—but I digress). The problem is that query plans are heavily nested. And to get that nice graphical plan workflow, SSMS needs the plan rendered as XML.
In prior versions of Who is Active I gave up and returned either an error or a NULL. But in v11.00 I decided to make things a bit better. If the nesting issue occurs, Who is Active will now return the plan as XML encapsulated in some other XML in a text format, along with instructions on how to view the plan. This won’t give you a nice one-click experience, but it will give you the ability to use Who is Active to see some of the bigger plans that are causing performance issues.
Service Broker Needs Love Too
One of the most interesting features of Service Broker is activation. But a vexing design choice on behalf of the Service Broker team was to make activation procedures launch as system sessions. This means, among other things, that prior versions of Who is Active filtered them right out of the default view. To see them you’d have to enable @show_system_spids. And then you’d have to ignore all of the other system stuff. And you’d get woefully bad time information (no, the activation process hasn’t been running for 25 days; that’s the last time you restarted the SQL Server instance). In Who is Active v11.00 this has been fixed. Service Broker activation processes are now displayed by default along with other user activity. And I found a way to fix the timing issue, thanks to some advice on Twitter from Remus Rusanu, one of the guys who originally worked on Service Broker. So if you’re using activation and monitoring with Who is Active, life is good.
The homework section of this series is officially closed. Your assignment is to go enjoy the rest of your day.
This post is part 27 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Thanks to your feedback over the past month, I’ve managed to get a lot of work done on the next version of Who is Active.
So much work, in fact, that I’ve finished a new major release.
Click here to download Who is Active v11.00
There are numerous enhancements in this release. In no particular order:
- CPU deltas use-time thread-based metrics for more accurate data (use both @delta_interval and @get_task_info = 2)
- command_type information added to [additional_info] column for active requests
- Modified elapsed time logic to retrieve more accurate timing information for active system SPIDs
- SQL Agent job info (job name and step name) is now included in the additional_info column (use @get_additional_info = 1)
- If there is a lock wait, information about the blocked object (name, schema name, and ID) is now included in the additional_info column (use both @get_additional_info = 1 and @get_task_info = 2)
- Service Broker activated tasks are now shown by default, without using @show_system_spids mode. The program_name column contains the queue_id and database_id associated with the activation procedure
- Various numeric columns, including reads, writes, cpu, etc, have been made nullable. These will occasionally return NULL, on extremely active systems where the DMVs return data more slowly than queries start and complete
- Query plans that cannot be rendered due to XML type limitations are now returned in an encapsulated format, with instructions, rather than sending back an error
- Added wait information for OLEDB/linked server waits
- Wait collection will now "downgrade" to get_task_info = 1 style data if no other information is available in get_task_info = 2 mode
- Added header information to online help
- Added a login_time column to the output
- The duration for sleeping sessions is now the sleep time, rather than the time since login
- Fixed various bugs
I’ve written about several of these things over the past month, and there are a couple of posts left for the remainder of the month, so I won't elaborate here.
Thank you, thank you, and thank you again to everyone who has taken the time to give me feedback and/or report bugs! Who is Active would not be what it is without you!
I would especially like to thank Paul White. Paul has acted as my unofficial "lead QA engineer" for the past few versions. He's found some very interesting bugs in my code, in addition to pointing out some quirks in the DMVs that I wasn’t handling properly. Thanks, Paul, for the great job you’ve done!
One other thing I would like to point out is an addition to the header: a donation link. Several people recently have asked me how to donate, and I didn’t have a good answer. Now, I do.
Use this link to support the Who is Active project!
While I feel strange asking for money, the truth is that I’ve invested well over a thousand hours in the stored procedure, documentation, and support (I respond to EVERY e-mail I receive from Who is Active users). If you’ve been able to improve your day-to-day performance thanks to Who is Active, it would be great if you—or even better, your employer—could help support my work on the project. I’ve listed a recommended donation of $40 per installed instance, but please donate however much you feel Who is Active is worth to you and/or your organization.
Thanks again, and enjoy!
This post is part 26 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
The CPU is pegged. Your hard disks are fried. Memory consumption is through the roof. Who is to blame?!?
Well that’s easy. Blame the session with the most CPU consumption. Blame the session doing the most I/O operations. Ask Who is Active what’s going on—it’ll tell you...
Clearly session 52 is the problem here. It’s consumed over 2,000,000 milliseconds of CPU time. Session 53, on the other hand, has consumed a paltry five seconds. No brainer, right?
Except that it’s not.
Session 52? It’s in a wait state. (WAITFOR, to be exact.) Not consuming any CPU time at all at the time I ran Who is Active. Session 53? Using plenty. So why the discrepancy? Does Who is Active have a major bug?
The fact is—and this will be obvious to a lot of you, so forgive me—most of the metrics reported by Who is Active are cumulative. They’re totals from the entire session, or the entire lifetime of the request (depending on what’s going on). In the case of session 52 in the image above, that much CPU time was consumed over a five-minute period prior to the point in time when I ran Who is Active. Session 53, on the other hand, is currently consuming CPU time. But how do I figure that out?
The answer: delta mode. This feature is something for which you can thank Jimmy May, who kept telling me about his “SQL Deltoids” script that he’d written back in the bad old days of SQL Server 2000. “If only there were a way to apply that script to the SQL Server 2005 DMVs,” he lamented one day. So I added the functionality into Who is Active. It’s quite simple, and surprisingly effective.
To use delta mode, simply invoke Who is Active’s @delta_interval option. Setting an interval will cause the procedure to execute its main logic branch, then wait the specified amount of time—a number of seconds between 1 and 255—before running the same main logic branch again. All of the numeric values that changed during that period will be compared, and the changed numbers will be output in a series of columns named with _delta at the end.
For example, here are the same two sessions as above, viewed in delta mode:
@delta_interval = 5
During the five-second delta interval, session 52 consumed no CPU time. Session 53, on the other hand, consumed over 5,000 milliseconds of time. If the CPU is pegged, 52 is no longer contributing—53 is the session of interest.
Enabling delta mode will add some or all of the following columns to the output, depending on what other options have been set:
By leveraging these delta columns in conjunction with the @sort_order option, it’s easy to quickly see which sessions are currently consuming your valuable server resources—which is generally more interesting than seeing which sessions may have been consuming resources but are now waiting or sleeping.
Note: The various sessions and requests you'll see in delta mode pertain to the information captured after the wait interval. Information about requests that were running as of the first collection but are not running as of the second is discarded. New requests that started after the first collection will be shown, but with delta values of NULL. Locks, transactions, and other optional information, will also be captured only for the second collection, since no delta calculation is possible for those metrics (at least, not yet).
Delta mode, in its current form, is only really useful if you have requests that last longer than a second (at minimum—I usually do five or ten-second intervals). What kind of workload do you see on your servers? Is delta mode as it exists today something that you’re making use of (or that you’ll start making use of, now that I’ve written a post about it)?
This post is part 25 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Yesterday’s post was about configuring the output based on what you want to see. Today’s post is about taking that perfect output and persisting it.
There are many reasons that you might like to store the results of a Who is Active call to a table. Some of the real use cases that I’ve been told about by Who is Active users include:
- Scheduled monitoring. Capturing the results of Who is Active calls in 5 or 10 minute intervals to see what’s happening on the database server throughout the day (or night)
- Using it as part of a build process, to verify that the correct things are happening in the correct order at the correct times
- Using it as part of an exception handling system that automatically calls Who is Active every time an error occurs, to snapshot the current state of the database instance
And there are various other use cases in addition to these. The point is that there are a number of reasons that you might want to capture the output.
Unfortunately, it’s not as simple a task as you might think. The first time I tried to make it work, I did something like:
CREATE TABLE #output
This approach failed miserably. If you try it, as I did, you’ll see the following error message:
Msg 8164, Level 16, State 1, Procedure sp_WhoIsActive, Line 3086
An INSERT EXEC statement cannot be nested.
Who is Active uses a number of INSERT EXEC statements, and they cannot be easily changed or removed, so for a while it seemed like all was lost. After a bit of brainstorming, however, I realized that I could simply build yet another INSERT EXEC into Who is Active—one that will insert into a table of your choice.
Of course, first you need a table. And if you’ve been reading this series you’re no doubt aware that the output shape returned by Who is Active is extremely dynamic in nature, and depends on which parameters are being used. So the first option I added was a method by which you can get the output schema. Two parameters are involved: If @return_schema is set to 1, the schema shape will be returned in an OUTPUT parameter called @schema. This is best shown by way of example:
DECLARE @s VARCHAR(MAX)
@output_column_list = '[temp%]',
@return_schema = 1,
@schema = @s OUTPUT
The idea is that you set up your Who is Active call with all of the options you’d like, then bolt on the @return_schema and @schema parameters. Here the column list is being restricted to only those columns having to do with tempdb. If you run this code, the SELECT will return the following result:
CREATE TABLE <table_name> ( [tempdb_allocations] varchar(30) NULL,[tempdb_current] varchar(30) NULL)
This result can be modified by replacing the “<table_name>” placeholder with the name of the table you actually want to persist the results to. Of course this can be done either manually or automatically—after the call to Who is Active, the text is sitting in a variable, so a simple call to REPLACE is all that’s needed. That call could even be followed up by a call to execute the result and create the table...
DECLARE @s VARCHAR(MAX)
@output_column_list = '[temp%]',
@return_schema = 1,
@schema = @s OUTPUT
SET @s = REPLACE(@s, '<table_name>', 'tempdb.dbo.monitoring_output')
Of course now you probably want to put something into the table. Crazy! To do this, drop the @return_schema and @schema parameters and replace them with @destination_table—the name of the table into which the results should be inserted:
@output_column_list = '[temp%]',
@destination_table = 'tempdb.dbo.monitoring_output'
Now the results of the call will be inserted into the destination table. Just remember that every time you change the Who is Active options, you’ll have to re-acquire the output shape. Even a small change, such as adding an additional column to the output list, will result in a catastrophic error.
@output_column_list = '[session_id][temp%]',
@destination_table = 'tempdb.dbo.monitoring_output'
Msg 213, Level 16, State 1, Line 1
Column name or number of supplied values does not match table definition.
How far you take this feature depends on how creative you are. Some of you have come up with elaborate schemes, but I generally keep it simple. Something that I like to do is to set up a short semi-automated process by using Management Studio’s GO [N] option. I use this when I’m doing intense debugging, and will do something like:
DECLARE @s VARCHAR(MAX)
@format_output = 0,
@return_schema = 1,
@schema = @s OUTPUT
SET @s = REPLACE(@s, '<table_name>', 'tempdb.dbo.quick_debug')
@format_output = 0,
@destination_table = 'tempdb.dbo.quick_debug'
WAITFOR DELAY '00:00:05'
This will first create a table in tempdb, after which it will collect the results every five seconds for a five-minute period. I set @format_output to 0 in order to get rid of the text formatting so that I can more easily work with the numeric data. The results can be correlated to performance counters or other external information using the [collection_time] column, which was added to Who is Active specifically to support automated data collection.
Share your experiences with Who is Active’s data collection feature. How are you using it? Have you hit any problems or roadblocks? Any awesome success stories? A few sentences will be great, and more is fine if you’re in the mood to tell a story.
This post is part 24 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Early in the process of creating Who is Active it became clear that there was no way everyone was going to agree.
With each release I received requests to move some column so that it would show up on the lefthand side, or to change the sort order, or to make some other modification that someone felt was necessary to help them more easily digest the data. The problem: it was impossible to accommodate all of these requests. So I decided to go with a self-service model.
In order to allow users to dynamically customize the output (without touching any code), the stored procedure exposes three options: @output_column_list, @sort_order, and @format_output. Each of these is discussed in the following sections.
@output_column_list controls not only whether or not certain columns will be shown in the output, but also the order in which the columns are displayed. The correct argument is a list of bracket-delimited column names (or partial names with wildcards). Delimiters are not necessary (use whatever delimiter you like, or none at all; they’re ignored). The key to successfully using this option is to remember that inclusion of columns in the output is additive: many columns (such as [additional_info]) are only added to the output if both the correct options are enabled for the stored procedure and the columns are included in the column list. If you start modifying the list and don’t take this into account, you may not see the columns you’re expecting when you go back later and start changing options. To keep things flexible, make sure to use wildcard columns, especially a generic wildcard ([%]) at the end.
Using the column list can be as simple as specifying the exact columns you’re interested in:
@output_column_list = '[tempdb_allocations][tempdb_current]'
Notice that no delimiter is used here. Again, any extraneous text aside from the column specifiers is ignored, so the following call is equivalent:
@output_column_list = 'this[tempdb_allocations]is[tempdb_current]ignored'
Easier than specifying exact column names is to use wildcards that match the pattern of the columns you’re interested in:
@output_column_list = '[tempdb%]'
All three of these calls will yield the same output, similar to the following:
Of course, this will return only these two columns. Generally I’ll use the column list feature just to move things around so that I don’t have to do as much scrolling, and in those cases I almost always want everything else, too. That’s where the generic wildcard comes into play:
@output_column_list = '[tempdb%][%]'
Now the tempdb-specific columns appear on the lefthand side, and all of the other columns follow, in a default order.
The next option we’ll cover is @sort_order. This option controls the order of the rows output by the stored procedure. Like @output_column_list, the input is a list of bracket-delimited column names. In this case, wildcards are not supported, but the keywords ASC[ENDING] and DESC[ENDING] are supported after the column name.
The following call returns data sorted by [login_name] ascending (ascending is default, so the keyword is optional), with ties broken by [session_id] descending:
@sort_order = '[login_name][session_id] DESC'
As before, delimiters are optional and are ignored. Please note that the current betas of Who is Active have bug where multi-column sorts like this one are not properly honored. That issue will be fixed in the next version of Who is Active.
Sorting is especially useful when doing any kind of comparison of the various requests currently running. I use it extensively in block leader mode ([blocked_session_count] DESC), and it is quite handy in delta mode—which will be covered in a post in a few days.
The final option this post will cover is @format_output. This one is based on the fact that Who is Active is designed to leverage SSMS as a “graphical user interface” of sorts. Query text is sent back in an XML format, numbers are formatted as right-justified strings, and elapsed time is formatted in the more easily-digestible [dd hh:mm:ss.mss] form.
If you’ve been using Who is Active for a while you’ve probably noticed the right-justified numbers:
This format was suggested fairly early on by Aaron Bertrand, and is one of my favorite things about the stored procedure. It makes it much easier to pick out bigger numbers when you’re looking at a large set of data. By default, SSMS uses a non-fixed width font for grid results, so the default argument to @format_output, 1, takes this into consideration. But some people—like Aaron Bertrand—change the SSMS settings and use a fixed width font instead. If you’re one of these people you can use an argument of 2 to get nicely-formatted numbers. If you don’t change the argument you might notice that the numbers don’t seem to properly line up when you’re working with a set of data containing numbers of greatly differing size.
Still other people don’t like the right-justified numbers or are doing collection to a table (see tomorrow's post), and so formatting can be completely disabled by using an argument of 0. It’s up to you...
Today’s homework is to enjoy your Sunday. You’re a busy DBA; you’ve earned a break. Get outside and off of the computer!
This post is part 23 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Oftentimes blocking situations are a bit more complex than one session blocking some other request. In busy systems blocking chains can build up that include dozens or even hundreds of requests, all waiting on one another. And all are waiting as a result of one top-level blocker: the block leader. In many cases fixing the blocking problem means fixing whatever it is that the block leader is doing, so identifying it quickly is a key part of debugging blocking situations.
Finding the block leader is a fairly simple process once you realize that blocking is effectively a hierarchy. The process involves starting with all of the requests that aren’t blocked and walking down the blocking chain until the leaf level is reached—blockees that lack any more downstream blockees. At each level, a number is recorded to figure out the blocking depth. As an added benefit, a second pass can be made to reverse the number at the end of the process—this shows the total number of downstream blockees for each blocker.
While this is relatively easy to implement using a Common Table Expression, it’s certainly not something that users should have to reinvent each time it’s needed. So Who is Active exposes an option, @find_block_leaders, that does the work for you. This option adds a new column to the output, called [blocked_session_count], which reflects the total blockee count. Higher numbers mean more sessions blocked downstream; the sessions with the highest numbers are your block leaders, and these are the ones you want to focus on.
To see this in a bit more detail, run the following batch in four sessions:
UPDATE TOP(10) Sales.SalesOrderDetail
SET OrderQty += 7;
Assuming that nothing else is running, the first session will complete. The other three will block. Who is Active reports this, of course, even in its default mode:
The initial update was run on session 53, which is blocking session 54. Both 55 and 56 are reported as blocked by 54, although in reality they’re being blocked indirectly by 53. This case, while more complex than most of the blocking examples used in this series, is still simpler than many of the things seen on average production systems. None the less, it’s enough to show the power of the Who is Active option that this post is about...
@find_block_leaders = 1
The [blocked_session_count] clearly shows which session is causing the issue in this case: 53 has three downstream blockees, whereas 54 has only two, and the other two sessions have none.
In this case, because I happened to run the batches in the correct order, the data is returned by default with the block leader sorted on top. That may or may not always be the case in a real system, but it’s quite a useful thing when there are numerous active requests and you want the most important ones right at the top. To accomplish that, use the output ordering feature—which will be covered in detail in tomorrow’s post.
@find_block_leaders = 1,
@sort_order = '[blocked_session_count] DESC'
Today’s “homework” is a question for you to consider about the behavior of Who is Active with regard to sleeping sessions: Today the [start_time] and [dd hh:mm:ss.mss] columns, for sleeping sessions, correspond to the login time for the session. But recently I’ve been thinking that it might make more sense to instead show the amount of time since the last request completed—the amount of time that the session has been sleeping. This seems to me to be more useful information and more in line with the goal of the stored procedure.
Which would you rather see, and why? I would appreciate any input and will carefully consider it. This would be a fairly major change, and it is an important decision either way.
This post is part 22 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Note: Before reading this post, please download the most recent Who is Active build, which includes a key fix for the locks mode.
Blocking has been a theme of a couple of recent posts in this series. And that’s not even the end of it. Helping you find and properly evaluate blocking issues is a core part of Who is Active’s raison d'être.
Behind every block is something causing the block. Something that, without which, there could be no block. And that thing is called a lock.
Locks are synchronization objects. Their mission in life is not to give you headaches when dealing with blocking issues, but rather to help maintain the ACID properties that are a big part of the reason that DBMS technology is so popular. For locks in particular this means the “I” property: Isolation. (And, to a lesser extent, the "C" property: Consistency.) Locks keep readers from reading data that writers haven’t finished writing, and they keep writers from overwriting data as it’s being read or written by someone else. This is a good thing. Failure to take these kinds of precautions would result in chaos. And a lot of really bad data.
When a DBA sees blocking, her first instinct is to eliminate it. Kill the blocker! Add a NOLOCK hint! Change the processes around! But blocking is not necessarily a bad thing. Blocking means that your data is being protected. Readers are getting consistent results. Writers aren’t overwriting each other. Everything is as it should be—even if your queries are waiting an inordinate amount of time for data.
When you see blocking, the correct move is not to eliminate it, but rather to evaluate it. Figure out what’s causing the blocking. Figure out why (or whether) it’s necessary, and what the alternatives might be. Then—and only then—should you start killing sessions, adding hints, or taking similar action.
Evaluating blocking can be a painful experience. The sys.dm_tran_locks view (formerly syslockinfo) contains a large number of columns. Many of these are numeric values that need to be referenced elsewhere in order to be meaningful to the average human. And even then, it’s simply not a very nice user experience...
Glancing at this list of lock information, it’s impossible to tell what’s going on. (The query I ran to pull up this list returned 2700 rows.)
Who is Active solves this problem by putting locks into a somewhat more human-readable form: a custom XML format. The stored procedure does all of the work of going to the databases with locks and decoding the numbers. So instead of seeing something like 72057594038845440, you’ll see something like Sales.SalesOrderHeader. Whether or not you think that XML in general is a very readable format, the fact that the various object names have been resolved for you makes it a lot better than a straight query against sys.dm_tran_locks.
To get lock information, use Who is Active’s @get_locks = 1 option. This will add a column called [locks] to the output. The column is typed as XML, and you can click on it to see the full contents. The document will have one root node per database in which there are locks. For the table listed above, the collapsed nodes look like this:
Under each database node is one node that represents locks on the database itself, and a node called Objects that contains subnodes for each object in the database that’s locked. These are grouped by object name and schema name:
Any given object can have multiple types of active locks issued against it at one time, so inside of the Object nodes are one or more Lock nodes:
This format allows for quick and simple exploration of the various locks that are active on behalf of your session. Each Lock node has an attribute called request_status. If its value is “GRANT,” the lock is held by the session. If its value is “WAIT,” the request is waiting to acquire the lock.
A full description of the various lock types is well beyond the scope of this post, but most of them are documented in the Books Online entry for sys.dm_tran_locks.
A cautionary note: Using the @get_locks option can seriously slow down Who is Active. The sys.dm_tran_locks DMV is known to be one of the slowest DMVs, and in some cases it can hold a huge number of rows. I have seen numerous cases where a simple SELECT * against the DMV took 20 or more minutes to finish. When dealing with locks, which can change rapidly, that’s far too much elapsed time for the results to be meaningful. Recent Who is Active builds include blocked object resolution mode, which is designed to be a much lighter weight alternative to using the full locks mode.
Using the various DMVs it’s possible to write a number of queries that can deeply analyze block situations. Although Who is Active already does this, it’s an interesting exercise. Can you write a query that shows all blocked requests, the lock mode for each request, the blocker session or request that the blockees are waiting on, and the blocker lock mode?
This post is part 21 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.
Tempdb. Everyone’s favorite shared bottleneck.
The funny thing about tempdb is that it’s not used by every query. It’s only really used by the biggest queries. The queries where performance really matters. And of course, that makes the situation all the worse. When tempdb is a problem, it’s a major problem.
A common cause of tempdb issues is latch contention. This occurs due to three “special” page types: GAM (Global Allocation Map), SGAM (Shared Global Allocation Map), and PFS (Page Free Space). Each of these pages (sometimes many of each) exist in all database files, and they are responsible for helping to identify where incoming data can be written in the physical file. For complete background, read this post by Paul Randal.
Whenever a process needs to update one of these special pages, a latch is taken. A latch can be thought of as a kind of very lightweight lock; it’s designed for synchronization and is intended to be quickly turned on, and then just as quickly turned off after it’s no longer needed. The problem occurs when lots of processes happen to hit the SQL Server instance at the exact same time, and they all need to find a place to store some data. Suddenly, there are numerous outstanding latch requests. They start queuing up, and before you know it the last one in line has been waiting for seconds—and not doing any work at all in the meantime.
Seeing this in action is quite simple. Create a bunch of temporary tables, in a very short period of time. I like to demo this using SQLQueryStress. Ideally you should do this kind of test on a server with only a single tempdb data file, to really highlight the issue. Here’s the code to run:
20 or so concurrent threads should do it. Hit the start button in your load tool, wait a few seconds, run Who is Active, and you’ll see something similar to:
See those PAGELATCH waits? They’re all on the same resource: PFS pages, in tempdb file ID 1. The format for PAGELATCH and PAGEIOLATCH waits is: [wait_type]:[database_name]:[file_id](page_type). Who is Active can decode the page types for GAM, SGAM, PFS, DCM (Differential Change Map), and BCM (Bulk Change Map) pages. For any other page, the page type will be an asterisk (*).
These waits are all on update (UP) latches, but it's also quite common to see exclusive (EX) latches when this problem occurs
Fixing this problem is amazingly simple: just create more tempdb files! When you create additional tempdb files—as long as they’re equally sized—SQL Server will automatically balance incoming requests across the multiple files. Since each file maintains its own PFS, GAM, and SGAM information, the contention will be eliminated.
How many files should you create? Well, that depends on your workload. I take a simple approach: start with a number of files equal to one quarter to one half of the number of logical schedulers. If the contention goes away (i.e. Who is Active is no longer showing PAGELATCH waits on these special pages in tempdb), stop there. Otherwise, keep increasing the number of files until the contention does go away.
Again: make sure to keep the files equally sized! SQL Server’s algorithm is based on a proportional fill model, which means that if one file is bigger than the others it will be chosen more often. This will cause it to grow more quickly, which will cause it to be chosen more often, which will cause it to grow more quickly, which will keep compounding until your disk is full and your performance is back in the red zone. Not a good situation to get yourself into. Grow the files bigger than you need them at create time, and monitor to make sure things don’t get out of hand.
Use Who is Active to check your production servers for tempdb contention! Tell me in the comments below whether you found any. It’s amazing how common this issue is, yet how simple the fix turns out to be. A very satisfying task for even the most harried of DBAs.