After spending 5 months in Boston last summer and fall, I'm going back again, to teach my 5-day SQL Server Internals and Query Tuning class for Boston SQL Training. Last time I taught my course there, we had a completely full class with a waiting list. So I'm going to offer the course again!
The class will be delivered April 11- 15 but early bird registration ends next Tuesday, February 1st.
I'm very excited to go back, and thankful to Adam Machanic of Boston SQL Training for this opportunity.
You can read what Adam says about the course, including some student comments, in Adam's post.
I got a couple of really great questions during my SSWUG Workshop this morning, as I was discussing seeks and scans, and since the answers to the two questions are very related, I decided to address both of them in more detail in this blog post.
Most people think about a seek operation as retrieving just one or a few rows, and a scan as accessing the entire table or index leaf level. And that usually is what I mean when I am just describing SQL Server behavior. But there is also the possibility that SQL Server could perform a 'partial scan', starting at some row in the table or index leaf, and then scanning only to a particular ending row. However, there is no operator in a query plan called 'partial scan'. So how can you tell when you have one?
Before I specifically answer that, let me tell you about the first question in the workshop this morning. I had been showing the Properties window (available from View/Properties Window), pointing out that when you click on one of the icons in a graphical query plan, the Properties window shows the same information that is in the yellow box that pops up, plus additional information. One of the values shown in one called 'Ordered', and when 'Ordered' has the value true, then there is also a value called 'Scan Direction', which might make you think that ordering is only relevant to scans. But you will see True values for 'Ordered' even when the operation is a seek. This is usually an indication that a partial scan is being performed.
And because I was talking about heaps being unordered sets of rows, one of the delegates asked: "Are index seeks against a heap always unordered?"
The answer is, of course, "it depends". If you have a covered query, the index access could be an ordered seek of the covering index, even though the underlying table is a heap. Here is an example from the AdventureWorks database. I make a copy of the Sales.SalesOrderHeader table, so it is a heap, and then build a nonclustered index on the Freight column.
IF EXISTS (SELECT * FROM dbo.SalesOrderHeader)
DROP TABLE dbo.SalesOrderHeader;
SELECT * INTO dbo.SalesOrderHeader
CREATE INDEX ix_Freight ON dbo.SalesOrderHeader(Freight);
Then I look at the plan for a covered query using the ix_Freight index:
WHERE Freight < 100;
And here is the plan showing the index seek:
And the Properties sheet for the Index Seek shows that Ordered - True and Scan Direction = FORWARD.
This is an example of an ordered seek on a nonclustered index, or what we can think of as a partial scan, or range scan.
The other question during the workshop was how to tell if the seek was really a range scan. Part of the answer is to look for the Ordered = True value, as I just illustrated. Another thing to notice in the graphical query plan is the thickness of the line moving rows to the left from the Index Seek, indicating that LOTS of rows are being returned from the Seek. That's another clue that we have a partial scan.
We can also get partial scans with clustered indexes. I'll build a clustered index on my new table.
CREATE CLUSTERED INDEX clus_OrderID ON dbo.SalesOrderHeader(SalesOrderID);
There are 31465 rows in the dbo.SalesOrderHeader table, with SalesOrderID values ranging from 43659 to 75123. I'll SELECT a small range of data:
SELECT * FROM dbo.SalesOrderHeader
WHERE SalesOrderID < 44000
The plan shows a Clustered Index Seek, and the Properties shows Ordered = True.
If I make the range bigger, and look for SalesOrderID < 60000, I still get an Index Seek, and the same when using an upper limit of 75000. If I run a query that returns ALL the rows, I still get an Index Seek.
Both queries are using an Index Seek, but notice the difference in the thickness of the arrows.
It turns out that the only time the optimizer will tell us it is doing a scan is when it knows absolutely that it will be looking at every row in the table or index leaf. And the only way it can know this is if there is no filter condition (NO WHERE clause) or no index on the filter condition. With the index on SalesOrderID, there is no way the optimizer can be absolutely sure that it will have to scan the whole table. Someone might have inserted a value of 100001 since the last time statistics were updated, and we wouldn't want to perform a scan and return a value that is not in the specified range. So the Index Seek in the plan really just means that SQL Server will access the data using the index, but not with a complete scan. There is something in the query that give a starting point, a stopping point or both.
I've got another Workshop for SSWUG on Index Internals, coming up on February 25th. Hopefully, I'll get a lot more great questions during that one!
My PASS demos are now available on my website for my Precon Session on SQL Server Concurrency: Locking and Blocking and Row Versions, Oh My! and for my Spotlight Session on Seeking SQL Server Secrets. You can download both sets of scripts from
Also, I set up a FaceBook fan page a few weeks ago, for SQL Server technical information, news, tips and tricks... the Facebook page name is SQL Server Internals, and the direct URL is http://www.facebook.com/SQLInternals. Note that on my personal FaceBook page I do not 'friend' people that I don't know, but I don't post any technical info on my personal page. If you want technical info from me on FaceBook, just 'Like' the SQL Server Internals page.
Finally, I started tweeting, right after the PASS conference ended. You can look for @SQLQueen
Are you going to PASS next week? We're not going to have anything to compete with the big release and book-signing for the SQL Server MVP Deep Dives book like we had last year, but there is something special you can do this year.
A former student is collecting personal care items to take on a mission trip to Nicaragua. Here is his statement of purpose for the project:
Corner of Love leads mission teams to villages in impoverished northern Nicaragua, providing life changing medicines, dental care, and basic personal hygiene items. Coupled with service projects like establishing safe/clean water sources, this will promote an improvement in the general health level of these communities.
If you can donate any items, including leftover hotel amenities, please drop them by the Red Gate booth during the expo at booth #408 The full list of needed items can be found here:
Although this group is Christian, the services and donations are given to anyone who needs them.
Alternatively, you can bring items, even extra toiletries from your hotel room, to my Spotlight session, which is the last session on the last day of the conference in Room 2AB. (That's Thursday, from 4:00 - 5:30)
And to my blogging and tweeting friends and colleagues, if you can help spread the word, I'd appreciate it very much!
My new whitepaper has just been published!
Inside SQL Azure
The paper examines the internals of the SQL Azure databases, and how they are managed in the Microsoft Data Centers, to provide you high availability and immediate scalability in a familiar SQL Server development environment.
SQL Azure is changing so fast, I thought everything in here was going to be out of date before it saw the light of day. :-)
“It ain't what people don't know that hurts them it's what they know that ain't so”
-- commonly attributed to Mark Twain
SQL Server has been around for a long time now, and people who learn one version well sometimes have monumental task when a new version comes out. They need to figure out what changed. The "What's New" documentation usually gives a good idea of the major new features, but sometimes when a more minor feature just change a bit, it's may get overlooked in all the hoopla of learning about the BIG new stuff.
One change that occurred very long ago that many people still haven't hooked into is that fact that a TRUNCATE TABLE operation can be rolled back! Even in my advanced classes, with some of the best DBAs out there, when I start talking about TRUNCATE TABLE, there are frequently one or two people who think it is unlogged and thus can't be rolled back.
Let me state now… THIS IS NOT TRUE.
And in fact, it is very easy to prove that it is not true… but if you don't suspect that it is not true, why would you even test it?
TRUNCATE TABLE is logged, but SQL Server doesn't log every single row as it TRUNCATEs the table, so you might think that this command falls into the category of commands we call 'minimally logged'. But although indeed minimal information is logged for this operation, it is not technically in that category. The official 'minimally logged' operations also cause a bit to be set in a big bit map, so that when you are running in BULK_LOGGED recovery model and make a log backup, SQL Server can capture all the data involved in 'minimally logged' operations into the transaction log. This allows the data to be recovered when restoring from backup.
But for TRUNCATE TABLE, SQL Server only logs the fact that the operation happened, along with information about the pages and extents that were deallocated. So that's minimal information. And by recording which pages and extents were deallocated, there's enough information to roll back, by just REallocating those pages. A log backup only needs the information that the TRUNCATE TABLE occurred. And to restore the TRUNCATE TABLE, the operation is just reapplied. The data involved is not needed during RESTORE (like it would be for a true 'minimally logged' operation like a BULK INSERT).
So what if the transaction in which the TRUNCATE TABLE occurs stay open for a long time? How does SQL Server know not to reuse the pages that belonged to the table? It turns out the pages and/or extents involved are locked with an eXclusive lock, and just like all X locks, they are held until the end of the transaction. And as long as the pages or extents are locked, they can't be deallocated, and certainly cannot be reused.
So here's some code to see for yourself. I'll make a copy of a table from the AdventureWorks database:
IF EXISTS (SELECT * FROM sys.tables
WHERE name = 'Products'
AND schema_id = 1)
DROP TABLE dbo.Products;
SELECT * INTO dbo.Products
Now we'll look at the count of rows, and the pages that belong to the table.
SELECT COUNT(*) FROM dbo.Products;
DBCC IND('AdventureWorks', 'Products', -1);
We get a count of 504 rows and the following pages:
Next, start a transaction and truncate the table. Verify that DBCC IND shows no pages belonging to the table and the count is 0. Then, look at the locks.
TRUNCATE TABLE dbo.Products;
SELECT COUNT(*) FROM dbo.Products;
DBCC IND('AdventureWorks', 'Products', -1);
DBCC EXTENTINFO('AdventureWorks', 'Products', -1);
SELECT resource_type, resource_description,
request_mode FROM sys.dm_tran_locks
WHERE resource_type IN ('EXTENT', 'PAGE')
AND resource_database_id = DB_ID('AdventureWorks');
You should see no rows from DBCC IND, and 0 rows from count(*). But the locks info should return something like:
resource_type resource_description request_mode
------------- -------------------- ------------
EXTENT 1:33352 X
PAGE 1:42486 X
EXTENT 1:42488 X
PAGE 1:42487 X
PAGE 1:42488 X
PAGE 1:42489 X
PAGE 1:23027 X
PAGE 1:23030 X
PAGE 1:23029 X
PAGE 1:26992 X
PAGE 1:26993 X
So the extent and page locks include all the pages that we saw in the DBCC IND output. (Remember, an extent is 8 contiguous pages, starting at the one that describes the extent.)
Only after you ROLLBACK the transaction will the locks be released, and you should see all the rows and pages back in the table again.
SELECT COUNT(*) FROM dbo.Products;
DBCC IND('AdventureWorks', 'Products', -1);
So now you know!
And it's also really easy to procrastinate. I meant to blog about a big misunderstanding right after last month's SQL Server Magazine UPDATE commentary appeared, but now I'm just finishing up this month's commentary and only now realized I forgot to write this blog post.
In this commentary
http://www.sqlmag.com/article/sql-server/Disseminating-SQL-Server-Information.aspx I talked about how difficult it is to find the information you might need, and how difficult it is for educators, or product support providers or documentation writers to make sure EVERYONE has the necessary information.
The title of the article is "Disseminating SQL Server Information", and if you go to the page that lists all my articles, you'll see the short description:
How can we ensure that everyone can find the SQL Server information they need?
But some people thought I was actually complaining ("beating people up") for not knowing certain facts.
I apologize profusely to anyone who assumed I was berating them. I hope that anyone who actually knows me, knows that this would be the last thing on my mind, but I can only assume that there are readers out there who have never met me.
My point in the article was not only to talk about how hard it is to find correct answers, and know that they are correct answers, but that it's almost impossible to make sure everyone who needs the information can find it. Particularly since there are new SQL Server users every day!
I certainly don't know EVERYTHING about SQL Server, and I have trouble tracking down answers sometimes. I know it's orders of magnitude harder for people who have a lot fewer sources of information than I have.
Again, I apologize to anyone who has ever thought any of my writing or responses are condescending in any way. It has never been my intention to make anyone feel bad for what they don't know, particularly when they're doing something to try to learn more!
Thanks so much to all my readers… where would I be without you all?
Thank you everyone who attended my 24HOP session yesterday. It was a wonderful event to be part of.
Here are a few of the questions that I was asked, that didn't get completely answered.
Question: Can we set READ_COMMITTED_SNAPSHOT ON for just a table rather than at database level?
I apologize that I answered this incorrectly. I was trying to remember all of the locking hints on the spur of the moment, and remembered that there is a hint READCOMMITTED LOCK to force default locking/blocking behavior when the database has been set to READ_COMMITTED_SNAPSHOT, so I was thinking there should be one to go the other way. But now that the stress of the moment is off, I realize that there could not be such a hint. Enabling snapshot isolation means that SQL has to save the old committed values of any changed data as the change is taking place. For a SELECT to just request to read older committed values, when the database has not already been set to save them, would not be possible. The data you're looking for would not be available.
Question: Is it ever appropriate to use the No Lock hint? I'm thinking of using it in certain Select statements.
What does 'appropriate' mean? Like all the phenomena I discussed in my presentation, allowing dirty reads is a choice. You need to be aware of the costs and the tradeoffs, but you might decide in some situations that the value you get is worth the (potentially high) price. I only hope that people don't just choose to use it arbitrarily, just because it's "faster", but that they are aware of the ramifications of performing dirty reads.
Question: Where can I get more information about the tradeoffs between the different isolation levels and concurrency models?
I'm glad you asked. This presentation was only a PREVIEW of what I'll be talking about for an entire day in a pre-conference session for PASS. You can get more details here:
Question: Is there a way to get access to these slides at the end of the presentation please?
I am attaching them to this blog post, along with a PDF of the presentation, and I will also be adding them to my web site at www.SQLServerInternals.com/conferences by the end of the week.
Question: Can Kalen unprotect her tweets?
I could, but it wouldn't be very interesting since I don't tweet and there would be nothing to read. But lately I have started thinking more about it. Maybe I'll start… and if I do start tweeting on a regular basic, my tweets will be open and available.
This Friday is the last of my summer seminars. On August 27, I am presenting on plan caching, plan reuse and recompile and plan cache metadata. This is one of my favorite topics to talk about, because a lot of the information is not very widely known, and you can have a tremendous performance impact by making sure when plans are reused when that is the best option and when a new plan is compiled when that is best choice. By examining the plan cache metadata, you can see exactly what plans are currently in cache and what those plans are doing, i.e. index seeks or scan? hash joins or loop joins? extra sorting? parallelization?
Here is the link for more info and registration:
The seminar has been pre-recorded, but I will be available concurrently for online chat and Q&A.
See you there!
When SQL Server first introduced table and index partitioning in SQL 2005, it was the same version that all the metadata changed. So I decided that I could learn a lot about metadata and a lot about the organization and storage of partitions if I tried to write a query, and then a view, that included almost everything anyone would ever need to know about any partitioned objects you had created. While doing that I realized that one function that SQL Server was missing, in spite of all the incredible new metadata and built-in functions, was a function to return the name of an index, given the object_id and the index_id. Since so much of the metadata contains these ID columns, it seems like a useful thing. But I had to write one for myself:
CREATE FUNCTION dbo.index_name (@object_id int, @index_id tinyint)
DECLARE @index_name sysname
SELECT @index_name = name FROM sys.indexes
WHERE object_id = @object_id and index_id = @index_id
This function needs to be created in the database where you want to use it.
I wrote the view to return all the partitioning info to be used in my classes. It took quite a while, and was not perfect. The two pieces I kept meaning to fix were:
- It would return no rows for a table that was not partitioned at all
- It would only list one boundary point for each partition. This meant you always had to return the partitions in order in order to see the lower and upper boundary for each. However, one of my students in a class I taught in Salt Lake City in September 2009 took on the task of 'fixing' those problems, so the version I show you here is the updated one.
I decided to make this script public today in response to a query on the public forums. I'm not including a script that will create a partitioned table for you. If you don't have a partitioned table of your own, you probably won't even be interested in this script!
-- Create a view to return details about a partitioned table or index
-- First run the script to create the function index_name()
-- Written by Kalen Delaney, 2008
-- with a few nice enhancements by Chad Crawford, 2009
CREATE VIEW Partition_Info AS
SELECT OBJECT_NAME(i.object_id) as Object_Name, dbo.INDEX_NAME(i.object_id,i.index_id) AS Index_Name,
p.partition_number, fg.name AS Filegroup_Name, rows,
WHEN 1 THEN 'less than'
ELSE 'less than or equal to'
END as 'comparison'
CASE WHEN ISNULL(rv.value, rv2.value) IS NULL THEN 'N/A'
WHEN boundary_value_on_right = 0 AND rv2.value IS NULL
THEN 'Greater than or equal to'
WHEN boundary_value_on_right = 0
THEN 'Greater than'
ELSE 'Greater than or equal to' END + ' ' +
ISNULL(CONVERT(varchar(15), rv2.value), 'Min Value')
+ ' ' +
WHEN 1 THEN 'and less than'
ELSE 'and less than or equal to'
END + ' ' +
+ ISNULL(CONVERT(varchar(15), rv.value),
END as 'TextComparison'
FROM sys.partitions p
JOIN sys.indexes i
ON p.object_id = i.object_id and p.index_id = i.index_id
LEFT JOIN sys.partition_schemes ps
ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.partition_functions f
ON f.function_id = ps.function_id
LEFT JOIN sys.partition_range_values rv
ON f.function_id = rv.function_id
AND p.partition_number = rv.boundary_id
LEFT JOIN sys.partition_range_values rv2
ON f.function_id = rv2.function_id
AND p.partition_number - 1= rv2.boundary_id
LEFT JOIN sys.destination_data_spaces dds
ON dds.partition_scheme_id = ps.data_space_id
AND dds.destination_id = p.partition_number
LEFT JOIN sys.filegroups fg
ON dds.data_space_id = fg.data_space_id
JOIN sys.allocation_units au
ON au.container_id = p.partition_id
WHERE i.index_id <2 AND au.type =1
-- Example of use:
SELECT * FROM Partition_Info
WHERE Object_Name = 'charge'
ORDER BY Object_Name, partition_number
Yesterday was a rebroadcast of the index internals seminar I presented last February, and I had forgotten how long it was. I was in a chat room the whole time, and the questions just kept coming, fast and (not so) furious! Thanks to all the participants for making it a really exciting 5 hours for me. I really appreciated all your enthusiasm.
My next seminar with SSWUG will be on Plan Caching and Recompiling. I always love teaching this topic, because there are so many ways you can make a major difference in performance by understanding when reusing a cached plan is a good thing and when it is NOT! This includes Parameter Sniffing and how to avoid it.
We'll look at when automatic statistics updating kicks in and when that causes recompiles, as well as looking at what other changes can cause automatic recompilation.
We'll talk about XML query plans, and -I'll show you how to access the plan cache metadata to see how to look at the plan for every query currently in cache. It's an awesome capability!
I hope to see you there!
Once again, I'll be presenting an online seminar through SSWUG. This on is on Index Internals. The presentation lasts over 3 and a half hours, and I'll be online for live chat for over 4 hours. Ask away! We'll look at juicy details of exactly how clustered and nonclustered indexes differ from each other'. I'll show you how to display the actual contents of your index rows so you can really see what the difference is between a key column and an included column.
The seminar starts Friday, 9 AM Pacific Time.
Here is the link to the registration site:
I hope to see you there!
It's time for the eighth T-SQL Tuesday, managed this time by Robert Davis at SQL Server Central. This time, the topic was announced very late, and I was just starting to write a blog post of my own when I saw it. Fortunately, my topic easily fits into this month's topic of "Gettin' Schooled", which is supposed to have something to do with either teaching or learning. I feel very lucky, because in my work, I get to do both, at the same time! Many, if not most, of the really cool things I like to tell people about, I learned because questions asked of me by my students.
In my online seminar two weeks ago, I was talking about the difference between estimated and actual execution plans. The simple explanation of the difference is that with an estimate plan (like you get with SET SHOWPLAN_ALL ON), your batch is not executed, but with an actual plan (like you get with SET STATISTICS PROFILE ON), your batch IS executed and results are returned. I talked about when the actual plan might be different than the estimated plan. Although I'm not going to go into all the differences now, I will tell you that for batches of a single query, your estimated and actual plans should be the same. Differences can come up when you have a multi statement batch, or a batch that calls a procedure.
One example that I always mention in classes and seminars is the case where the batch creates a temp table, and then uses that temp table. When trying to look at the estimated plan, you will get an error, because the batch is not executed. If it is not executed, no temp table is created, so the statement that selects from the temp table cannot even be compiled to give a plan:
-- I am using a table called Test1 to get the data for my temp table;
-- You can use any table of your own
SET SHOWPLAN_ALL ON;
SELECT ID INTO #Tmp1
The above will give the following error:
Msg 208, Level 16, State 0, Line 3
Invalid object name '#Tmp1'.
However, looking an actual plan will not:
SET SHOWPLAN_ALL OFF;
SET STATISTICS PROFILE ON;
SELECT ID INTO #Tmp1
When discussing this example during the online seminar, one of the participants asked why the behavior was different with a table variable. And I realized, I had never tried my example with a table variable. After the seminar was over, I ran some tests, and sure enough, there was no error looking at an estimate plan when a table variable was created.
-- I am still using a table called Test1 to get the data
-- However, in this case, you'll need a table with a single INT column
SET STATISTICS PROFILE OFF;
SET SHOWPLAN_ALL ON;
DECLARE @Tmp1 TABLE (ID int);
INSERT INTO @Tmp1(ID)
SELECT * FROM @Tmp1;
-- No error is returned!
I started searching online to see if anyone had an explanation, and found this blog post written by Lubor Kollar and Lindsey Allen of the SQLCAT Team at Microsoft. (The blog post also contains a script for building the Test1 table, if you want to copy it and execute it.)
Lubor and Lindsey are discussing different execution plans when using temp tables and table variables, but when I looked closer, there weren't ever getting errors, even with a temp table. It turns out their temp table was created with CREATE TABLE statement, rather than SELECT INTO, and that completely changes how the batch is processed:
SET SHOWPLAN_ALL ON;
CREATE TABLE #Tmp1 (ID int);
INSERT INTO #Tmp1(ID)
SELECT *FROM #Tmp1;
It appears the ONLY time you get an error with a temporary table and an estimated execution plan is when you create the table with SELECT INTO. In fact, with SELECT INTO, you'll get the error even if you create a permanent table in the batch.
And I learned this from questions that were asked of me … which is where I learn all kinds of really interesting information!