
Kalen Delaney

  • Geek City: Where are LOBs stored?

    When researching a question from one of the students in my class last week, I was reading the documentation for CREATE TABLE about storing LOB columns at http://msdn.microsoft.com/en-us/library/ms174979.aspx. For this discussion, LOB columns include text, ntext, image, xml and the MAX types when their values are over 8000 bytes and stored outside the regular data row. I knew that SQL Server gives us the capability of storing LOB columns in a separate filegroup with the TEXTIMAGE_ON clause, but I was surprised at one thing the docs mentioned.

    The documentation says:

    TEXTIMAGE_ON { filegroup | "default" }

    Are keywords that indicate that the text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined type columns are stored on the specified filegroup.

    TEXTIMAGE_ON is not allowed if there are no large value columns in the table. TEXTIMAGE_ON cannot be specified if <partition_scheme> is specified. If "default" is specified, or if TEXTIMAGE_ON is not specified at all, the large value columns are stored in the default filegroup. The storage of any large value column data specified in CREATE TABLE cannot be subsequently altered.

    The way I read the highlighted sentence (the one about "default") is that if you don’t put the LOB data on a particular filegroup, it will always go on the default filegroup. The default filegroup has a specific meaning for SQL Server: it is the filegroup where objects are placed if you don’t use the ON clause to put them on a specific filegroup. By default, the default filegroup is the PRIMARY filegroup, but you can change it with the ALTER DATABASE / MODIFY FILEGROUP command.
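
    For reference, here is a minimal sketch of changing the default filegroup; it uses the TestFGs database and FG1 filegroup created in the script below, so it assumes they already exist:

    -- Make FG1 the default filegroup, so objects created without an ON clause go there.
    ALTER DATABASE TestFGs
    MODIFY FILEGROUP FG1 DEFAULT;
    GO
    -- Verify which filegroup is currently the default.
    SELECT name, is_default FROM TestFGs.sys.filegroups;
    GO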

    I did not think that what this sentence was saying was true, so I decided to test it.

    I first created a new database with three filegroups. The PRIMARY filegroup is the default filegroup, because I am not changing the default to be anything else. The two additional filegroups are FG1 and FG2.

    USE master;
    GO
    IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TestFGs')
        DROP DATABASE TestFGs;
    GO
    CREATE DATABASE TestFGs
    ON PRIMARY
        (NAME = Data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFGPrimary.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%),
    FILEGROUP FG1
        (NAME = FG1_data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG1_data.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%),
    FILEGROUP FG2
        (NAME = FG2_data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG2_data.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%)
    LOG ON
        (NAME = TestFG_log,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFG_log.ldf',
         SIZE = 20 MB,
         MAXSIZE = 2 GB,
         FILEGROWTH = 25%);
    GO

    I then use the new database and create three tables, each with the same column definitions. One of the columns is a text column, which will be stored separately from the regular data rows unless I specify otherwise.

    The first table does not specify any filegroups, so everything should go on the default filegroup. The second table specifies a filegroup for the table (FG1) but not for the LOB data. The third table specifies a filegroup for the table (FG1) and a separate filegroup for the LOB data (FG2).

    To determine where the data has been stored, I examine the sys.allocation_units view. I need to join it to sys.partitions in order to get the corresponding object_id. The function object_name will give me the name of the table, and the function filegroup_name will translate the column data_space_id into a filegroup name.

    USE TestFGs;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows')
        DROP TABLE bigrows;
    GO
    CREATE TABLE bigrows
    (a char(1000),
     b varchar(8000),
     c text);
    GO
    INSERT INTO bigrows
        SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
               REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) AS object_name,
           FILEGROUP_NAME(data_space_id) AS FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) = 'bigrows';
    GO

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows2')
        DROP TABLE bigrows2;
    GO
    CREATE TABLE bigrows2
    (a char(1000),
     b varchar(8000),
     c text)
    ON FG1;
    GO
    INSERT INTO bigrows2
        SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
               REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) AS object_name,
           FILEGROUP_NAME(data_space_id) AS FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';
    GO

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows3')
        DROP TABLE bigrows3;
    GO
    CREATE TABLE bigrows3
    (a char(1000),
     b varchar(8000),
     c text)
    ON FG1 TEXTIMAGE_ON FG2;
    GO
    INSERT INTO bigrows3
        SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
               REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) AS object_name,
           FILEGROUP_NAME(data_space_id) AS FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';
    GO

    And here are my results. The first table, bigrows, has both its regular rows and its LOB data on the default filegroup, PRIMARY. You might also notice that an allocation unit for row-overflow data was created, even though we didn’t have any data that needed that space.

    The second table, bigrows2, has all its data on FG1, even though I only specified a filegroup for the table itself. The LOB data is stored with the rest of the table if I don’t specifically place it on a different filegroup.

    Only for the third table, bigrows3, is the LOB data stored separately, and only because I specifically used the TEXTIMAGE_ON clause when creating the table.

    [Screenshot: allocation units and filegroups for bigrows, bigrows2, and bigrows3]

    So now you know.

    ~Kalen

  • Geek City: What gets logged for SELECT INTO operations?

    Last week, I wrote about logging for index rebuild operations.  I wanted to publish the result of that testing as soon as I could, because that dealt with a specific question I was trying to answer. However, I actually started out my testing by looking at the logging that was done for a different operation, and ended up generating some new questions for myself.

    Before I started testing the index rebuilds, I thought I would just get warmed up by observing the logging for SELECT INTO. I thought I knew what got logged, but I was wrong. My understanding was that as a minimally logged operation, in FULL recovery, SELECT INTO would log each 8K page as it was written, not each row like a normal INSERT would. In BULK_LOGGED, SELECT INTO would create a log record for each page, but it would not log the entire 8K page. And of course, it would also not log every row.  However, my first test turned up some surprising results.

    Here's what I did: First, I made sure the AdventureWorks database was in FULL recovery, and dropped the Details table if it existed. Then I created  a table called SELECTINTOLoggingData to hold the results of calling fn_dblog.  As in last week's script, the following information was captured:

    • Max log record size
    • Number of log records
    • Number of log records larger than 8K

    Finally, I backed up the AdventureWorks database to give me a foundation for log backups. So here's the first part of the script:

    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SELECTINTOLoggingData')
        DROP TABLE SELECTINTOLoggingData;
    GO

    CREATE TABLE SELECTINTOLoggingData
    (DBRecModel CHAR(12), MaxLogRowSize INT, NumLogRows INT, NumLargeRows INT);
    GO

    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

    The actual testing was done first in FULL recovery, then in BULK_LOGGED. I backed up the transaction log, then created the Details table by executing a SELECT INTO from the Sales.SalesOrderDetail table. The Details table has 1495 pages and 121,317 rows. Finally, I inserted the relevant values from fn_dblog into the SELECTINTOLoggingData table.

    ----  FULL Recovery


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'FULL',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    DROP TABLE Details;
    GO


    -- BULK_LOGGED Recovery


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'BULK_LOGGED',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    Finally, I looked at the data I collected:

    SELECT * FROM SELECTINTOLoggingData;
    GO

    Here are my results:

    [Results: log record counts and maximum record sizes for FULL and BULK_LOGGED recovery]

    Although the number of large (8K) log rows was as expected, I was quite surprised to see the total number of log records! (And yes, I know the column names use the term "Log Rows", but I'm calling them "Log Records".  I guess I just wanted a shorter column name.)

    I then ran the same test on a different version of the Sales.SalesOrderDetail table that I use in some of my class examples. Here is the script to create the Sales.SalesOrderDetail2 table:

    CREATE TABLE Sales.SalesOrderDetail2 (
        [SalesOrderID] [int] NOT NULL,
        [SalesOrderDetailID] [int] NOT NULL,
        [CarrierTrackingNumber] [nvarchar](25) NULL,
        [OrderQty] [smallint] NOT NULL,
        [ProductID] [int] NOT NULL,
        [SpecialOfferID] [int] NOT NULL,
        [UnitPrice] [money] NOT NULL,
        [UnitPriceDiscount] [money] NOT NULL ,
    CONSTRAINT [PK_SalesOrderDetail2_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
    (
        [SalesOrderID] ASC,
        [SalesOrderDetailID] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY];
    GO

    INSERT INTO Sales.SalesOrderDetail2 SELECT [SalesOrderID]
          ,[SalesOrderDetailID]
          ,[CarrierTrackingNumber]
          ,[OrderQty]
          ,[ProductID]
          ,[SpecialOfferID]
          ,[UnitPrice]
          ,[UnitPriceDiscount]
      FROM AdventureWorks.Sales.SalesOrderDetail;
    GO

    After setting the AdventureWorks database back to FULL recovery, deleting the Details table and truncating the SELECTINTOLoggingData table, I ran the same test as above (starting at the "FULL Recovery" comment), substituting Sales.SalesOrderDetail2 for Sales.SalesOrderDetail.

    ALTER DATABASE AdventureWorks SET RECOVERY FULL;
    GO
    DROP TABLE Details;
    GO
    TRUNCATE TABLE SELECTINTOLoggingData;
    GO

    Here are my new results:

    [Results: log record counts without an IDENTITY column in the source table]

    So these are much more like the results I expected from a SELECT INTO operation!

    Why did the first example, selecting from Sales.SalesOrderDetail, do so much more logging?

    It turns out that if you have an IDENTITY column in the source table, SELECT INTO must log every row as the IDENTITY value is generated.

    Does this mean you should rethink the benefit of using IDENTITY columns? Only if you're doing a lot of SELECT INTO operations to copy tables with IDENTITY columns. Normal INSERT statements, which are how IDENTITY values are usually generated, are going to have the same amount of logging whether or not the table has an IDENTITY column. (Let me know if you think otherwise.)
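
    If you want to check ahead of time whether a source table has an IDENTITY column, or copy it without inheriting that property, here is a rough sketch (my own addition, not part of the test above; wrapping the identity column in an expression such as ISNULL keeps the data but prevents the target table from inheriting the IDENTITY property; whether that also restores the minimal logging is something you'd want to confirm with the fn_dblog test):

    -- Which columns in the source table are IDENTITY columns?
    SELECT name AS identity_column
    FROM sys.identity_columns
    WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail');
    GO
    -- Copy the table without inheriting the IDENTITY property on SalesOrderDetailID.
    SELECT ISNULL(SalesOrderDetailID, 0) AS SalesOrderDetailID,
           SalesOrderID, CarrierTrackingNumber, OrderQty,
           ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount
    INTO Details
    FROM Sales.SalesOrderDetail;
    GO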

    So now you know.

    ~Kalen

    p.s. I would like to thank Sunil Agarwal for his brilliant insights and encouragement. Thanks for everything, Sunil!

  • Geek City: What gets logged for index rebuild operations?

    This blog post was inspired by a question from a future student. Someone who was already booked for my SQL Server Internals class in June asked for some information on a current problem he was having with transaction log writes causing excessive wait times during index rebuild operations when run in ONLINE mode. He wanted to know if switching to BULK_LOGGED recovery could help.

    I knew the difference between ALTER INDEX in FULL vs BULK_LOGGED recovery when doing normal OFFLINE rebuilds, but I wasn't sure about ONLINE. So I dug out some old scripts, massaged them a bit, and ran some tests.

    A minimally logged operation is one that does not always log every single row. In FULL recovery, SQL Server logs the entire page as each page is filled, so we end up with individual log records of over 8K. In BULK_LOGGED recovery, SQL Server just logs the fact that a new page was generated (plus some information about the page) and doesn't log the full 8K, so the log space used is quite a bit less.
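
    If you want to see those large log records for yourself, one quick way (a sketch, not part of the original scripts) is to look at the biggest individual records fn_dblog returns right after one of these operations:

    -- Run immediately after the operation of interest, in the same database.
    SELECT TOP (10) Operation, Context, [Log Record Length]
    FROM fn_dblog(null, null)
    ORDER BY [Log Record Length] DESC;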

    I will use Microsoft's AdventureWorks database, which you can download from CodePlex. I used the SQL Server 2005 version.

    After making sure the database is set to FULL recovery, I make a copy of the Sales.SalesOrderDetail table (which I call Details) and build a clustered index on it.  The table has about 1580 pages and 121,317 rows.


    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
     
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    CREATE CLUSTERED INDEX CL_details_indx ON Details(SalesOrderID);
    GO

    I then build a table to store log information that I will gather after the different index rebuild operations. After that I back up the database to make sure I have a base for making log backups.

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'AlterIndexLoggingData')
        DROP TABLE AlterIndexLoggingData;
    GO

    CREATE TABLE AlterIndexLoggingData
    (DBRecModel CHAR(12), IndexBuildMode CHAR(3), MaxLogRowSize INT, SumLogSize BIGINT, NumLogRows INT, NumLargeRows INT);
    GO
    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

    Now starts the testing. There are four sections, each of which basically does the same thing, with the four permutations of FULL vs BULK_LOGGED recovery, and ONLINE vs OFFLINE index builds.  Each section backs up the transaction log to truncate it, and then reports the number of rows in the log, just to verify we really have (logically) cleared it out. It then performs a rebuild of the index on the Details table. Finally, it captures the following information from the log using fn_dblog:

    • Max log record size
    • Sum of log record sizes
    • Number of log records
    • Number of log records larger than 8K

    ----  FULL Recovery, ONLINE = OFF

    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
     
     
    --- FULL Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery; ONLINE = OFF


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    SELECT * FROM AlterIndexLoggingData;
    GO

    Here are my results:

    [Results: logging for FULL and BULK_LOGGED recovery with ONLINE and OFFLINE rebuilds]

    Notice the following:

    • Only FULL recovery with an OFFLINE rebuild logged the 8K pages.
    • FULL recovery with an ONLINE rebuild actually did log every single row; in fact, it looks like at least two log records were generated for each row. There is a LOT of logging for this set of options and, by far, the largest amount of log space used.
    • BULK_LOGGED mode used substantially less log space than FULL, but the difference between ONLINE and OFFLINE is even more pronounced. In FULL recovery, ONLINE used about twice the log space; in BULK_LOGGED, it required more than 8 times the space!

    I'm sure there are more fun facts you can pull out of the results from fn_dblog, but for my first look at the logging differences between ONLINE and OFFLINE index rebuilds, this is what I wanted to know. It's not a complete answer to the student's question, but it does indicate there is a lot more logging with ONLINE index builds, so it should not be surprising if there are a lot more log write waits. And I would assume that switching to BULK_LOGGED recovery could make a noticeable difference.
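
    For anyone chasing the original symptom, a rough sketch of one way to watch for log write waits is below; WRITELOG is the wait type SQL Server records while sessions wait for log records to be flushed to disk:

    -- Accumulated log-flush waits since the last restart (or since the stats were cleared).
    SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type = 'WRITELOG';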

    I hope this is useful!

    ~Kalen

  • Geek City: Preparing for the SQL Server Master Exam

    I was amazed at the results when I just did a search of SQLBlog, and realized no one had really blogged here about the changes to the Microsoft Certified Master (MCM) program. Greg Low described the MCM program when he decided to pursue the MCM at the end of 2008, but two years later, at the end of 2010, Microsoft completely changed the requirements.

     Microsoft published the new requirements here. The three week intensive course is no longer required, but that doesn't mean you can just buy an exam cram guide. Experience with SQL Server is absolutely required to pass the rigorous knowledge exam and hands-on lab exam. When you see all the recommended preparation, you might wish there was still a way to get all the preparation done in only three weeks. The Microsoft site lists many resources you can use to prepare, including books, web videos and instructor led training. My SQL Server 2008 Internals book is at the top of the book list, and my 5-day SQL Server Internals and Tuning course is listed as a suggested training resource.

    My next delivery is in about 6 weeks, to be taught in Boston, just before the Boston Marathon! The class is organized by SQLBlog's own Adam Machanic and his Boston SQL Training,  and you can get more information and register here.

    See you in class!

    ~Kalen

  • Did You Know: Where's the Best Place to Get Started?

    Have you ever wondered how to start learning about a specific SQL Server topic? Or how to advise someone who asks "How do I get started?"

    Red Gate and SQL Server Central have launched a new site to answer that question.

    SQL Server Stairways are a series of tutorials designed to take you from zero knowledge of a particular SQL Server topic, to a level of practical understanding that will allow you to start using that feature in a production environment.

    Not all the tutorials are finished, but new articles are being added regularly. The following Stairways are already on the site:

      • SQL Server Agent
      • Server-side Tracing
      • SQL Server Indexes
      • Transactional and Merge Replication
      • Database Design
      • MDX
      • Integration Services
      • Reporting Services
      • StreamInsight


    Have fun!

    ~Kalen

  • Did You Know: I'm going back to Boston!

    After spending 5 months in Boston last summer and fall, I'm going back again, to teach my 5-day SQL Server Internals and Query Tuning class for Boston SQL Training. Last time I taught my course there, we had a completely full class with a waiting list. So I'm going to offer the course again!

    The class will be delivered April 11-15, but early-bird registration ends next Tuesday, February 1st.

    I'm very excited to go back, and thankful to Adam Machanic of Boston SQL Training for this opportunity.

    You can read what Adam says about the course, including some student comments, in Adam's post.

    Thanks!

    ~Kalen

  • Geek City: Ordered Seeks and Scans

    I got a couple of really great questions during my SSWUG Workshop this morning, as I was discussing seeks and scans, and since the answers to the two questions are very related, I decided to address both of them in more detail in this blog post.

    Most people think about a seek operation as retrieving just one or a few rows, and a scan as accessing the entire table  or index leaf level.  And that usually is what I mean when I am just describing SQL Server behavior. But there is also the possibility that SQL Server could perform a 'partial scan', starting at some row in the table or index leaf, and then scanning only to a particular ending row. However, there is no operator in a query plan called 'partial scan'.  So how can you tell when you have one?

    Before I specifically answer that, let me tell you about the first question in the workshop this morning. I had been showing the Properties window (available from View/Properties Window), pointing out that when you click on one of the icons in a graphical query plan, the Properties window shows the same information that is in the yellow box that pops up, plus additional information. One of the values shown is one called 'Ordered', and when 'Ordered' has the value True, there is also a value called 'Scan Direction', which might make you think that ordering is only relevant to scans. But you will see True values for 'Ordered' even when the operation is a seek. This is usually an indication that a partial scan is being performed.

    And because I was talking about heaps being unordered sets of rows, one of the delegates asked: "Are index seeks against a heap always unordered?"

    The answer is, of course, "it depends".  If you have a covered query, the index access could be an ordered seek of the covering index, even though the underlying table is a heap. Here is an example from the AdventureWorks database. I make a copy of the Sales.SalesOrderHeader table, so it is a heap, and then build a nonclustered index on the Freight column.

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SalesOrderHeader' AND schema_id = 1)
       DROP TABLE dbo.SalesOrderHeader;
    GO
    SELECT * INTO dbo.SalesOrderHeader
    FROM Sales.SalesOrderHeader;
    GO
    CREATE INDEX ix_Freight ON dbo.SalesOrderHeader(Freight);
    GO

    Then I look at the plan for a covered query using the ix_Freight index:

    SELECT AVG(Freight)
    FROM dbo.SalesOrderHeader
    WHERE Freight < 100;

    And here is the plan showing the index seek:

    [Query plan: showing an Index Seek on ix_Freight]

    And the Properties sheet for the Index Seek shows Ordered = True and Scan Direction = FORWARD.

    [Properties window: Ordered = True, Scan Direction = FORWARD]

    This is an example of an ordered seek on a nonclustered index, or what we can think of as a partial scan, or range scan.

    The other question during the workshop was how to tell if the seek was really a range scan. Part of the answer is to look for the Ordered = True value, as I just illustrated. Another thing to notice in the graphical query plan is the thickness of the line moving rows to the left from the Index Seek, indicating that LOTS of rows are being returned from the Seek. That's another clue that we have a partial scan.

    We can also get partial scans with clustered indexes. I'll build a clustered index on my new table.

    CREATE CLUSTERED INDEX clus_OrderID ON dbo.SalesOrderHeader(SalesOrderID);
    GO

    There are 31465 rows in the dbo.SalesOrderHeader table, with SalesOrderID values ranging from 43659 to 75123. I'll SELECT a small range of data:

    SELECT * FROM dbo.SalesOrderHeader
    WHERE SalesOrderID < 44000

    The plan shows a Clustered Index Seek, and the Properties shows Ordered = True. 

    If I make the range bigger, and look for SalesOrderID < 60000, I still get an Index Seek, and the same when using an upper limit of 75000. If I run a query that returns ALL the rows, I still get an Index Seek.

    [Query plans: both show a Clustered Index Seek, with arrows of different thickness]

    Both queries are using an Index Seek, but notice the difference in the thickness of the arrows.

    It turns out that the only time the optimizer will tell us it is doing a scan is when it knows absolutely that it will be looking at every row in the table or index leaf. And the only way it can know this is if there is no filter condition (no WHERE clause) or no index on the filter condition. With the index on SalesOrderID, there is no way the optimizer can be absolutely sure that it will have to scan the whole table. Someone might have inserted a value of 100001 since the last time statistics were updated, and we wouldn't want to perform a scan and return a value that is not in the specified range. So the Index Seek in the plan really just means that SQL Server will access the data using the index, not necessarily with a complete scan. There is something in the query that gives a starting point, a stopping point, or both.
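
    To make that concrete, here are two queries to compare side by side (the upper bound in the first one is just a value above the current maximum SalesOrderID, so the seek actually touches every row):

    -- Predicate on the clustered index key: the plan shows a Clustered Index Seek,
    -- even though the range happens to include every existing row.
    SELECT * FROM dbo.SalesOrderHeader
    WHERE SalesOrderID < 80000;

    -- No WHERE clause at all: the optimizer knows it must read every row,
    -- so the plan shows a Clustered Index Scan.
    SELECT * FROM dbo.SalesOrderHeader;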

    I've got another Workshop for SSWUG on Index Internals, coming up on February 25th. Hopefully, I'll get a lot more great questions during that one!

    Thanks!

    ~Kalen

  • Did You Know? My PASS Demos and Social Networking

    My PASS demos are now available on my website for my Precon Session on SQL Server Concurrency: Locking and Blocking and Row Versions, Oh My!  and for my Spotlight Session on Seeking SQL Server Secrets. You can download both sets of scripts from

    www.SQLServerInternals.com/conferences

    Also, I set up a Facebook fan page a few weeks ago for SQL Server technical information, news, tips and tricks... the Facebook page name is SQL Server Internals, and the direct URL is http://www.facebook.com/SQLInternals. Note that on my personal Facebook page I do not 'friend' people that I don't know, but I don't post any technical info on my personal page. If you want technical info from me on Facebook, just 'Like' the SQL Server Internals page.

    Finally, I started tweeting, right after the PASS conference ended. You can look for @SQLQueen.

    Enjoy!

    ~Kalen

  • Did You Know: SQL Geeks for Good

    Are you going to PASS next week? We're not going to have anything to compete with the big release and book-signing for the SQL Server MVP Deep Dives book like we had last year, but there is something special you can do this year.

    A former student is collecting personal care items to take on a mission trip to Nicaragua. Here is his statement of purpose for the project:

    Corner of Love leads mission teams to villages in impoverished northern Nicaragua, providing life changing medicines, dental care, and basic personal hygiene items.  Coupled with service projects like establishing safe/clean water sources, this will promote an improvement in the general health level of these communities.

    If you can donate any items, including leftover hotel amenities, please drop them by the Red Gate booth (#408) during the expo. The full list of needed items can be found here:

    http://www.corneroflove.org/asp/col_collect_items.asp

    Although this group is Christian, the services and donations are given to anyone who needs them.

    Alternatively, you can bring items, even extra toiletries from your hotel room, to my Spotlight session, which is the last session on the last day of the conference in Room 2AB. (That's Thursday, from 4:00 - 5:30)

    And to my blogging and tweeting friends and colleagues, if you can help spread the word, I'd appreciate it very much!

    THANK YOU!!!!!!!!!!!!

    ~Kalen

  • Geek City: Inside SQL Azure

    My new whitepaper has just been published!

    Inside SQL Azure
    http://social.technet.microsoft.com/wiki/contents/articles/inside-sql-azure.aspx

    The paper examines the internals of SQL Azure databases and how they are managed in the Microsoft data centers to provide high availability and immediate scalability in a familiar SQL Server development environment.

    SQL Azure is changing so fast, I thought everything in here was going to be out of date before it saw the light of day. :-)

    Enjoy!

    ~Kalen

  • TSQL Tuesday #11:Rolling Back TRUNCATE TABLE


    “It ain't what people don't know that hurts them; it's what they know that ain't so.”
    -- commonly attributed to Mark Twain

    SQL Server has been around for a long time now, and people who learn one version well sometimes have a monumental task when a new version comes out. They need to figure out what changed. The "What's New" documentation usually gives a good idea of the major new features, but sometimes when a more minor feature just changes a bit, it may get overlooked in all the hoopla of learning about the BIG new stuff.

    One change that occurred quite a long time ago, that many people still haven't hooked into, is the fact that a TRUNCATE TABLE operation can be rolled back! Even in my advanced classes, with some of the best DBAs out there, when I start talking about TRUNCATE TABLE, there are frequently one or two people who think it is unlogged and thus can't be rolled back.

    Let me state now… THIS IS NOT TRUE.

    And in fact, it is very easy to prove that it is not true… but if you don't suspect that it is not true, why would you even test it?

    TRUNCATE TABLE is logged, but SQL Server doesn't log every single row as it TRUNCATEs the table, so you might think that this command falls into the category of commands we call 'minimally logged'. But although indeed minimal information is logged for this operation, it is not technically in that category. The official 'minimally logged' operations also cause a bit to be set in a big bit map, so that when you are running in BULK_LOGGED recovery model and make a log backup, SQL Server can capture all the data involved in 'minimally logged' operations into the transaction log. This allows the data to be recovered when restoring from backup.

    But for TRUNCATE TABLE, SQL Server only logs the fact that the operation happened, along with information about the pages and extents that were deallocated. So that's minimal information. And by recording which pages and extents were deallocated, there's enough information to roll back, by just REallocating those pages. A log  backup only needs the information that the TRUNCATE TABLE occurred. And to restore the TRUNCATE TABLE, the operation is just reapplied. The data involved is not needed during RESTORE (like it would be for a true 'minimally logged' operation like a BULK INSERT).

    So what if the transaction in which the TRUNCATE TABLE occurs stays open for a long time? How does SQL Server know not to reuse the pages that belonged to the table? It turns out the pages and/or extents involved are locked with an eXclusive lock, and just like all X locks, they are held until the end of the transaction. And as long as the pages or extents are locked, they can't be deallocated, and certainly cannot be reused.

    So here's some code to see for yourself. I'll make a copy of a table from the AdventureWorks database:

    USE AdventureWorks
    GO

    IF EXISTS (SELECT * FROM sys.tables
                        WHERE name = 'Products'
                        AND schema_id = 1)
        DROP TABLE dbo.Products;
    GO

    SELECT * INTO dbo.Products
    FROM Production.Product;
    GO

    Now we'll look at the count of rows, and the pages that belong to the table.

    SELECT COUNT(*) FROM dbo.Products;
    GO
    DBCC IND('AdventureWorks', 'Products', -1);
    GO

    We get a count of 504 rows and the following pages:

    File    Page
    -----   -------
    1       23027
    1       23029
    1       23030
    1       26992
    1       26993
    1       33352
    1       33353
    1       33354
    1       33355
    1       33356
    1       33357
    1       42486
    1       42487
    1       42488
    1       42489

    Next, start a transaction and truncate the table. Verify that DBCC IND shows no pages belonging to the table and the count is 0. Then, look at the locks.

    BEGIN TRAN
    TRUNCATE TABLE dbo.Products;
    SELECT COUNT(*) FROM dbo.Products;

    DBCC IND('AdventureWorks', 'Products', -1);
    DBCC EXTENTINFO('AdventureWorks', 'Products', -1);

    SELECT resource_type, resource_description,
            request_mode FROM sys.dm_tran_locks
    WHERE  resource_type IN ('EXTENT', 'PAGE')
    AND   resource_database_id = DB_ID('AdventureWorks');

    You should see no rows from DBCC IND, and 0 rows from count(*). But the locks info should return something like:

    resource_type resource_description request_mode
    ------------- -------------------- ------------
    EXTENT        1:33352              X
    PAGE          1:42486              X
    EXTENT        1:42488              X
    PAGE          1:42487              X
    PAGE          1:42488              X
    PAGE          1:42489              X
    PAGE          1:23027              X
    PAGE          1:23030              X
    PAGE          1:23029              X
    PAGE          1:26992              X
    PAGE          1:26993              X

    So the extent and page locks include all the pages that we saw in the DBCC IND output. (Remember, an extent is 8 contiguous pages, and the extent lock is identified by the first page of the extent.)

    Only after you ROLLBACK the transaction will the locks be released, and you should see all the rows and pages back in the table again.

    ROLLBACK TRAN;
    GO
    SELECT COUNT(*) FROM dbo.Products;
    DBCC IND('AdventureWorks', 'Products', -1);
    GO

    So now you know!

    ~Kalen

  • Did You Know? It's really easy to convey the wrong message when writing

    And it's also really easy to procrastinate. I meant to blog about a big misunderstanding right after last month's SQL Server Magazine UPDATE commentary appeared, but now I'm just finishing up this month's commentary and only now realized I forgot to write this blog post.

    In this commentary
    http://www.sqlmag.com/article/sql-server/Disseminating-SQL-Server-Information.aspx I talked about how difficult it is to find the information you might need, and how difficult it is for educators, product support providers, or documentation writers to make sure EVERYONE has the necessary information.

    The title of the article is "Disseminating SQL Server Information", and if you go to the page that lists all my articles, you'll see the short description:

    How can we ensure that everyone can find the SQL Server information they need?

    But some people thought I was actually berating readers ("beating people up") for not knowing certain facts.

    I apologize profusely to anyone who assumed I was berating them. I hope that anyone who actually knows me, knows that this would be the last thing on my mind, but I can only assume that there are readers out there who have never met me.

    My point in the article was not only to talk about how hard it is to find correct answers, and know that they are correct answers, but that it's almost impossible to make sure everyone who needs the information can find it. Particularly since there are new SQL Server users every day!

    I certainly don't know EVERYTHING about SQL Server, and I have trouble tracking down answers sometimes. I know it's orders of magnitude harder for people who have a lot fewer sources of information than I have.

    Again, I apologize to anyone who has ever thought any of my writing or responses are condescending in any way. It has never been my intention to make anyone feel bad for what they don't know, particularly when they're doing something to try to learn more!

    Thanks so much to all my readers… where would I be without you all?

    ~Kalen

  • Geek City: 24Hours of PASS Followup

    Thank you everyone who attended my 24HOP session yesterday. It was a wonderful event to be part of.

    Here are a few of the questions I was asked that didn't get completely answered.

    Question: Can we set READ_COMMITTED_SNAPSHOT ON for just a table rather than at database level?

    I apologize that I answered this incorrectly. I was trying to remember all of the locking hints on the spur of the moment, and remembered that there is a hint, READCOMMITTEDLOCK, to force default locking/blocking behavior when the database has been set to READ_COMMITTED_SNAPSHOT, so I was thinking there should be one to go the other way. But now that the stress of the moment is off, I realize that there could not be such a hint. Enabling snapshot isolation means that SQL Server has to save the old committed values of any changed data as the change is taking place. For a SELECT to request to read older committed values when the database has not already been set to save them would not be possible. The data you're looking for would simply not be available.
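
    For reference, here is a minimal sketch of the database-level setting and the hint that goes the other way (switching the option on generally requires that there be no other active connections in the database):

    -- Database level: optimistic read committed for the whole database.
    ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON;
    GO
    -- Query level: force default locking behavior for one table,
    -- even when READ_COMMITTED_SNAPSHOT is ON for the database.
    SELECT * FROM Sales.SalesOrderHeader WITH (READCOMMITTEDLOCK);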

    Question: Is it ever appropriate to use the No Lock hint? I'm thinking of using it in certain Select statements.

    What does 'appropriate' mean? Like all the phenomena I discussed in my presentation, allowing dirty reads is a choice. You need to be aware of the costs and the tradeoffs, but you might decide in some situations that the value you get is worth the (potentially high) price.  I only hope that people don't just choose to use it arbitrarily, just because it's "faster", but that they are aware of the ramifications of performing dirty reads.

    Question: Where can I get more information about the tradeoffs between the different isolation levels and concurrency models?

    I'm glad you asked. This presentation was only a PREVIEW of what I'll be talking about for an entire day in a pre-conference session for PASS. You can get more details here:

    http://sqlpass.eventpoint.com/topic/details/DBA301P

    Question: Is there a way to get access to these slides at the end of the presentation please?

    I am attaching them to this blog post, along with a PDF of the presentation, and I will also be adding them to my web site at www.SQLServerInternals.com/conferences by the end of the week.

    Question: Can Kalen unprotect her tweets?

    I could, but it wouldn't be very interesting since I don't tweet and there would be nothing to read. But lately I have started thinking more about it. Maybe I'll start… and if I do start tweeting on a regular basis, my tweets will be open and available.

    Thanks again!

    ~Kalen

  • Geek City: Plan Caching Internals

    This Friday is the last of my summer seminars. On August 27, I am presenting on plan caching, plan reuse and recompilation, and plan cache metadata. This is one of my favorite topics to talk about, because a lot of the information is not very widely known, and you can have a tremendous performance impact by making sure plans are reused when that is the best option, and a new plan is compiled when that is the better choice. By examining the plan cache metadata, you can see exactly what plans are currently in cache and what those plans are doing: index seeks or scans? hash joins or loop joins? extra sorting? parallelism?
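
    As a small taste of that metadata, here is a sketch (not taken from the seminar itself) that pulls cached plans with their use counts, statement text, and XML plans, which you can open to look for seeks, scans, join types, sorts and parallelism:

    -- Cached plans, their use counts, the associated statement text, and the XML plan.
    SELECT cp.usecounts, cp.cacheobjtype, cp.objtype,
           st.text, qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    ORDER BY cp.usecounts DESC;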

    Here is the link for more info and registration:

    http://www.vconferenceonline.com/shows/workshops/plancaching.asp

    The seminar has been pre-recorded, but I will be available concurrently for online chat and Q&A.

    See you there!

    ~Kalen

  • Did You Know: 24 Hours of PASS is happening again, September 15-16

    This time, it's being split into two parts, perhaps so that there is more of a chance that some people could attend the entire event! Noon - midnight (GMT) for two days in a row.

    Here are more details:

    http://www.sqlpass.org/24hours/fall2010/

    I'll be speaking for session #8 (19:00 GMT the first day) on Isolation Levels:

    http://www.sqlpass.org/24hours/fall2010/Home/IsolationvsConcurrencyWhatAretheChoices.aspx

    I hope to see you there!

    ~Kalen
