SQLblog.com - The SQL Server Blog Spot on the Web

Kalen Delaney

  • Did You Know? I’m delivering my first SQL Server 2012 class this week!

This is actually just a very short post to get my name back up in the list on the right side of SQLBlog, because Adam’s configuration removes people who haven’t posted in 3 months. I’ve been extremely busy trying to get my new book finished, and then get my course updated for its first delivery this week. I’ll be teaching it again in Norway at the end of the month.

It’s a great class this week, and a couple of questions have inspired longer posts, which I hopefully will be able to get to next week.

    Hopefully I’ll also have an update on the timeline for the new book by next week.

    My schedule is always available if you are interested in learning SQL Server 2012 Internals with me!

    http://schedule.KalenDelaney.com

    Thanks

    ~Kalen

  • Did You Know? Turning Off Locking

    Back when the lock manager was rewritten for SQL Server 7, and row level locking was added to the product, the procedure sp_indexoption gave us the ability to turn OFF either row or page locks or both, for one particular index, or for a table and all the indexes on it. I must admit, I rarely found this option to be the solution to any sticky problems, although while doing some quick research prior to writing this post, I did find that Microsoft recommended turning off locking options on a couple of tables  when working with SAP:  http://msdn.microsoft.com/en-us/library/aa226172(v=sql.70).aspx

    In my experience, it is MUCH more common to have people turn off row or page locks when they shouldn’t, and by inspecting the sys.indexes catalog view, you can tell if that has happened.  (Prior to SQL Server 2005,  you could use the INDEXPROPERTY function.) 
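For example, on an instance older than SQL Server 2005 you could check a specific index with something like this (just a sketch; the table and index names are placeholders you would replace with your own):

INDEXPROPERTY returns 1 if the lock type has been disallowed for the index, and 0 if it is still allowed:

SELECT INDEXPROPERTY(OBJECT_ID('YourTable'), 'YourIndex', 'IsRowLockDisallowed')  AS RowLocksDisallowed,
       INDEXPROPERTY(OBJECT_ID('YourTable'), 'YourIndex', 'IsPageLockDisallowed') AS PageLocksDisallowed;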

    I recently got email from a reader of my books who pointed out to me an error I had made in my SQL Server 2005 book, where I had claimed that sp_indexoption could not be used to disallow row or page locks on heaps. If you check the BOL, it clearly says that you can supply either an index name or a table name to sp_indexoption, so that was just an oversight in my writing. 

    But then along comes ALTER INDEX, and the BOL now says you should avoid using sp_indexoption, as it will be removed in a future release, and you should use ALTER INDEX instead. So the reader sent me a follow-up question, asking how can we disallow row or page locking on a heap using ALTER INDEX. 

    It turns out that we can specify the ALL option instead of an index name when using ALTER INDEX, and this has the same effect as using sp_indexoption on a heap.  You can see it in this short example.

    First create a heap with a nonclustered index:

    use tempdb;
    GO
    IF object_id('testlocks') IS NOT NULL
       DROP TABLE testlocks;
    GO

    CREATE TABLE testlocks
    ( a int);
    CREATE INDEX testlocks_index ON testlocks(a);
    GO

    Now look at the lock properties:

    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- ---------------
    1               1                HEAP
    1               1                NONCLUSTERED

    As shown, the default is that both row and page locks are allowed.

    Now use the old sp_indexoption to disallow row locks and then check the lock properties again:

    EXEC sp_indexoption testlocks, disallowrowlocks, 1;
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    0               1                HEAP
    0               1                NONCLUSTERED

    Notice that sp_indexoption turned off row locking for both the heap and the nonclustered index.

    Now use ALTER INDEX to allow row locks and inspect again:

    ALTER INDEX ALL ON testlocks
    SET  (ALLOW_ROW_LOCKS = ON );
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    1               1                HEAP
    1               1                NONCLUSTERED
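And to answer the reader’s original question directly: to disallow row or page locks on the heap (and all its nonclustered indexes) with ALTER INDEX, just set the options OFF using the ALL keyword. Here is a minimal sketch using the same testlocks table:

ALTER INDEX ALL ON testlocks
SET (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF);
GO
-- Verify: both columns should now show 0 for the heap and the nonclustered index
SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
WHERE object_id = object_id('testlocks');
GO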

    Keep in mind that I’m not recommending that you turn off either row or page locking. And if you decide to change the types of locks allowed, make sure you test your application thoroughly under your maximum expected load to make sure you haven’t made things worse. And remember that you can’t turn off TABLE locking, only row and page locking.

    Have fun!

    ~Kalen

  • Geek City: Build a Big Table with a Columnstore Index

    I was looking all over to find a big table to use for my columnstore examples in my SQL Server 2012 book, and the only one I found was from someone at Microsoft and was not publicly available. When I demonstrate code in my writings, I like to have the code available for anyone to use, so that was no use.

Finally I realized I was just going to have to do it myself. I actually based the script on some work by the awesome Benjamin Nevarez (blog | twitter), but I needed to make a couple of changes to the table structure, and then the loop for adding the data took some tweaking to make sure that uniqueness was possible, in case you need to test how the metadata changes when a columnstore index is built on a table with a unique clustered index, compared to a nonunique clustered index, compared to a heap.

I have just finished the chapter on indexes, and decided to make this script available. The initial table data is based on the data in the Microsoft AdventureWorksDW2012 sample database, which you can download here. (Note that the DW version is the first one under “Other Available Downloads”; it’s not the one under “Recommended Download”.)

Here is the section of the script that populates most of the FactInternetSalesBig table. (The original 60K rows were just copied from the original FactInternetSales table.) Note that all the statements have to be run in a single batch, because the local variable’s scope is the batch, and because the GO 9 applies to the single batch that precedes it. Of course, you can change the number of iterations to end up with a different size table; mine ends up at just over 30 million rows. In addition to modifying the SalesOrderNumber value on each iteration, I also changed the value of the RevisionNumber column to indicate which pass through the insert loop was being executed.



    -- Copy the new big table into itself 9 times
    DECLARE @RevisionNumber nchar(2);
    SELECT @RevisionNumber = RevisionNumber + 1 FROM RevisionNumberValue;
    SELECT @RevisionNumber as RevisionNumber;
    INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)
    SELECT ProductKey
    ,OrderDateKey
    ,DueDateKey
    ,ShipDateKey
    ,CustomerKey
    ,PromotionKey
    ,CurrencyKey
    ,SalesTerritoryKey
    ,SalesOrderNumber + @RevisionNumber
    ,SalesOrderLineNumber
    ,@RevisionNumber
    ,OrderQuantity
    ,UnitPrice
    ,ExtendedAmount
    ,UnitPriceDiscountPct
    ,DiscountAmount
    ,ProductStandardCost
    ,TotalProductCost
    ,SalesAmount
    ,TaxAmt
    ,Freight
    ,CarrierTrackingNumber
    ,CustomerPONumber
    ,OrderDate
    ,DueDate
    ,ShipDate
    FROM dbo.FactInternetSalesBig;
    UPDATE RevisionNumberValue SET RevisionNumber = RevisionNumber + 1;
    GO 9
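If you want to experiment with just this fragment rather than the full downloadable script, it assumes a FactInternetSalesBig table seeded from FactInternetSales and a one-row helper table called RevisionNumberValue. A minimal sketch of that assumed setup might look like the following; keep in mind that the real script makes a couple of structural changes (for example, SalesOrderNumber presumably needs to be wider than the original column to hold the characters appended on each pass):

USE AdventureWorksDW2012;
GO
-- One-row helper table that tracks which pass through the copy loop we're on
CREATE TABLE RevisionNumberValue (RevisionNumber int NOT NULL);
INSERT INTO RevisionNumberValue VALUES (1);
GO
-- Seed the big table with the original ~60K rows from FactInternetSales
SELECT * INTO dbo.FactInternetSalesBig
FROM dbo.FactInternetSales;
GO
-- Widen SalesOrderNumber so the revision characters appended on each pass will fit
ALTER TABLE dbo.FactInternetSalesBig ALTER COLUMN SalesOrderNumber nvarchar(50) NOT NULL;
GO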

    Make sure you have enough log space. My log grew from 1GB to just over 8GB when running the script to build the 30 million row table and columnstore index, with no clustered index.

The downloadable script has commented options to build either a clustered or nonclustered index before you build the columnstore index. I suggest building the clustered index before the columnstore index, because building the clustered index will rebuild any existing nonclustered indexes, including my columnstore index. Building the columnstore index can take a while; on my system it was about 10 minutes. (Building the original 30+ million row table took even longer.)
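If you’re building the indexes by hand instead of using the script, the order I’m suggesting looks roughly like this (a sketch; the index names and key columns here are placeholders, not necessarily what’s in the downloadable script, and remember that in SQL Server 2012 a table becomes read-only once it has a nonclustered columnstore index):

-- Build the clustered index first; building it after the columnstore index
-- would force a rebuild of the columnstore index
CREATE CLUSTERED INDEX clus_FactBig
ON dbo.FactInternetSalesBig (SalesOrderNumber, SalesOrderLineNumber);
GO
-- Then build the columnstore index (this is the step that can take a while)
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactBig
ON dbo.FactInternetSalesBig
   (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey,
    PromotionKey, CurrencyKey, SalesTerritoryKey, OrderQuantity,
    UnitPrice, ExtendedAmount, SalesAmount);
GO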

    After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!

    Have fun!

    ~Kalen

  • Follow-up Answers for my Australia Classes

    I was out of the country for the last two weeks of March, delivering classes in Brisbane and Sydney, which were organized by WardyIT.  It was a great visit and there were 24 terrific students!

    As is sometimes (perhaps often?)  the case, there were questions posed that I couldn’t answer during class, so here are a couple of follow-up answers.

1. I brought up the fact that SQL Server 2012 generates a warning message when there are ‘too many’ Virtual Log Files (VLFs) in a database. (It turns out the message is generated when a database has more than 10,000 VLFs, but the error says only that the database has more than 1000 VLFs! Which is, of course, technically true, but the message is a little misleading.) A student reported hearing that the Best Practices Analyzer also had a warning about the number of VLFs, but didn’t know what that limit was, and asked if I knew. I didn’t then, but I do now. A message will be generated by BPA when there are 1000 or more VLFs. The BPA message is focused on the performance problems that can arise when there are too many VLFs, while the message written at database startup is more concerned with the impact of too many VLFs on database restore, rollback, backup and recovery. (If you want to count the VLFs in your own databases, see the sketch after this list.)

2. When discussing locks, I briefly mention the BU lock that is acquired by request (with a hint) during bulk load operations, for the purpose of allowing multiple processes to load into the same table concurrently. Someone asked whether any SSIS components use the BU lock, and I thought it was probably true, but since I don’t work with SSIS, I couldn’t comment any deeper. I was able to discover, thanks to SQL Server MVPs Paul White and Arthur Zubarev, that it definitely is possible for both SQL Server Destinations and OLE DB Destinations. The destination table must be a heap, and you can specify TABLOCK as an option for the FastLoadOptions property, as shown below:

[Screenshot: the FastLoadOptions property of a destination component with TABLOCK specified]

    3. I very briefly mention Extended Events on the very last day, and mention that in SQL Server 2008 one of the drawbacks is that there is no GUI option to set up an XEvents session, but that has changed in SQL Server 2012. I was asked if the 2012 Management Studio could connect to a SQL Server 2008 instance and set up an XEvents session on that instance. My friend and fellow MVP Ben Miller was able to verify that when connecting from the SQL 2012 GUI to a 2008 instance, the option for creating an XEvents session does not even show up.  You could try scripting a session created for a 2012 instance and running it on a 2008 instance, but there are no guarantees.
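Going back to the VLF question in item 1, the easiest way I know to count the VLFs in a database is DBCC LOGINFO, which returns one row per VLF. (A quick sketch; DBCC LOGINFO is undocumented, so use it accordingly, and the database name here is just a placeholder.)

USE YourDatabase;   -- hypothetical database name
GO
-- DBCC LOGINFO returns one row per Virtual Log File, so the row count is the number of VLFs
DBCC LOGINFO;
GO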

    So now I’ll get back to working on my new book …

    ~Kalen

  • Geek City: Growing Rows with Snapshot Isolation

I just finished a wonderful week in Stockholm, teaching a class for Cornerstone Education. We had 19 SQL Server enthusiasts, all eager to find out everything they could about SQL Server Internals. One question came up on Thursday that I wasn’t sure of the answer to. I jokingly told the student who asked it to consider it a homework exercise, but then I was so interested in the answer that I tried to figure it out myself Thursday evening. In this post, I’ll tell you what I did to try to answer the question.

    I am writing this on an airplane, flying from Frankfurt to San Francisco.

When a database is enabled for either ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT, SQL Server adds 14 bytes of overhead to each row. These 14 bytes contain the XSN (transaction sequence number) at the time the row was modified, and a pointer to the previous version of the row (if there is one), which is stored in a part of the tempdb database called the version store. In class, we were discussing the fact that rows involved in row versioning have the 14 bytes added the first time the row is modified after one of the snapshot isolation levels is enabled. If your pages are very full, sometimes adding the 14 extra bytes will mean that one or more of the affected rows no longer fits on the original page, so the page may need to be split.

The student asked whether we could avoid the problem of adding the extra 14 bytes on the fly for each row if we rebuilt the table right after enabling one of the snapshot isolation levels. In other words, does rebuilding count as modifying the rows?

I tested it by creating a copy of one of the small tables in the pubs sample database that you can download from Microsoft, here. You, of course, could run a similar test of your own with any table. I first made sure that both snapshot isolation level settings were off. No one else can be in the database when you set READ_COMMITTED_SNAPSHOT on or off, and it’s best to make sure there are no current transactions when you set ALLOW_SNAPSHOT_ISOLATION on or off.

    ALTER DATABASE pubs SET read_committed_snapshot OFF;
    ALTER DATABASE pubs SET allow_snapshot_isolation OFF;
    GO

    Now make a copy of the authors table.

    USE pubs;
    GO
    IF exists (SELECT * FROM sys.tables WHERE name = 'SIauthors')
                DROP TABLE SIauthors;
    GO
    SELECT * INTO SIauthors from authors;
    GO

    Now I want to look at the new authors table to see what an actual row looks like on the data page. DBCC IND will show me which pages belong to the table.

    DBCC IND (pubs, SIauthors, -1);

    Sometimes I get two rows for data pages (PageType = 1) back, but only one of them has data.

[DBCC IND output showing the pages that belong to SIauthors]

DBCC PAGE, with trace flag 3604 turned on, will show me the rows on the data page. (The parameters to DBCC PAGE are the database name, the file number, the page number, and the print option.)

DBCC TRACEON(3604);
DBCC PAGE(pubs, 1, 245, 1);

Before enabling snapshot isolation, the rows end with the bytes for the name of the city. In the output below, for the first row on my page, the city is MENLO PARK.

[DBCC PAGE output showing a row ending with the bytes for MENLO PARK]

    So let’s look at how things change if we enable one of the snapshot isolation levels and rebuild the table.

-- Make sure no other connections are in the pubs database before running this ALTER. (The connection where you run the command can be in the database.) After changing the database, verify the setting:

    ALTER DATABASE pubs SET read_committed_snapshot ON;

    SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'pubs';

    Now rebuild the table and find out the new page number(s).

    ALTER TABLE SIauthors REBUILD;
    GO

    DBCC IND (pubs, SIauthors, -1);
    GO

    Because the table has been rebuilt, there will be new page numbers. Use DBCC PAGE to look at the new page, and you’ll see the 14 extra bytes at the end of the rows.

[DBCC PAGE output showing 14 extra bytes at the end of each row]

    So I concluded that rebuilding the table would add the extra 14 bytes to each row and that’s what I told the class on Friday. However, as I started preparing to write this post, I decided to do a few more tests, one of which was building a clustered index on the table and then rebuilding the clustered index instead of the table itself. (Note that rebuilding a table is a new feature in SQL Server 2008.) However, you should know that the clustered index includes the table, so we’re still rebuilding the table when we rebuild the clustered index. 

I found that an ALTER INDEX REBUILD, although it moved the table to new pages, did not add the 14 bytes. Performing an ALTER TABLE REBUILD when the table had a clustered index also did not add the extra bytes. It only seemed to happen when the table was a heap.
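If you want to try that variation yourself, the extra tests were along these lines (a sketch; the index name and key column are just reasonable choices, not necessarily exactly what I ran):

-- Build a clustered index on the copy of authors (au_id is the natural key)
CREATE CLUSTERED INDEX clus_SIauthors ON SIauthors(au_id);
GO
-- Rebuild just the clustered index, then check the new pages with DBCC IND / DBCC PAGE
ALTER INDEX clus_SIauthors ON SIauthors REBUILD;
GO
DBCC IND (pubs, SIauthors, -1);
GO
-- For comparison, rebuild the whole table (same result once it has a clustered index)
ALTER TABLE SIauthors REBUILD;
GO
DBCC IND (pubs, SIauthors, -1);
GO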

    Obviously, more research is required. But I’ll leave that as a homework exercise for you!

    Have fun!

    ~Kalen

  • Did You Know? Query Options Setting in SSMS

    Hi folks

    I’m back from an amazing week at PASS and an awesome reception for SQL Server MVP Deep Dives Volume 2.

I’m starting to plan and write my next book on SQL Server 2012 Internals, so blogging will be kept to a minimum, not that I’ve had a lot of time to blog lately.

However, while working on the final quiz for my Concurrency class through SSWUG, I noticed something peculiar about the Query Options | Advanced dialog in Management Studio. You have the option of choosing a default ISOLATION LEVEL and a DEADLOCK PRIORITY, and neither list has been kept up to date. The ISOLATION LEVEL choices do NOT include SNAPSHOT, which was introduced in SQL Server 2005, and DEADLOCK PRIORITY only shows Low and Normal, even though the High value and the numbers –10 through 10 were added back in SQL Server 2008!

[Screenshot: the Query Options | Advanced page in Management Studio]
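Of course, nothing stops you from setting the missing values yourself at the start of a query window. A quick sketch (note that SNAPSHOT isolation also requires ALLOW_SNAPSHOT_ISOLATION to be ON in the database):

-- Options the dialog doesn't offer, set per session instead
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;   -- requires ALLOW_SNAPSHOT_ISOLATION ON
SET DEADLOCK_PRIORITY HIGH;                 -- or any integer from -10 to 10, for example:
-- SET DEADLOCK_PRIORITY 5;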

So my first thought was that nobody had updated this dialog since then, but then I noticed that there is an option under “Results” in the left pane to give you multiserver results, and multiserver queries weren’t added until SQL Server 2008.

    So what do you think? Should Microsoft update this dialog to give you all the possibilities?

    ~Kalen

  • Did You Know? I’ll be teaching my Internals Class online!

    In my most recent commentary for SQL Server Magazine, online here:
    http://www.sqlmag.com/article/training-and-certification2/sql-server-training-options-140429, I talked about different options for making training content available online.

    As an addendum to that, I’d like to mention that one of my training partners (Benchmark Learning in Minneapolis) has decided to offer my complete 5-day deep dives Internals class online, in early November.  We’re doing it as a split week class also, so your brain can have a rest after the first two days.

    You can get the rest of the details by clicking on the link on my schedule page.

    http://schedule.kalendelaney.com

Although I probably won’t do what Paul did in his series of remotely delivered Australian user group presentations, I will at least be barefoot!

    ~Kalen

  • Geek City: Where are LOBs stored?

When researching a question from one of the students in my class last week, I was reading the documentation for CREATE TABLE about storing LOB columns at http://msdn.microsoft.com/en-us/library/ms174979.aspx. For this discussion, LOB columns include text, image, ntext, xml and the MAX columns when they are over 8000 bytes and stored outside the regular data row. I knew that SQL Server gives us the capability of storing LOB columns in a separate filegroup with the TEXTIMAGE_ON clause, but I was surprised by one thing the docs mentioned.

    The documentation says:

    TEXTIMAGE_ON { filegroup | "default" }

    Are keywords that indicate that the text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined type columns are stored on the specified filegroup.

    TEXTIMAGE_ON is not allowed if there are no large value columns in the table. TEXTIMAGE_ON cannot be specified if <partition_scheme> is specified. If "default" is specified, or if TEXTIMAGE_ON is not specified at all, the large value columns are stored in the default filegroup. The storage of any large value column data specified in CREATE TABLE cannot be subsequently altered.

The way I read the sentence "If "default" is specified, or if TEXTIMAGE_ON is not specified at all, the large value columns are stored in the default filegroup" is that if you don’t put the LOB data on a particular filegroup, it will always go on the default filegroup. The default filegroup has a specific meaning for SQL Server: it is the filegroup where all objects are placed if you don’t use the ON clause to put them on a specific filegroup. By default, the default filegroup is the PRIMARY filegroup, but with the ALTER DATABASE command’s MODIFY FILEGROUP clause, you can change the default filegroup.
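For reference, changing the default is a single statement. Here is a sketch using the filegroup names from the test database I’m about to create (I am not actually doing this in the test below, so PRIMARY remains the default):

-- If I wanted FG1 to become the default filegroup for new objects, I could run:
ALTER DATABASE TestFGs MODIFY FILEGROUP FG1 DEFAULT;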

    I did not think that what this sentence was saying was true, so I decided to test it.

    I first created a new database with three filegroups. The PRIMARY filegroup is the default filegroup, because I am not changing the default to be anything else. The two additional filegroups are FG1 and FG2.

USE master;
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TestFGs')
   DROP DATABASE TestFGs;
GO
CREATE DATABASE TestFGs
ON PRIMARY
  (NAME = Data,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFGPrimary.mdf',
   SIZE = 10 MB,
   MAXSIZE = 1 GB,
   FILEGROWTH = 25%),
FILEGROUP FG1
  (NAME = FG1_data,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG1_data.mdf',
   SIZE = 10 MB,
   MAXSIZE = 1 GB,
   FILEGROWTH = 25%),
FILEGROUP FG2
  (NAME = FG2_data,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG2_data.mdf',
   SIZE = 10 MB,
   MAXSIZE = 1 GB,
   FILEGROWTH = 25%)
LOG ON
  (NAME = TestFG_log,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFG_log.ldf',
   SIZE = 20 MB,
   MAXSIZE = 2 GB,
   FILEGROWTH = 25%);
GO

I then use the new database and create three tables, each with the same column definitions. One of the columns is a text column, which will be stored separately from the regular data rows unless I specify otherwise.

The first table does not specify any filegroups, so everything should go on the default filegroup. The second table specifies a filegroup for the table (FG1) but not for the LOB data. The third table specifies a filegroup for the table (FG1) and a separate filegroup for the LOB data (FG2).

    To determine where the data has been stored, I examine the sys.allocation_units view. I need to join it to sys.partitions in order to get the corresponding object_id. The function object_name will give me the name of the table, and the function filegroup_name will translate the column data_space_id into a filegroup name.

USE TestFGs;
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows')
   DROP TABLE bigrows;
GO
CREATE TABLE bigrows
  (a char(1000),
   b varchar(8000),
   c text);
GO
INSERT INTO bigrows
   SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
          REPLICATE('c', 50);
GO
SELECT OBJECT_NAME(object_id) as object_name,
       FILEGROUP_NAME(data_space_id) as FG_name, type_desc
FROM sys.partitions p
JOIN sys.allocation_units a
   ON p.partition_id = a.container_id
WHERE OBJECT_NAME(object_id) = 'bigrows';
GO

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows2')
   DROP TABLE bigrows2;
GO
CREATE TABLE bigrows2
  (a char(1000),
   b varchar(8000),
   c text)
ON FG1;
GO
INSERT INTO bigrows2
   SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
          REPLICATE('c', 50);
GO
SELECT OBJECT_NAME(object_id) as object_name,
       FILEGROUP_NAME(data_space_id) as FG_name, type_desc
FROM sys.partitions p
JOIN sys.allocation_units a
   ON p.partition_id = a.container_id
WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';
GO

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows3')
   DROP TABLE bigrows3;
GO
CREATE TABLE bigrows3
  (a char(1000),
   b varchar(8000),
   c text)
ON FG1 TEXTIMAGE_ON FG2;
GO
INSERT INTO bigrows3
   SELECT REPLICATE('a', 1000), REPLICATE('b', 1000),
          REPLICATE('c', 50);
GO
SELECT OBJECT_NAME(object_id) as object_name,
       FILEGROUP_NAME(data_space_id) as FG_name, type_desc
FROM sys.partitions p
JOIN sys.allocation_units a
   ON p.partition_id = a.container_id
WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';
GO

And here are my results. The first table, bigrows, has both its regular rows and its LOB data on the default filegroup, PRIMARY. You might also notice that an allocation unit for row-overflow data was created, even though we didn’t have any data that needed that space.

The second table, bigrows2, has all its data on FG1, even though I only specified a filegroup for the table itself. The LOB data is stored with the rest of the table if I don’t specifically place it on a different filegroup.

    Only for the third table, bigrows3, is the LOB data stored separately, and only because I specifically used the TEXTIMAGE_ON clause when creating the table.

[Results showing the filegroup for each allocation unit of the three tables]

    So now you know.

    ~Kalen

  • Geek City: What gets logged for SELECT INTO operations?

    Last week, I wrote about logging for index rebuild operations.  I wanted to publish the result of that testing as soon as I could, because that dealt with a specific question I was trying to answer. However, I actually started out my testing by looking at the logging that was done for a different operation, and ended up generating some new questions for myself.

    Before I started testing the index rebuilds, I thought I would just get warmed up by observing the logging for SELECT INTO. I thought I knew what got logged, but I was wrong. My understanding was that as a minimally logged operation, in FULL recovery, SELECT INTO would log each 8K page as it was written, not each row like a normal INSERT would. In BULK_LOGGED, SELECT INTO would create a log record for each page, but it would not log the entire 8K page. And of course, it would also not log every row.  However, my first test turned up some surprising results.

    Here's what I did: First, I made sure the AdventureWorks database was in FULL recovery, and dropped the Details table if it existed. Then I created  a table called SELECTINTOLoggingData to hold the results of calling fn_dblog.  As in last week's script, the following information was captured:

• Max log record size
• Number of log records
• Number of log records larger than 8K

    Finally, I backed up the AdventureWorks database to give me a foundation for log backups. So here's the first part of the script:

    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SELECTINTOLoggingData')
        DROP TABLE SELECTINTOLoggingData;
    GO

    CREATE TABLE SELECTINTOLoggingData
    (DBRecModel CHAR(12), MaxLogRowSize INT, NumLogRows INT, NumLargeRows INT);
    GO

    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

The actual testing was done first in FULL recovery, then in BULK_LOGGED. I backed up the transaction log, then created the Details table by executing a SELECT INTO from the Sales.SalesOrderDetail table. The Details table has 1495 pages and 121,317 rows. Finally, I inserted the relevant values from fn_dblog into the SELECTINTOLoggingData table.

    ----  FULL Recovery


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'FULL',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    DROP TABLE Details;
    GO


    -- BULK_LOGGED Recovery


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'BULK_LOGGED',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    Finally, I looked at the data I collected:

    SELECT * FROM SELECTINTOLoggingData;
    GO

    Here are my results:

[Results from the SELECTINTOLoggingData table for the FULL and BULK_LOGGED tests]

    Although the number of large (8K) log rows was as expected, I was quite surprised to see the total number of log records! (And yes, I know the column names use the term "Log Rows", but I'm calling them "Log Records".  I guess I just wanted a shorter column name.)

    I then ran the same test on a different version of the Sales.SalesOrderDetail table that I use in some of my class examples. Here is the script to create the Sales.SalesOrderDetail2 table:

    CREATE TABLE Sales.SalesOrderDetail2 (
        [SalesOrderID] [int] NOT NULL,
        [SalesOrderDetailID] [int] NOT NULL,
        [CarrierTrackingNumber] [nvarchar](25) NULL,
        [OrderQty] [smallint] NOT NULL,
        [ProductID] [int] NOT NULL,
        [SpecialOfferID] [int] NOT NULL,
        [UnitPrice] [money] NOT NULL,
        [UnitPriceDiscount] [money] NOT NULL ,
    CONSTRAINT [PK_SalesOrderDetail2_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
    (
        [SalesOrderID] ASC,
        [SalesOrderDetailID] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY];
    GO

    INSERT INTO Sales.SalesOrderDetail2 SELECT [SalesOrderID]
          ,[SalesOrderDetailID]
          ,[CarrierTrackingNumber]
          ,[OrderQty]
          ,[ProductID]
          ,[SpecialOfferID]
          ,[UnitPrice]
          ,[UnitPriceDiscount]
      FROM AdventureWorks.Sales.SalesOrderDetail;
    GO

After setting the AdventureWorks database back to FULL recovery, deleting the Details table and truncating the SELECTINTOLoggingData table, I ran the same test as above (starting at the "FULL Recovery" comment), substituting Sales.SalesOrderDetail2 for Sales.SalesOrderDetail.

    ALTER DATABASE AdventureWorks SET RECOVERY FULL;
    GO
    DROP TABLE Details;
    GO
TRUNCATE TABLE SELECTINTOLoggingData;
    GO

    Here are my new results:

[Results from SELECTINTOLoggingData after the Sales.SalesOrderDetail2 tests]

    So these are much more like the results I expected from a SELECT INTO operation!

    Why did the first example, selecting from Sales.SalesOrderDetail, do so much more logging?

It turns out that if you have an IDENTITY column in the source table, SELECT INTO must log every row as the IDENTITY value is generated.

Does this mean you should rethink the benefit of using IDENTITY columns? Only if you're doing a lot of SELECT INTO operations to copy tables with IDENTITY columns. Normal INSERT statements, which are how IDENTITY values are usually generated, are going to have the same amount of logging whether or not the table has an IDENTITY column. (Let me know if you think otherwise.)
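If the extra logging is a concern when copying such a table, one thing that might be worth testing (I haven't rerun the logging comparison this way) is that SELECT INTO does not inherit the IDENTITY property when the column is part of an expression, so wrapping the column in a CAST gives the new table a plain int column. A sketch, using a hypothetical Details2 table and only a subset of the columns:

-- The CAST makes SalesOrderDetailID an expression, so the new table gets a
-- plain int column instead of an IDENTITY column
SELECT CAST(SalesOrderDetailID AS int) AS SalesOrderDetailID,
       SalesOrderID,
       CarrierTrackingNumber,
       OrderQty,
       ProductID,
       SpecialOfferID,
       UnitPrice,
       UnitPriceDiscount
INTO Details2
FROM Sales.SalesOrderDetail;
GO
-- Confirm the identity property was not inherited
SELECT COLUMNPROPERTY(OBJECT_ID('Details2'), 'SalesOrderDetailID', 'IsIdentity') AS IsIdentity;
GO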

    So now you know.

    ~Kalen

    p.s. I would like to thank Sunil Agarwal for his brilliant insights and encouragement. Thanks for everything, Sunil!

  • Geek City: What gets logged for index rebuild operations?

    This blog post was inspired by a question from a future student. Someone who was already booked for my SQL Server Internals class in June asked for some information on a current problem he was having with transaction log writes causing excessive wait times during index rebuild operations when run in ONLINE mode. He wanted to know if switching to BULK_LOGGED recovery could help.

    I knew the difference between ALTER INDEX in FULL vs BULK_LOGGED recovery when doing normal OFFLINE rebuilds, but I wasn't sure about ONLINE. So I dug out some old scripts, massaged them a bit, and ran some tests.

    A minimally logged operation is one that does not always log every single row. In FULL mode, SQL Server will log the entire page as each page is filled, so that we end up with individual  log records of over 8K. In BULK_LOGGED mode, SQL Server just logs the fact that a new page was generated (and some information about the page) and doesn't log the full 8K. So the log space used is quite a bit less.

    I will use Microsoft's AdventureWorks database that you can download from codeplex. I used the SQL Server 2005 version.

    After making sure the database is set to FULL recovery, I make a copy of the Sales.SalesOrderDetail table (which I call Details) and build a clustered index on it.  The table has about 1580 pages and 121,317 rows.


    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
     
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    CREATE CLUSTERED INDEX CL_details_indx ON Details(SalesOrderID);
    GO

    I then build a table to store log information that I will gather after the different index rebuild operations. After that I back up the database to make sure I have a base for making log backups.

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'AlterIndexLoggingData')
        DROP TABLE AlterIndexLoggingData;
    GO

    CREATE TABLE AlterIndexLoggingData
    (DBRecModel CHAR(12), IndexBuildMode CHAR(3), MaxLogRowSize INT, SumLogSize BIGINT, NumLogRows INT, NumLargeRows INT);
    GO
    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

    Now starts the testing. There are four sections, each of which basically does the same thing, with the four permutations of FULL vs BULK_LOGGED recovery, and ONLINE vs OFFLINE index builds.  Each section backs up the transaction log to truncate it, and then reports the number of rows in the log, just to verify we really have (logically) cleared it out. It then performs a rebuild of the index on the Details table. Finally, it captures the following information from the log using fn_dblog:

    • Max log record size
    • Sum of log record sizes
    • Number of log records
    • Number of log records larger than 8K

    ----  FULL Recovery, ONLINE = OFF

    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
     
     
    --- FULL Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery; ONLINE = OFF


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    SELECT * FROM AlterIndexLoggingData;
    GO

    Here are my results:

[Results from the AlterIndexLoggingData table for all four recovery/ONLINE combinations]

    Notice the following:

    • Only FULL recovery with an OFFLINE rebuild logged the 8K pages.
    • FULL recovery with an ONLINE rebuild actually did log every single row; in fact, it looks like at least two log records were generated for each row. There is a LOT of logging for this set of options and by far, the largest amount of log space used.
• BULK_LOGGED mode used substantially less log space than FULL, but the difference between ONLINE and OFFLINE is even more pronounced. In FULL recovery, ONLINE used about twice the log space; in BULK_LOGGED, ONLINE required more than 8 times the space!

I'm sure there are more fun facts you can pull out of the results from fn_dblog, but for my first look at the logging differences between ONLINE and OFFLINE index rebuilds, this is what I wanted to know. It's not a complete answer to the student's question, but it does indicate there is a lot more logging with ONLINE index builds, so it should not be surprising if there are a lot more log write waits. And I would assume that switching to BULK_LOGGED recovery could make a noticeable difference.

    I hope this is useful!

    ~Kalen

  • Geek City: Preparing for the SQL Server Master Exam

    I was amazed at the results when I just did a search of SQLBlog, and realized no one had really blogged here about the changes to the Microsoft Certified Master (MCM) program. Greg Low described the MCM program when he decided to pursue the MCM at the end of 2008, but two years later, at the end of 2010, Microsoft completely changed the requirements.

     Microsoft published the new requirements here. The three week intensive course is no longer required, but that doesn't mean you can just buy an exam cram guide. Experience with SQL Server is absolutely required to pass the rigorous knowledge exam and hands-on lab exam. When you see all the recommended preparation, you might wish there was still a way to get all the preparation done in only three weeks. The Microsoft site lists many resources you can use to prepare, including books, web videos and instructor led training. My SQL Server 2008 Internals book is at the top of the book list, and my 5-day SQL Server Internals and Tuning course is listed as a suggested training resource.

    My next delivery is in about 6 weeks, to be taught in Boston, just before the Boston Marathon! The class is organized by SQLBlog's own Adam Machanic and his Boston SQL Training,  and you can get more information and register here.

    See you in class!

    ~Kalen

  • Did You Know: Where's the Best Place to Get Started?

    Have you ever wondered how to start learning about a specific SQL Server topic? Or how to advise someone who asks "How do I get started?"

Redgate and SQL Server Central have launched a new site to answer that question.

    SQL Server Stairways are a series of tutorials designed to take you from zero knowledge of a particular SQL Server topic, to a level of practical understanding that will allow you to start using that feature in a production environment.

    Not all the tutorials are finished, but new articles are being added regularly. The following Stairways are already on the site:

      • SQL Server Agent
      • Server-side Tracing
      • SQL Server Indexes
      • Transactional and Merge Replication
      • Database Design
      • MDX
      • Integration Services
      • Reporting Services
      • StreamInsight


    Have fun!

    ~Kalen

  • Did You Know: I'm going back to Boston!

    After spending 5 months in Boston last summer and fall, I'm going back again, to teach my 5-day SQL Server Internals and Query Tuning class for Boston SQL Training. Last time I taught my course there, we had a completely full class with a waiting list. So I'm going to offer the course again!

The class will be delivered April 11-15, but early bird registration ends next Tuesday, February 1st.

    I'm very excited to go back, and thankful to Adam Machanic of Boston SQL Training for this opportunity.

    You can read what Adam says about the course, including some student comments, in Adam's post.

    Thanks!

    ~Kalen

  • Geek City: Ordered Seeks and Scans

    I got a couple of really great questions during my SSWUG Workshop this morning, as I was discussing seeks and scans, and since the answers to the two questions are very related, I decided to address both of them in more detail in this blog post.

    Most people think about a seek operation as retrieving just one or a few rows, and a scan as accessing the entire table  or index leaf level.  And that usually is what I mean when I am just describing SQL Server behavior. But there is also the possibility that SQL Server could perform a 'partial scan', starting at some row in the table or index leaf, and then scanning only to a particular ending row. However, there is no operator in a query plan called 'partial scan'.  So how can you tell when you have one?

Before I specifically answer that, let me tell you about the first question in the workshop this morning. I had been showing the Properties window (available from View | Properties Window), pointing out that when you click on one of the icons in a graphical query plan, the Properties window shows the same information that is in the yellow box that pops up, plus additional information. One of the values shown is called 'Ordered', and when 'Ordered' has the value True, there is also a value called 'Scan Direction', which might make you think that ordering is only relevant to scans. But you will see True values for 'Ordered' even when the operation is a seek. This is usually an indication that a partial scan is being performed.

    And because I was talking about heaps being unordered sets of rows, one of the delegates asked: "Are index seeks against a heap always unordered?"

    The answer is, of course, "it depends".  If you have a covered query, the index access could be an ordered seek of the covering index, even though the underlying table is a heap. Here is an example from the AdventureWorks database. I make a copy of the Sales.SalesOrderHeader table, so it is a heap, and then build a nonclustered index on the Freight column.

IF OBJECT_ID('dbo.SalesOrderHeader') IS NOT NULL
       DROP TABLE dbo.SalesOrderHeader;
    GO
    SELECT * INTO dbo.SalesOrderHeader
    FROM Sales.SalesOrderHeader;
    GO
    CREATE INDEX ix_Freight ON dbo.SalesOrderHeader(Freight);
    GO

    Then I look at the plan for a covered query using the ix_Freight index:

    SELECT AVG(Freight)
    FROM dbo.SalesOrderHeader
    WHERE Freight < 100;

    And here is the plan showing the index seek:

[Query plan showing an Index Seek on ix_Freight]

    And the Properties sheet for the Index Seek shows that Ordered - True and Scan Direction = FORWARD.

[Properties window for the Index Seek showing Ordered = True and Scan Direction = FORWARD]

    This is an example of an ordered seek on a nonclustered index, or what we can think of as a partial scan, or range scan.

    The other question during the workshop was how to tell if the seek was really a range scan. Part of the answer is to look for the Ordered = True value, as I just illustrated. Another thing to notice in the graphical query plan is the thickness of the line moving rows to the left from the Index Seek, indicating that LOTS of rows are being returned from the Seek. That's another clue that we have a partial scan.

    We can also get partial scans with clustered indexes. I'll build a clustered index on my new table.

    CREATE CLUSTERED INDEX clus_OrderID ON dbo.SalesOrderHeader(SalesOrderID);
    GO

    There are 31465 rows in the dbo.SalesOrderHeader table, with SalesOrderID values ranging from 43659 to 75123. I'll SELECT a small range of data:

    SELECT * FROM dbo.SalesOrderHeader
    WHERE SalesOrderID < 44000

    The plan shows a Clustered Index Seek, and the Properties shows Ordered = True. 

    If I make the range bigger, and look for SalesOrderID < 60000, I still get an Index Seek, and the same when using an upper limit of 75000. If I run a query that returns ALL the rows, I still get an Index Seek.

[Query plans for the small and large ranges, both showing a Clustered Index Seek]

    Both queries are using an Index Seek, but notice the difference in the thickness of the arrows.

It turns out that the only time the optimizer will tell us it is doing a scan is when it knows absolutely that it will be looking at every row in the table or index leaf. And the only way it can know this is if there is no filter condition (no WHERE clause) or no index on the filter condition. With the index on SalesOrderID, there is no way the optimizer can be absolutely sure that it will have to scan the whole table. Someone might have inserted a value of 100001 since the last time statistics were updated, and we wouldn't want to perform a complete scan and return a value that is not in the specified range. So the Index Seek in the plan really just means that SQL Server will access the data using the index, but not necessarily with a complete scan. There is something in the query that gives a starting point, a stopping point, or both.
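To see the difference for yourself, compare the plan for a query with no WHERE clause at all against one with a range predicate on the clustering key; only the first can be reported as a scan. A quick sketch using the table built above:

-- No filter at all: the optimizer knows it must touch every row,
-- so the plan shows a Clustered Index Scan
SELECT * FROM dbo.SalesOrderHeader;

-- A range predicate on the clustering key: reported as a Clustered Index Seek
-- (really a partial/range scan), even if the range happens to cover every row
SELECT * FROM dbo.SalesOrderHeader
WHERE SalesOrderID < 100000;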

    I've got another Workshop for SSWUG on Index Internals, coming up on February 25th. Hopefully, I'll get a lot more great questions during that one!

    Thanks!

    ~Kalen

  • Did You Know? My PASS Demos and Social Networking

    My PASS demos are now available on my website for my Precon Session on SQL Server Concurrency: Locking and Blocking and Row Versions, Oh My!  and for my Spotlight Session on Seeking SQL Server Secrets. You can download both sets of scripts from

    www.SQLServerInternals.com/conferences

Also, I set up a Facebook fan page a few weeks ago, for SQL Server technical information, news, tips and tricks... the Facebook page name is SQL Server Internals, and the direct URL is http://www.facebook.com/SQLInternals. Note that on my personal Facebook page I do not 'friend' people that I don't know, but I don't post any technical info on my personal page. If you want technical info from me on Facebook, just 'Like' the SQL Server Internals page.

    Finally, I started tweeting, right after the PASS conference ended. You can look for @SQLQueen

    Enjoy!

    ~Kalen
