
Kalen Delaney

  • Geek City: A Grammar Geek in the Cloud

    Those of you who know me well know that I am usually a stickler for spelling, grammar and proper word usage. I may have even lost a few friends because of what some people see as an obsession. I am not infallible, and I do make typos, but I like to try to correct them as soon as I discover them (or as soon as they are brought to my attention).

    So now I will admit that I made a mistake in my usage of the word ‘premise’ when talking about Cloud vs. non-Cloud databases. I was using the term on-premise as the opposite of ‘in the Cloud’, and my friend Cindy corrected me last week in no uncertain terms. So I went online and quickly found this post that proved she was right and I was wrong:

    http://fbhalper.wordpress.com/2009/06/11/premise-vs-premises-in-the-cloud/ 

    I did a bit more searching and found that I am not the only one making this mistake. A Research VP at Gartner actually makes the same mistake in a published post:

    http://blogs.gartner.com/chris-wolf/2010/07/14/on-premise-microsoft-azure-an-inevitable-milestone-in-azure%e2%80%99s-evolution/

    I will admit that I find the word ‘on-premises’ awkward to say, so when I’m talking casually I might end up leaving off the last syllable.

    But anytime I’m writing, I promise to use premise vs. premises appropriately.

    And I expect the same from everyone else.

    :-)

    ~Kalen

  • Geek City: Join With Me!

    I remember that one of the most surprising changes in SQL Server 2005 was how the graphical plans showed the use of a nonclustered index to seek, with the plan including something that looked like a JOIN to find rows in the base table. Here’s an example. Although I used SQL Server 2008 R2, the graphical plan will be pretty similar to what it looked like back in SQL Server 2005. My code will make a copy of a table in the AdventureWorks2008 database, and then build an index on one of the columns.

    USE AdventureWorks2008;
    GO
    IF object_id('dbo.Sales') IS NOT NULL
        DROP TABLE dbo.Sales;
    GO
    SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;
    GO
    CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);
    GO

    Now look at the estimated graphical execution plan for this query, which searches for a particular value of the SalesPersonID column:

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 280;
    GO

    You should see something like this:

    [Plan screenshot: a nonclustered index seek joined (Nested Loops) to a lookup into the base table]

    It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. The plan is confusing only if you think of a JOIN as finding matching rows between two tables; it makes sense once you realize that a JOIN can be used to find matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on SalesPersonID, and a set of rows in the table. The index seek finds all the index rows with a SalesPersonID value of 280, but those index rows contain only an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the SalesPersonID value, SQL Server must find the rows in the dbo.Sales table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.

    If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs; they contain the clustered key value to be used as a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look it up in the clustered index to find the desired row. So I create a clustered index on SalesOrderNumber:

    CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);
    GO

    Now when I run the same query as above, I get the following plan:

    [Plan screenshot: the same query, now with a Key Lookup into the clustered index]

    The only difference between the two plans above is the icon for the lookup into the base table. One is looking up into a heap using a RowID, which is an address containing the file ID, the page number and the slot (row) number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?

    [Screenshot: the RID Lookup and Key Lookup operator icons, side by side]

    The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.

    But wait, there’s more…

    Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced.

    I’m going to build another index, this one on OrderDate:

    CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);
    GO

    This index will be used if I look for all orders placed on July 1, 2002. The query plan for the following query is shown below:


    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01';

    [Plan screenshot: an index seek on Sales_OrderDate_index with a Key Lookup into the clustered index]

    We saw that the index on SalesPersonID is useful when looking for the value 280, and the index on OrderDate is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions?

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01'
    AND SalesPersonID = 280

    Here’s the plan:

    [Plan screenshot: seeks on both nonclustered indexes joined together, then joined to a Key Lookup into the clustered index]

    We see two JOINs! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions, and those two index row sets are joined together, using the clustered key value as the join column. Only after SQL Server finds the rows that meet both conditions does it take the common clustered index key and use it to find the row in the underlying table.

    But wait, there’s more!

    Both of the following queries use a scan of the entire table, because the filter conditions are not selective enough. There are way more rows with a SalesPersonID value of 289 than there were for a value of 280, and there are more rows with an OrderDate of March 1, 2004 than there are with an OrderDate of July 1, 2002.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01' ;
    GO

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 289;
    GO

    Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:

    [Plan screenshot: a clustered index scan]

    However, if you look at the plan for a query that uses both filter conditions, you’ll see that we don’t get a clustered index scan.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;

    [Plan screenshot: index intersection again, with two index seeks joined and then a Key Lookup]

    Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks for non-restrictive filters is that the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, so the number of lookups into the base table is quite small and affordable.
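
    If you want to see this effect for yourself rather than take the estimates on faith, here is a quick sketch (using the same dbo.Sales table built above) that compares the I/O of the single-condition and two-condition queries with SET STATISTICS IO; the expected plan shapes are noted in the comments.

    SET STATISTICS IO ON;
    GO
    -- Non-selective filter alone: expect a clustered index scan
    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 289;
    GO
    -- Both filters: expect two index seeks, a join, and a key lookup,
    -- with far fewer logical reads reported on the Messages tab
    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;
    GO
    SET STATISTICS IO OFF;
    GO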

    So, what can you learn? Just because you think a column might not be selective enough to build a nonclustered index on, don’t discount it. If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought!

    So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.

    Have fun!

    ~Kalen

  • Geek City: Accessing Distribution Statistics

    Distribution statistics are one of the most important sources of information that the Query Optimizer uses to determine a good query plan. In this post, I’m not going to tell you everything about distribution statistics. I’m just going to show you a few tricks for getting access to the statistics.

    If you want a deeper understanding of what the statistics keep track of, and you don’t have any of my SQL Server Internals books handy, check out this whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 

    Microsoft does provide us a tool called DBCC SHOW_STATISTICS for examining the distribution statistics.

    Microsoft has gradually been making more of the old DBCC commands available as DMVs, even some undocumented ones. For example, one of my favorites, DBCC IND, has now been replaced in SQL Server 2012 by sys.dm_db_database_page_allocations.

    I have been wishing for several versions that Microsoft would make the DBCC SHOW_STATISTICS information available as a DMV. But it hasn’t happened yet, and I’m tired of waiting, so I decided to do something about it.

    My solution is not quite as easy to use as a DMV might be, but it allows you to get the information that DBCC SHOW_STATISTICS provides into a set of three tables that can then be saved into a more permanent location of your choice, and/or queried as desired.

    DBCC SHOW_STATISTICS returns three sets of information, with different columns in the output, so three different tables are needed. The DBCC SHOW_STATISTICS command can be called with an argument that specifies that you just want one of the three sets returned. The options are

    WITH STAT_HEADER – returns basic info such as the last update date and the number of rows in the table/index. It also reports the number of steps returned for the HISTOGRAM section.

    WITH DENSITY_VECTOR – returns density info for each left-based subset of columns in the index. For example, an index on (lastname, firstname, city) would have a density value for (lastname), for (lastname, firstname), and for (lastname, firstname, city). Each density value is a single number equal to 1 divided by the number of distinct values in that column combination; for example, if there are only 2 possible values in the column, the density would be 0.5. Multiplying a density by the number of rows in the STAT_HEADER section gives the average expected rowcount if a query performed an equality search on the specified column(s); see the worked example after this list.

    WITH HISTOGRAM – returns a set of ordered values from the first column of the index, creating a histogram. This histogram provides the optimizer with selectivity information for specific values or ranges of values in the first column of the index.
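
    As a concrete (and purely hypothetical) example of that density arithmetic: suppose the STAT_HEADER section reports 100,000 rows and the density vector shows an all_density of 0.002 for (lastname), which implies roughly 1 / 0.002 = 500 distinct last names. For an equality predicate on lastname, the average estimate would then be about 0.002 * 100,000 = 200 rows.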

    To collect this info, I will use one of my favorite tricks, which is to create a table in the master database with a name starting with sp_. (I’ve written about this trick several times, including in this earlier blog post.) Once I have the table(s) created, I can access them from any database. So here are the three tables:

    USE Master;
    GO

    IF  (SELECT object_id('sp_stat_header')) IS NOT NULL
      DROP TABLE sp_stat_header;
    GO
    CREATE TABLE sp_stat_header
    (   Name sysname,
        Updated datetime,
        Rows bigint,
        Rows_sampled bigint,
        Steps smallint,
        Density numeric (10,9),
        Average_key_length smallint,
        String_index char(3),
        Filter_expression nvarchar(1000),
        Unfiltered_rows bigint);
    GO


    IF  (SELECT object_id('sp_density_vector')) IS NOT NULL
      DROP TABLE sp_density_vector;
    GO
    CREATE TABLE sp_density_vector
    (  all_density numeric(10,8),
       average_length smallint,
       columns nvarchar(2126) );
    GO

    IF  (SELECT object_id('sp_histogram')) IS NOT NULL
      DROP TABLE sp_histogram;
    GO
    CREATE TABLE sp_histogram
    (   RANGE_HI_KEY sql_variant,
        RANGE_ROWS bigint,
        EQ_ROWS bigint,
        DISTINCT_RANGE_ROWS bigint,
        AVG_RANGE_ROWS bigint);
    GO

    The second trick is to use INSERT … EXEC to execute a DBCC statement and populate the tables. I will build the DBCC command dynamically, after capturing the schema, table and index names in variables. You could, of course, take this code and turn it into a stored procedure to which the schema, table and index names are passed as parameters. I’ll use as an example a table in the AdventureWorks2008 sample database, just so you can try running the code, and I can verify that it actually works!

    I will use the table Sales.SalesOrderDetail and the index IX_SalesOrderDetail_ProductID. So the object name (@oname) is SalesOrderDetail, the schema name (@sname) is Sales, and the index name (@iname) is IX_SalesOrderDetail_ProductID.

    SET NOCOUNT ON;
    USE AdventureWorks2008;
    GO
    DECLARE @oname sysname,  @iname sysname, @sname sysname

    SELECT @oname = 'SalesOrderDetail',  @sname = 'Sales', @iname = 'IX_SalesOrderDetail_ProductID';
     
    -- Update the object name to include the schema name, because that is the format the DBCC command expects
    SELECT @oname = @sname +'.' + @oname;

    TRUNCATE TABLE sp_stat_header;
    INSERT INTO sp_stat_header
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH STAT_HEADER');

    TRUNCATE TABLE sp_density_vector;
    INSERT INTO sp_density_vector
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH DENSITY_VECTOR');

    TRUNCATE TABLE sp_histogram;
    INSERT INTO sp_histogram
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH HISTOGRAM');
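
    As mentioned above, this pattern is easy to wrap in a stored procedure that takes the schema, table and index names as parameters. Here is a minimal sketch of that idea; the procedure name sp_capture_stats is just my placeholder, and you would create it in (or adapt it for) the database containing the tables whose statistics you want to capture.

    CREATE PROCEDURE sp_capture_stats
        @sname sysname,
        @oname sysname,
        @iname sysname
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Combine the schema and table names, because that is the format the DBCC command expects
        DECLARE @fullname nvarchar(600);
        SELECT @fullname = @sname + N'.' + @oname;

        TRUNCATE TABLE sp_stat_header;
        INSERT INTO sp_stat_header
            EXEC ('DBCC SHOW_STATISTICS(['+ @fullname + '],' + @iname +') WITH STAT_HEADER');

        TRUNCATE TABLE sp_density_vector;
        INSERT INTO sp_density_vector
            EXEC ('DBCC SHOW_STATISTICS(['+ @fullname + '],' + @iname +') WITH DENSITY_VECTOR');

        TRUNCATE TABLE sp_histogram;
        INSERT INTO sp_histogram
            EXEC ('DBCC SHOW_STATISTICS(['+ @fullname + '],' + @iname +') WITH HISTOGRAM');
    END;
    GO

    -- Example call, run from the database that contains the table:
    -- EXEC sp_capture_stats 'Sales', 'SalesOrderDetail', 'IX_SalesOrderDetail_ProductID';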

    So now you can look at the values collected and filter or query them in any way, or use SELECT INTO to save them into another table, so that the sp_ tables can be reused the next time you want to capture distribution statistics information.

    SELECT * FROM sp_stat_header;

    SELECT * FROM sp_density_vector;

    SELECT * FROM sp_histogram;

      

    Let me know if you find this useful, and especially if you embellish it to create a procedure or an automated process of your own!

    Thanks!

    ~Kalen

  • Geek City: What Triggered This Post?

    I’d really like to get another post up onto my much neglected blog before the end of 2012. This will also start one of my New Year’s resolutions, which is to write at least one blog post a month. I’m going to tell you about a change in SQL Server that wasn’t announced in any “What’s New” list that I ever saw, perhaps because it was just a change in internal behavior, and nothing that required any change in user applications.

    Do you retest what you know is true for every new version? When I update my books, I do test all the scripts, but if there isn’t a script, I don’t retest every ‘fact’ that I have known for years is true. And sometimes, things change. And sometimes my reviewers notice those unreported changes, and sometimes they don’t.

    You might be aware of the fact that SQL Server can perform UPDATE operations in two different ways. The UPDATE can be performed as a two-step process: delete the old row and then insert a whole new row, or, the UPDATE can be performed (much more efficiently) as an update-in-place.  When the two-step UPDATE is performed, it is a LOT more work. Not only does SQL Server have to log the entire old row and the entire new row, but each nonclustered index is also modified twice, and each of those index changes also has to be logged. So it’s nice when an update-in-place is done, because only the bytes changed are logged, and only indexes on the updated columns are affected.

    Prior to SQL Server 7, there were actually four different ways that UPDATE could be done. The two-step UPDATE had some variations that could make it even slower in some cases! But that was a long time ago, so I’m not going to go into the details now. But I will say that back then, in order to get an update-in-place to occur, there was a big long list of prerequisites that had to be met and if you missed just one, you’d get one of the slower UPDATE operations.

    As of SQL Server 7, update-in-place became the default. The only time it doesn’t happen is when the row can’t stay in the same location (such as when you update a clustered index key column) or when SQL Server really needs the old and new versions of the row.

    In SQL Server 7, one of the places SQL Server needed the old and new versions of the updated rows was when processing triggers. Triggers used the transaction log to get the contents for the DELETED and INSERTED pseudo-tables. And because triggers needed the entire old and new versions of the updated rows, the UPDATE was performed as a two-step operation: DELETE the old row, log the entire old row, then INSERT the new row with the new values, and log the entire new row.

    But as of SQL Server 2005, we have the version store, primarily used for SNAPSHOT isolation, but available for other uses as well. In SNAPSHOT isolation, the version store stores ‘old versions’ of rows that have been updated or deleted. I knew that the version store was also used for triggers, but it only occurred to me recently that because the old and new versions of the row no longer need to come from the log, perhaps UPDATEs did not always need to be performed internally as two-step UPDATEs.

    So I decided to test it out.

    -- DEMO: If there is an UPDATE trigger, are updates logged as DELETE + INSERT?
    -- First build a new database.

    USE master;
    GO
    IF (SELECT db_id('TestTrigger')) IS NOT NULL
        DROP DATABASE TestTrigger;
    GO
    CREATE DATABASE TestTrigger;
    GO
    ALTER DATABASE TestTrigger SET RECOVERY SIMPLE;
    GO
    SELECT db_id('TestTrigger');
    GO

    USE TestTrigger;
    GO

    -- Just for a warmup, look at the function fn_dblog, which works in the current database

    SELECT * FROM fn_dblog(null, null);
    GO

    -- Create a new table to work with
    IF (SELECT object_id('objects')) IS NOT NULL
        DROP TABLE objects;
    GO
    SELECT TOP 100 * INTO objects FROM sys.objects;
    GO

    -- Create a clustered index on the table
    CREATE CLUSTERED INDEX objects_clustered on objects(name);
    GO

    -- First examine an update we know is NOT done in place,
    -- i.e. updating a clustered key value

    UPDATE objects SET name = 'newrowsets' WHERE name = 'sysrowsets';
    GO

    -- Look at last 10 rows; notice a LOP_DELETE_ROWS and LOP_INSERT_ROWS
    -- The AllocUnitName column shows the object affected is the clustered index on dbo.objects
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Now examine an update we know is  done in place,
    -- i.e. updating an unindexed column on a table with no triggers
    UPDATE objects SET parent_object_id = 1 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW on the dbo.objects allocation unit
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Create an update trigger
    -- Will the update be done with the simple LOP_MODIFY_ROW or with LOP_DELETE_ROWS and LOP_INSERT_ROWS?
    CREATE TRIGGER trg_update_objects ON objects FOR UPDATE
    as
    SELECT * FROM DELETED; SELECT * FROM INSERTED;
    RETURN;
    GO

    -- Now perform update again
    UPDATE objects SET parent_object_id = 10 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW
    SELECT * FROM fn_dblog(null, null);
    GO

    Since the database is in SIMPLE recovery model, you can issue a CHECKPOINT before each UPDATE if you want to reduce the number of rows in the log to make it easier to examine.
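
    As an aside, if you just want to see the tail of the log instead of scrolling to the bottom of the fn_dblog output, a small sketch like this works; the [Current LSN] column is a fixed-width hex string, so ordering by it descending shows the most recent log records first.

    -- Show just the most recent log records, newest first
    SELECT TOP (10) Operation, [Transaction ID], AllocUnitName
    FROM fn_dblog(null, null)
    ORDER BY [Current LSN] DESC;
    GO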

    So it seems that I need to update my course and some of my writings. There might also be special cases that still require a two-step UPDATE in the presence of triggers, but it seems that a two-step UPDATE is not ALWAYS required anymore. That is very good news!

    I hope you all have a wonder-filled and joyous New Year!

    ~Kalen

  • Did You Know? I’m delivering my first SQL Server 2012 class this week!

    This  is actually just a very short post to get my name back up in the list on the right side of SQLBlog, because Adam’s configuration removes people who haven’t posted in 3 months. I’ve been extremely busy trying to get my  new book finished, and then get my course updated for its first delivery this week. I’ll be teaching it again in Norway at the end of the month. 

    It’s a great class this week, and a couple of questions have inspired longer posts, which hopefully I’ll be able to get to next week.

    Hopefully I’ll also have an update on the timeline for the new book by next week.

    My schedule is always available if you are interested in learning SQL Server 2012 Internals with me!

    http://schedule.KalenDelaney.com

    Thanks

    ~Kalen

  • Did You Know? Turning Off Locking

    Back when the lock manager was rewritten for SQL Server 7, and row level locking was added to the product, the procedure sp_indexoption gave us the ability to turn OFF either row or page locks or both, for one particular index, or for a table and all the indexes on it. I must admit, I rarely found this option to be the solution to any sticky problems, although while doing some quick research prior to writing this post, I did find that Microsoft recommended turning off locking options on a couple of tables  when working with SAP:  http://msdn.microsoft.com/en-us/library/aa226172(v=sql.70).aspx

    In my experience, it is MUCH more common to have people turn off row or page locks when they shouldn’t, and by inspecting the sys.indexes catalog view, you can tell if that has happened.  (Prior to SQL Server 2005,  you could use the INDEXPROPERTY function.) 

    I recently got email from a reader of my books who pointed out to me an error I had made in my SQL Server 2005 book, where I had claimed that sp_indexoption could not be used to disallow row or page locks on heaps. If you check the BOL, it clearly says that you can supply either an index name or a table name to sp_indexoption, so that was just an oversight in my writing. 

    But then along comes ALTER INDEX, and the BOL now says you should avoid using sp_indexoption, as it will be removed in a future release, and that you should use ALTER INDEX instead. So the reader sent me a follow-up question, asking how we can disallow row or page locking on a heap using ALTER INDEX.

    It turns out that we can specify the ALL option instead of an index name when using ALTER INDEX, and this has the same effect as using sp_indexoption on a heap.  You can see it in this short example.

    First create a heap with a nonclustered index:

    use tempdb;
    GO
    IF object_id('testlocks') IS NOT NULL
       DROP TABLE testlocks;
    GO

    CREATE TABLE testlocks
    ( a int);
    CREATE INDEX testlocks_index ON testlocks(a);
    GO

    Now look at the lock properties:

    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- ---------------
    1               1                HEAP
    1               1                NONCLUSTERED

    As shown, the default is that both row and page locks are allowed.

    Now use the old sp_indexoption to disallow row locks and then check the lock properties again:

    EXEC sp_indexoption testlocks, disallowrowlocks, 1;
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    0               1                HEAP
    0               1                NONCLUSTERED

    Notice that sp_indexoption turned off row locking for both the heap and the nonclustered index.

    Now use ALTER INDEX to allow row locks and inspect again:

    ALTER INDEX ALL ON testlocks
    SET  (ALLOW_ROW_LOCKS = ON );
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    RESULTS:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    1               1                HEAP
    1               1                NONCLUSTERED
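
    For completeness, the deprecated sp_indexoption call above can also be replaced entirely; the disallow step looks like this with ALTER INDEX (a minimal sketch, using the same testlocks table):

    ALTER INDEX ALL ON testlocks
    SET (ALLOW_ROW_LOCKS = OFF);
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO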

    Keep in mind that I’m not recommending that you turn off either row or page locking. And if you decide to change the types of locks allowed, make sure you test your application thoroughly under your maximum expected load to make sure you haven’t made things worse. And remember that you can’t turn off TABLE locking, only row and page locking.

    Have fun!

    ~Kalen

  • Geek City: Build a Big Table with a Columnstore Index

    I was looking all over to find a big table to use for my columnstore examples in my SQL Server 2012 book, and the only one I found was from someone at Microsoft and was not publicly available. When I demonstrate code in my writings, I like to have the code available for anyone to use, so that was no use.

    Finally I realized I was just going to have to do it myself. I actually based the script on some work by the awesome Benjamin Nevarez (blog | twitter), but I needed to make a couple of changes to the table structure, and then the loop for adding the data took some tweaking to make sure that uniqueness was possible, in case you need to test out how the metadata changes when a columnstore index is built on a table with a unique clustered index, compared to a nonunique clustered index, compared to a heap.

    I have just finished the chapter on indexes, and decided to make this script available. The initial table data is based on the data in Microsoft’s AdventureWorksDW2012 sample database that you can download here. (Note that the DW version is the first one under “Other Available Downloads”; it’s not the one under “Recommended Download”.)

    Here is the section of the script that populates most of the FactInternetSalesBIG table. (The original 60K rows were just copied from the original FactInternetSales table.) Note that all the statements have to be run in a single batch, because the local variable’s scope is the batch, and because the GO 9 applies to the single batch that precedes it. Of course, you can change the number of iterations to end up with a different size table; mine ends up at just over 30 million rows. In addition to modifying the SalesOrderNumber value on each iteration, I also changed the value of the RevisionNumber column to indicate which pass through the insert loop was being executed.



    -- Copy the new big table into itself 9 times
    DECLARE @RevisionNumber nchar(2);
    SELECT @RevisionNumber = RevisionNumber + 1 FROM RevisionNumberValue;
    SELECT @RevisionNumber as RevisionNumber;
    INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)
    SELECT ProductKey
    ,OrderDateKey
    ,DueDateKey
    ,ShipDateKey
    ,CustomerKey
    ,PromotionKey
    ,CurrencyKey
    ,SalesTerritoryKey
    ,SalesOrderNumber + @RevisionNumber
    ,SalesOrderLineNumber
    ,@RevisionNumber
    ,OrderQuantity
    ,UnitPrice
    ,ExtendedAmount
    ,UnitPriceDiscountPct
    ,DiscountAmount
    ,ProductStandardCost
    ,TotalProductCost
    ,SalesAmount
    ,TaxAmt
    ,Freight
    ,CarrierTrackingNumber
    ,CustomerPONumber
    ,OrderDate
    ,DueDate
    ,ShipDate
    FROM dbo.FactInternetSalesBig;
    UPDATE RevisionNumberValue SET RevisionNumber = RevisionNumber + 1;
    GO 9

    Make sure you have enough log space. My log grew from 1GB to just over 8GB when running the script to build the 30 million row table and columnstore index, with no clustered index.

    The downloadable script has commented options to build either a clustered or nonclustered index before you build the columnstore index. I suggest building the clustered index before the columnstore index, because building the clustered index will rebuild any existing nonclustered indexes, including my columnstore index. Building the columnstore index can take a while; on my system it took about 10 minutes. (Building the original 30+ million row table took even longer.)
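
    In case you'd like to experiment before grabbing the download, the columnstore index itself is just a standard SQL Server 2012 nonclustered columnstore. Here is a minimal sketch; the index name is mine and the column list is abbreviated (the downloadable script covers all the columns).

    CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactInternetSalesBig
        ON dbo.FactInternetSalesBig
        (ProductKey, OrderDateKey, CustomerKey, SalesTerritoryKey,
         SalesOrderNumber, RevisionNumber, OrderQuantity, SalesAmount);
    GO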

    After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!

    Have fun!

    ~Kalen

  • Follow-up Answers for my Australia Classes

    I was out of the country for the last two weeks of March, delivering classes in Brisbane and Sydney, which were organized by WardyIT.  It was a great visit and there were 24 terrific students!

    As is sometimes (perhaps often?)  the case, there were questions posed that I couldn’t answer during class, so here are a couple of follow-up answers.

    1. I brought up the fact that SQL Server 2012 generates a warning message when there are ‘too many’ Virtual Log Files (VLFs) in a database. (It turns out the message will be generated when a database has more than 10,000 VLFs, but the error says that the database has more than 1000 VLFs! Which is, of course, technically true, but the message is a little misleading.) A student reported hearing that the Best Practices Analyzer also had a warning about the number of VLFs, but didn’t know what that limit was, and asked if I knew. I didn’t then, but I do now. A message will be generated by BPA when there are 1000 or more VLFs. The BPA message is actually focused on the performance problems that can arise when there are too many VLFs. The boot message is more concerned with the impact of too many VLFs on database restore, rollback, backup and recovery.

    2. When discussing locks, I briefly mention the BU lock that is acquired by request (with a hint) during bulk load operations, for the purpose of allowing multiple processes to load into the same table concurrently. Someone asked whether any SSIS components used the BU lock, and I thought it was probably true, but since I don’t work with SSIS, I couldn’t comment any deeper. I was able to discover, thanks to SQL Server MVPs Paul White and Arthur Zubarev, that it definitely is possible for both SQL Destinations and OLE DB Destinations. The destination table must be a heap, and you can specify TABLOCK as an option for the FastLoadOptions property, as shown below (there is also a short T-SQL illustration after this list):

    [Screenshot: the FastLoadOptions property with TABLOCK specified]

    3. I very briefly mention Extended Events on the very last day, and mention that in SQL Server 2008 one of the drawbacks is that there is no GUI option to set up an XEvents session, but that has changed in SQL Server 2012. I was asked if the 2012 Management Studio could connect to a SQL Server 2008 instance and set up an XEvents session on that instance. My friend and fellow MVP Ben Miller was able to verify that when connecting from the SQL 2012 GUI to a 2008 instance, the option for creating an XEvents session does not even show up.  You could try scripting a session created for a 2012 instance and running it on a 2008 instance, but there are no guarantees.
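
    Regarding question 2, here is a small illustration of the same idea from plain T-SQL rather than SSIS (the table and file names are made up): a bulk load into a heap with the TABLOCK hint is what requests the BU lock, and several sessions can run a load like this against the same heap at the same time.

    -- Hypothetical staging heap and data file, just to show where the hint goes
    BULK INSERT dbo.StagingSales
    FROM 'C:\loads\sales_batch1.dat'
    WITH (TABLOCK);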

    So now I’ll get back to working on my new book …

    ~Kalen

  • Geek City: Growing Rows with Snapshot Isolation

    I just finished a wonderful week in Stockholm, teaching a class for Cornerstone Education. We had 19 SQL Server enthusiasts, all eager to find out everything they could about SQL Server Internals. One question came up on Thursday that I wasn’t sure of the answer to. I jokingly told the student who asked it to consider it a homework exercise, but then I was so interested in the answer that I tried to figure it out myself Thursday evening. In this post, I’ll tell you what I did to try to answer the question.

    I am writing this on an airplane, flying from Frankfurt to San Francisco.

    When a database is enabled for either ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT, SQL Server adds 14 bytes of overhead to each row. These 14 bytes contain the XSN (transaction sequence number) at the time the row was modified, and a pointer to the previous version of the row (if there is one), which is stored in a part of the tempdb database called the version store. In class, we were discussing the fact that rows involved in row versioning have the 14 bytes added the first time the row is modified after one of the snapshot isolation levels is enabled. If your pages are very full, sometimes this can mean that adding the 14 extra bytes will cause one or more of the affected rows to not fit on the original page, so the page may need to be split.

    The student asked if we could avoid the problem of adding the extra 14 bytes on the fly for each row by rebuilding the table right after enabling one of the snapshot isolation levels. In other words, does rebuilding count as modifying the rows?

    I tested it by creating a copy of one of the small tables in the pubs sample database that you can download from Microsoft, here. You, of course, could run a similar test of your own with any table. I first made sure that both snapshot isolation level settings were off. No one can be in the database when you set READ_COMMITTED_SNAPSHOT on or off, and it’s best to make sure there are no current transactions when you set ALLOW_SNAPSHOT_ISOLATION on or off.

    ALTER DATABASE pubs SET read_committed_snapshot OFF;
    ALTER DATABASE pubs SET allow_snapshot_isolation OFF;
    GO

    Now make a copy of the authors table.

    USE pubs;
    GO
    IF exists (SELECT * FROM sys.tables WHERE name = 'SIauthors')
                DROP TABLE SIauthors;
    GO
    SELECT * INTO SIauthors from authors;
    GO

    Now I want to look at the new authors table to see what an actual row looks like on the data page. DBCC IND will show me which pages belong to the table.

    DBCC IND (pubs, SIauthors, -1);

    Sometimes I get two rows for data pages (PageType = 1) back, but only one of them has data.

    [DBCC IND output: the pages allocated to SIauthors]

    DBCC PAGE, with traceflag 3604 on, will show me the rows on the data page.

    DBCC TRACEON(3604);
    DBCC PAGE(pubs, 1,245, 1)

    Before enabling snapshot isolation, the rows end with the bytes for the name of the city. In the output below, for the first row on my page, the city is MENLO PARK.

    [DBCC PAGE output: the first row on the page ends with the bytes for MENLO PARK]

    So let’s look at how things change if we enable one of the snapshot isolation levels and rebuild the table.

    -- Make sure no other connections are in the pubs database before running this ALTER. (The connection where you run the command can be in the database.) After changing the database, verify the setting.

    ALTER DATABASE pubs SET read_committed_snapshot ON;

    SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'pubs';

    Now rebuild the table and find out the new page number(s).

    ALTER TABLE SIauthors REBUILD;
    GO

    DBCC IND (pubs, SIauthors, -1);
    GO

    Because the table has been rebuilt, there will be new page numbers. Use DBCC PAGE to look at the new page, and you’ll see the 14 extra bytes at the end of the rows.

    [DBCC PAGE output: the rows now end with 14 extra bytes of versioning information]

    So I concluded that rebuilding the table would add the extra 14 bytes to each row, and that’s what I told the class on Friday. However, as I started preparing to write this post, I decided to do a few more tests, one of which was building a clustered index on the table and then rebuilding the clustered index instead of the table itself. (Note that rebuilding a table is a new feature in SQL Server 2008.) Keep in mind that the clustered index includes the table, so we’re still rebuilding the table when we rebuild the clustered index.

    I found that an ALTER INDEX REBUILD, although it moved the table to new pages, did not add the 14 bytes. Performing an ALTER TABLE REBUILD, if the table had a clustered index, also did not add the extra bytes. It only seemed to happen when the table was a heap.
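
    If you want to repeat these comparisons without wading through DBCC PAGE output, one shortcut (a sketch, using the same pubs database and SIauthors table as above) is to watch the record sizes reported by sys.dm_db_index_physical_stats before and after each rebuild; a 14-byte jump in the average and maximum record sizes is the versioning overhead showing up.

    SELECT index_id, index_level, avg_record_size_in_bytes, max_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats
        (DB_ID('pubs'), OBJECT_ID('SIauthors'), NULL, NULL, 'DETAILED');
    GO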

    Obviously, more research is required. But I’ll leave that as a homework exercise for you!

    Have fun!

    ~Kalen

  • Did You Know? Query Options Setting in SSMS

    Hi folks

    I’m back from an amazing week at PASS and an awesome reception for SQL Server MVP Deep Dives Volume 2.

    I’m starting to plan and write my next book on SQL Server 2012 Internals, so blogging will be kept to a minimum, not that I’ve had a lot of time to blog lately.

    However, while working on the final quiz for my Concurrency class through SSWUG, I noticed something peculiar about the Query Options | Advanced dialog in Management Studio. You have the option of choosing a default ISOLATION LEVEL and a DEADLOCK PRIORITY, and both of those values have not been updated since before SQL Server 2005. The ISOLATION LEVEL choices do NOT include SNAPSHOT, which was introduced in SQL Server 2005, and the DEADLOCK_PRIORITY only shows Low and Normal; the value High and the numbers –10 through 10 were added back in SQL Server 2005!

    [Screenshot: the Query Options | Advanced page in Management Studio]

    So my first thought was that nobody had updated this dialog since then, but then I noticed that there was an option you can see under “Results” in the left pane, to give you Multiserver results, and Multiserver queries weren’t added until SQL Server 2008.
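
    Of course, the dialog only controls session defaults; nothing stops you from setting the newer values yourself at the top of a batch. A quick sketch:

    -- Options the dialog doesn't offer, set directly for the current session
    -- (using SNAPSHOT also requires ALLOW_SNAPSHOT_ISOLATION to be enabled in the database)
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SET DEADLOCK_PRIORITY 5;    -- any integer from -10 to 10, or LOW, NORMAL, HIGH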

    So what do you think? Should Microsoft update this dialog to give you all the possibilities?

    ~Kalen

  • Did You Know? I’ll be teaching my Internals Class online!

    In my most recent commentary for SQL Server Magazine, online here:
    http://www.sqlmag.com/article/training-and-certification2/sql-server-training-options-140429, I talked about different options for making training content available online.

    As an addendum to that, I’d like to mention that one of my training partners (Benchmark Learning in Minneapolis) has decided to offer my complete 5-day deep dives Internals class online, in early November.  We’re doing it as a split week class also, so your brain can have a rest after the first two days.

    You can get the rest of the details by clicking on the link on my schedule page.

    http://schedule.kalendelaney.com

    Although I probably won’t do what Paul did in his series of remotely delivered Australian user group presentations, I probably will at least  be barefoot!

    ~Kalen

  • Geek City: Where are LOBs stored?

    When researching a question from one of the students in my class last week, I was reading the documentation for CREATE TABLE about storing LOB columns at http://msdn.microsoft.com/en-us/library/ms174979.aspx. For this discussion, LOB columns include text, image, ntext, xml and the MAX columns when they are over 8000 bytes and stored outside the regular data row. I knew that SQL Server gives us the capability of storing LOB columns in a separate filegroup with the TEXTIMAGE_ON clause, but I was surprised at one thing the docs mentioned.

    The documentation says:

    TEXTIMAGE_ON { filegroup | "default" }

    Are keywords that indicate that the text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined type columns are stored on the specified filegroup.

    TEXTIMAGE_ON is not allowed if there are no large value columns in the table. TEXTIMAGE_ON cannot be specified if <partition_scheme> is specified. If "default" is specified, or if TEXTIMAGE_ON is not specified at all, the large value columns are stored in the default filegroup. The storage of any large value column data specified in CREATE TABLE cannot be subsequently altered.

    The way I read that last sentence is that if you don’t put the LOB data on a particular filegroup, it will always go on the default filegroup. The default filegroup has a specific meaning for SQL Server: it is the filegroup where all objects are placed if you don’t use the ON clause to put them on a specific filegroup. By default, the default filegroup is the PRIMARY filegroup, but with the ALTER DATABASE / MODIFY FILEGROUP command, you can change the default filegroup.
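
    For reference, changing the default filegroup is a one-liner; here is a sketch using the FG1 filegroup from the test database created below (I am not doing this in the test, because I want PRIMARY to stay the default):

    ALTER DATABASE TestFGs MODIFY FILEGROUP FG1 DEFAULT;
    GO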

    I did not think that what this sentence was saying was true, so I decided to test it.

    I first created a new database with three filegroups. The PRIMARY filegroup is the default filegroup, because I am not changing the default to be anything else. The two additional filegroups are FG1 and FG2.

    USE master;
    GO

    IF EXISTS (SELECT * FROM sys.databases WHERE name = 'TestFGs')
        DROP DATABASE TestFGs;
    GO

    CREATE DATABASE TestFGs
    ON PRIMARY
        (NAME = Data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFGPrimary.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%),
    FILEGROUP FG1
        (NAME = FG1_data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG1_data.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%),
    FILEGROUP FG2
        (NAME = FG2_data,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\FG2_data.mdf',
         SIZE = 10 MB,
         MAXSIZE = 1 GB,
         FILEGROWTH = 25%)
    LOG ON
        (NAME = TestFG_log,
         FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFG_log.ldf',
         SIZE = 20 MB,
         MAXSIZE = 2 GB,
         FILEGROWTH = 25%);
    GO

    I then use the new database, and create three tables, each with the same column definitions. One of the columns is a TEXT column, which will be stored separately from the regular data rows unless I specify otherwise.

    The first table does not specify any filegroups so everything should go on the default filegroup. The second table specifies a filegroup for the table (FG1) but not for the LOB data. The third table specifies a filegroup for the table (FG1) and separate filegroup for the LOB data (FG2).

    To determine where the data has been stored, I examine the sys.allocation_units view. I need to join it to sys.partitions in order to get the corresponding object_id. The function object_name will give me the name of the table, and the function filegroup_name will translate the column data_space_id into a filegroup name.

    USE TestFGs;
    GO

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows')
        DROP TABLE bigrows;
    GO
    CREATE TABLE bigrows
    (a char(1000),
     b varchar(8000),
     c text);
    GO
    INSERT INTO bigrows
    SELECT REPLICATE('a', 1000), REPLICATE('b', 1000), REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) as object_name, FILEGROUP_NAME(data_space_id) as FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) = 'bigrows';
    GO

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows2')
        DROP TABLE bigrows2;
    GO
    CREATE TABLE bigrows2
    (a char(1000),
     b varchar(8000),
     c text)
    ON FG1;
    GO
    INSERT INTO bigrows2
    SELECT REPLICATE('a', 1000), REPLICATE('b', 1000), REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) as object_name, FILEGROUP_NAME(data_space_id) as FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';
    GO

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'bigrows3')
        DROP TABLE bigrows3;
    GO
    CREATE TABLE bigrows3
    (a char(1000),
     b varchar(8000),
     c text)
    ON FG1 TEXTIMAGE_ON FG2;
    GO
    INSERT INTO bigrows3
    SELECT REPLICATE('a', 1000), REPLICATE('b', 1000), REPLICATE('c', 50);
    GO
    SELECT OBJECT_NAME(object_id) as object_name, FILEGROUP_NAME(data_space_id) as FG_name, type_desc
    FROM sys.partitions p
    JOIN sys.allocation_units a
        ON p.partition_id = a.container_id
    WHERE OBJECT_NAME(object_id) LIKE 'bigrows%';

    And here are my results. The first table, bigrows, has both its regular rows and its LOB data on the default filegroup, PRIMARY. You might also notice that an allocation unit for row-overflow data was created, even though we didn’t have any data that needed that space.

    The second table, bigrows2, has all its data on FG1, even though I specified a filegroup only for the table. The LOB data is stored with the rest of the table if I don’t specifically place it on a different filegroup.

    Only for the third table, bigrows3, is the LOB data stored separately, and only because I specifically used the TEXTIMAGE_ON clause when creating the table.

    [Results: the filegroup and allocation unit type for bigrows, bigrows2 and bigrows3]

    So now you know.

    ~Kalen

  • Geek City: What gets logged for SELECT INTO operations?

    Last week, I wrote about logging for index rebuild operations.  I wanted to publish the result of that testing as soon as I could, because that dealt with a specific question I was trying to answer. However, I actually started out my testing by looking at the logging that was done for a different operation, and ended up generating some new questions for myself.

    Before I started testing the index rebuilds, I thought I would just get warmed up by observing the logging for SELECT INTO. I thought I knew what got logged, but I was wrong. My understanding was that as a minimally logged operation, in FULL recovery, SELECT INTO would log each 8K page as it was written, not each row like a normal INSERT would. In BULK_LOGGED, SELECT INTO would create a log record for each page, but it would not log the entire 8K page. And of course, it would also not log every row.  However, my first test turned up some surprising results.

    Here's what I did: First, I made sure the AdventureWorks database was in FULL recovery, and dropped the Details table if it existed. Then I created  a table called SELECTINTOLoggingData to hold the results of calling fn_dblog.  As in last week's script, the following information was captured:

    • Max log record size
    • Sum of log record sizes
    • Number of log records
    • Number of log records larger than 8K

    Finally, I backed up the AdventureWorks database to give me a foundation for log backups. So here's the first part of the script:

    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SELECTINTOLoggingData')
        DROP TABLE SELECTINTOLoggingData;
    GO

    CREATE TABLE SELECTINTOLoggingData
    (DBRecModel CHAR(12), MaxLogRowSize INT, NumLogRows INT, NumLargeRows INT);
    GO

    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

    The actual testing was done first in FULL recovery, then in BULK_LOGGED. I backed up the transaction log, then created the Details table by executing a SELECT INTO from the Sales.SalesOrderDetail table. The Details table has 1495 pages and 121,317 rows. Finally, I inserted the relevant values from fn_dblog into the SELECTINTOLoggingData table.

    ----  FULL Recovery


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'FULL',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    DROP TABLE Details;
    GO


    -- BULK_LOGGED Recovery


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    INSERT INTO SELECTINTOLoggingData
    SELECT 'BULK_LOGGED',
            MAX([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
            FROM fn_dblog(null, null);
    GO

    Finally, I looked at the data I collected:

    SELECT * FROM SELECTINTOLoggingData;
    GO

    Here are my results:

    [Results: SELECT INTO logging for Sales.SalesOrderDetail in FULL and BULK_LOGGED recovery]

    Although the number of large (8K) log rows was as expected, I was quite surprised to see the total number of log records! (And yes, I know the column names use the term "Log Rows", but I'm calling them "Log Records".  I guess I just wanted a shorter column name.)

    I then ran the same test on a different version of the Sales.SalesOrderDetail table that I use in some of my class examples. Here is the script to create the Sales.SalesOrderDetail2 table:

    CREATE TABLE Sales.SalesOrderDetail2 (
        [SalesOrderID] [int] NOT NULL,
        [SalesOrderDetailID] [int] NOT NULL,
        [CarrierTrackingNumber] [nvarchar](25) NULL,
        [OrderQty] [smallint] NOT NULL,
        [ProductID] [int] NOT NULL,
        [SpecialOfferID] [int] NOT NULL,
        [UnitPrice] [money] NOT NULL,
        [UnitPriceDiscount] [money] NOT NULL ,
    CONSTRAINT [PK_SalesOrderDetail2_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
    (
        [SalesOrderID] ASC,
        [SalesOrderDetailID] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY];
    GO

    INSERT INTO Sales.SalesOrderDetail2 SELECT [SalesOrderID]
          ,[SalesOrderDetailID]
          ,[CarrierTrackingNumber]
          ,[OrderQty]
          ,[ProductID]
          ,[SpecialOfferID]
          ,[UnitPrice]
          ,[UnitPriceDiscount]
      FROM AdventureWorks.Sales.SalesOrderDetail;
    GO

    After setting the AdventureWorks database back to FULL recovery, deleting the Details table and truncating the SELECTINTOLoggingData table, I ran the same test as above (starting at the "FULL Recovery" comment, substituting Sales.SalesOrderDetail2 for Sales.SalesOrderDetail).

    ALTER DATABASE AdventureWorks SET RECOVERY FULL;
    GO
    DROP TABLE Details;
    GO
    TRUNCATE TABLE SELECTINTOLoggingData
    ;
    GO

    Here are my new results:

    [Results: SELECT INTO logging for Sales.SalesOrderDetail2]

    So these are much more like the results I expected from a SELECT INTO operation!

    Why did the first example, selecting from Sales.SalesOrderDetail, do so much more logging?

    It turns out that if you have an IDENTITY column in the source table, SELECT INTO must log every row as the IDENTITY value is generated.
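
    A quick way to check whether a source table will trigger this extra logging is to look for an IDENTITY column in sys.columns; a small sketch:

    -- Returns a row for each IDENTITY column in the source table (SalesOrderDetailID, in this case)
    SELECT name AS identity_column
    FROM sys.columns
    WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail')
    AND is_identity = 1;
    GO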

    Does this mean you should rethink the benefit of using IDENTITY columns? Only if you're doing a lot of SELECT INTO operations to copy tables with IDENTITY columns. Normal INSERT statements, which are how IDENTITY values are usually generated, are going to have the same amount of logging whether or not the table has an IDENTITY column. (Let me know if you think otherwise.)

    So now you know.

    ~Kalen

    p.s. I would like to thank Sunil Agarwal for his brilliant insights and encouragement. Thanks for everything, Sunil!

  • Geek City: What gets logged for index rebuild operations?

    This blog post was inspired by a question from a future student. Someone who was already booked for my SQL Server Internals class in June asked for some information on a current problem he was having with transaction log writes causing excessive wait times during index rebuild operations when run in ONLINE mode. He wanted to know if switching to BULK_LOGGED recovery could help.

    I knew the difference between ALTER INDEX in FULL vs BULK_LOGGED recovery when doing normal OFFLINE rebuilds, but I wasn't sure about ONLINE. So I dug out some old scripts, massaged them a bit, and ran some tests.

    A minimally logged operation is one that does not always log every single row. In FULL recovery, SQL Server will log the entire page as each page is filled, so that we end up with individual log records of over 8K. In BULK_LOGGED recovery, SQL Server just logs the fact that a new page was generated (and some information about the page) and doesn't log the full 8K. So the log space used is quite a bit less.

    I will use Microsoft's AdventureWorks database that you can download from codeplex. I used the SQL Server 2005 version.

    After making sure the database is set to FULL recovery, I make a copy of the Sales.SalesOrderDetail table (which I call Details) and build a clustered index on it.  The table has about 1580 pages and 121,317 rows.


    ALTER DATABASE AdventureWorks SET RECOVERY FULL
    GO
    USE AdventureWorks;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables    
                WHERE name = 'Details')
           DROP TABLE Details;
          
     
    SELECT * INTO Details
    FROM Sales.SalesOrderDetail;
    GO


    CREATE CLUSTERED INDEX CL_details_indx ON Details(SalesOrderID);
    GO

    I then build a table to store log information that I will gather after the different index rebuild operations. After that I back up the database to make sure I have a base for making log backups.

    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'AlterIndexLoggingData')
        DROP TABLE AlterIndexLoggingData;
    GO

    CREATE TABLE AlterIndexLoggingData
    (DBRecModel CHAR(12), IndexBuildMode CHAR(3), MaxLogRowSize INT, SumLogSize BIGINT, NumLogRows INT, NumLargeRows INT);
    GO
    BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
    GO

    Now starts the testing. There are four sections, each of which basically does the same thing, with the four permutations of FULL vs BULK_LOGGED recovery, and ONLINE vs OFFLINE index builds.  Each section backs up the transaction log to truncate it, and then reports the number of rows in the log, just to verify we really have (logically) cleared it out. It then performs a rebuild of the index on the Details table. Finally, it captures the following information from the log using fn_dblog:

    • Max log record size
    • Sum of log record sizes
    • Number of log records
    • Number of log records larger than 8K

    ----  FULL Recovery, ONLINE = OFF

    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*) FROM fn_dblog(null, null);
    GO
    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
     
     
    --- FULL Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'FULL', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery; ONLINE = OFF


    ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'OFF',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    --SELECT * FROM AlterIndexLoggingData;
    GO

    --- BULK_LOGGED Recovery, ONLINE = ON


    BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
    SELECT COUNT(*)  FROM fn_dblog(null, null);
    GO

    ALTER INDEX  CL_details_indx  ON Details REBUILD WITH (ONLINE = ON)
    GO

    INSERT INTO AlterIndexLoggingData
    SELECT 'BULK_LOGGED', 'ON',
            MAX([Log Record Length]),
            SUM([Log Record Length]),
            COUNT(*),
            (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  > 8000)
            FROM fn_dblog(null, null);
    GO
    SELECT * FROM AlterIndexLoggingData;
    GO

    Here are my results:

    [Results: logging for the four combinations of recovery model and ONLINE setting]

    Notice the following:

    • Only FULL recovery with an OFFLINE rebuild logged the 8K pages.
    • FULL recovery with an ONLINE rebuild actually did log every single row; in fact, it looks like at least two log records were generated for each row. There is a LOT of logging for this set of options and by far, the largest amount of log space used.
    • BULK_LOGGED mode used substantially less log space than FULL, but the difference between ONLINE and OFFLINE is even more pronounced: in FULL recovery, ONLINE used about twice the log space; in BULK_LOGGED, ONLINE required more than 8 times the space! (The summary query below makes these comparisons easy to eyeball.)
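
    Here is that summary query, a small sketch against the AlterIndexLoggingData table defined above, converting the summed log record sizes to megabytes:

    SELECT DBRecModel, IndexBuildMode,
           CAST(SumLogSize / 1048576.0 AS decimal(10,2)) AS LogSpaceMB,
           NumLogRows, NumLargeRows
    FROM AlterIndexLoggingData
    ORDER BY DBRecModel, IndexBuildMode;
    GO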

    I'm sure there are more fun facts you can pull out of the results from fn_dblog, but for my first look at the logging differences between ONLINE and OFFLINE index rebuilds, this is what I wanted to know. It's not a complete answer to the student's question, but it does indicate that there is a lot more logging with ONLINE index builds, so it should not be surprising if there are a lot more log write waits. And I would assume that switching to BULK_LOGGED recovery could make a noticeable difference.

    I hope this is useful!

    ~Kalen

  • Geek City: Preparing for the SQL Server Master Exam

    I was amazed at the results when I just did a search of SQLBlog, and realized no one had really blogged here about the changes to the Microsoft Certified Master (MCM) program. Greg Low described the MCM program when he decided to pursue the MCM at the end of 2008, but two years later, at the end of 2010, Microsoft completely changed the requirements.

     Microsoft published the new requirements here. The three week intensive course is no longer required, but that doesn't mean you can just buy an exam cram guide. Experience with SQL Server is absolutely required to pass the rigorous knowledge exam and hands-on lab exam. When you see all the recommended preparation, you might wish there was still a way to get all the preparation done in only three weeks. The Microsoft site lists many resources you can use to prepare, including books, web videos and instructor led training. My SQL Server 2008 Internals book is at the top of the book list, and my 5-day SQL Server Internals and Tuning course is listed as a suggested training resource.

    My next delivery is in about 6 weeks, to be taught in Boston, just before the Boston Marathon! The class is organized by SQLBlog's own Adam Machanic and his Boston SQL Training,  and you can get more information and register here.

    See you in class!

    ~Kalen
