Kalen Delaney

  • T-SQL Tuesday: What kind of Bookmark are you using?

    [T-SQL Tuesday logo]

    I’m glad there is no minimum length requirement for T-SQL Tuesday blog posts, because this one will be short. I was in the classroom for almost 11 hours today, and I need to be back tomorrow morning at 7:30.

    Way long ago, back in SQL 2000 (or was it earlier?), when a query indicated that SQL Server was going to use a nonclustered index to get row pointers, and then look up those rows in the underlying table, the plan just had a very linear look to it. The operator that indicated going from the nonclustered leaf to the data row was called a ‘Bookmark Lookup’, and it just looked like a simple, single operator. Those of us who did troubleshooting of query plans knew that it could hide a multitude of sins, but to many people it looked very innocuous.

    Then in the next version, that simple, single Bookmark Lookup Operator was replaced by something that looked like a JOIN! When I first saw query plans showing a JOIN when doing a nonclustered index lookup, I was almost distraught, but it turns out that was only because it was new and different. The more I thought about it, the more I realized it was a Good Thing.

    Some people might get confused because they think of a JOIN as an operation that finds matching rows between two tables. But in fact, a JOIN can be used to find matches between any two sets of rows. And in the case of a nonclustered index lookup, SQL Server is finding rows in the leaf level of a nonclustered index (the ones that meet your filter condition(s)), and then finding the matching rows in the underlying table. Internally, this is a JOIN operation.

    But when we look at the plans, there are two different operators used to show the actual lookup into the underlying table.

    There is a RID Lookup, used when the table is a heap. The nonclustered index contains Row ID, or RID, values (composed of a File ID, a Page ID and a Slot/Row number on the page). This RID is then used to ‘match’ with rows in the underlying table that have the same RID.

    There is also a KEY Lookup, used when the table has a clustered index. The leaf level of a nonclustered index contains pointers that are the clustered index key values for the rows being pointed to. So to find the matching rows in the table, SQL Server takes the clustered key value from the nonclustered index, and then searches for that value in the clustered index, following the clustered index from the root down to its leaf.
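
    If you want to see both operators for yourself, here is a minimal sketch (the table and index names are just illustrative); run each SELECT with the actual execution plan turned on:

    -- Heap plus a nonclustered index: forcing the index produces an Index Seek
    -- joined to a RID Lookup in the plan
    CREATE TABLE dbo.LookupDemo (id int NOT NULL, filler char(200) NOT NULL DEFAULT 'x');
    CREATE INDEX LookupDemo_id ON dbo.LookupDemo(id);
    INSERT INTO dbo.LookupDemo (id) SELECT object_id FROM sys.objects;

    SELECT * FROM dbo.LookupDemo WITH (INDEX (LookupDemo_id))
    WHERE id = 3;                      -- RID Lookup against the heap

    -- Add a clustered index and the same query now shows a KEY Lookup instead
    CREATE CLUSTERED INDEX LookupDemo_clustered ON dbo.LookupDemo(filler);

    SELECT * FROM dbo.LookupDemo WITH (INDEX (LookupDemo_id))
    WHERE id = 3;                      -- KEY Lookup through the clustered index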

    Below are the two icons used for these operations. If I were to give you a quiz, and ask which operator indicates we are finding a row in a table directly using a RID value, and which indicates we are finding a row in a table using a clustered index key, which would you say is which?  Take a moment to think about it.

    [images: the two lookup operator icons]

    I don’t know about you, but when I just try to figure out these two icons, I think the one on the left looks like it should be the KEY Lookup, and the one on the right should be the RID Lookup.

    But if you check the page in Books Online, or if you check your own query plans, you see that they are backwards!

    Here is a plan showing a RID Lookup:

    [image: query plan showing a RID Lookup]

     

    And here is a plan showing a KEY Lookup:

    [image: query plan showing a KEY Lookup]

    Fortunately, the graphical query plan tells you what kind of operation it’s performing, so you can just ignore the picture and read the words. But what’s the point of a graphical plan in that case?

    So are they really backwards? Did someone just make a mistake and link in the wrong file when compiling the SQL Server graphical query plan code? Or am I misinterpreting this?

    I still think graphical query plans are one of the best things ever added to the product, so I’m really not complaining, but I’m just sayin….

     

    Have fun, and Happy T-SQL Tuesday!

    ~Kalen

  • Geek City: SQL Server 2014 In-Memory OLTP (“Hekaton”) Whitepaper

    Tech Ed isn’t even over yet, and my new Hekaton Whitepaper is already generating questions I can’t answer yet!

    Here are the two questions that were in my inbox this morning:

    1. The whitepaper says “There are other wait types, such as waiting for a log write to complete.”
    Does this mean that writing to the transaction log  will be the main bottleneck of the system?

    Hey folks, the product isn’t even publicly available. How can I, or anyone, know what your main bottleneck will be? You do know that there is ALWAYS a bottleneck, right? Something has to be the limiting factor. But I can’t even begin to guess yet what kind of limiting factors there will be in Hekaton, now called SQL Server In-Memory OLTP. The paper is about the internals, and that’s what I researched. I did not do any performance testing. But, as I did mention in the paper, log writing is much more efficient for Hekaton tables than for the disk-based tables. So even if it IS the limiting factor, it will not be as limiting as it could potentially be for non-Hekaton tables.

     

    2.  The whitepaper says “Note that none of the LOB data types are allowed.” 
    We have some 75 columns  which are still ntext, so  it seems we are dead in the water even if we convert to nvarchar(max)?

    As you read in the paper, rows are stored completely differently, and not on the disk-based 8K pages. So there is no mechanism for special pages for LOB or row-overflow data. Also, you’re aware that there is extra overhead for managing and accessing LOB data, and to make In-Memory OLTP as efficient as possible, those extra-large rows needing extra work are avoided. And as in my previous answer, without seeing your complete design (no, please don’t send it to me!) there is no way I can tell just how ‘dead in the water’ you might be. I tend to feel there would be a way to redesign your tables and separate the LOB types (which should be converted to something other than the deprecated ntext soon, anyway) into their own tables. As you read in the paper, In-Memory OLTP tables can be joined with disk-based tables.
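
    To make that last point concrete, here is a rough sketch of the kind of split I have in mind; the table names are invented, and the exact syntax may differ between the CTPs and the final release:

    -- Memory-optimized table holds the hot, fixed-size columns
    -- (every memory-optimized table needs at least one index)
    CREATE TABLE dbo.OrderHot
    (
        OrderID    int       NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
        CustomerID int       NOT NULL,
        OrderDate  datetime2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    -- The LOB column lives in an ordinary disk-based table
    CREATE TABLE dbo.OrderNotes
    (
        OrderID int           NOT NULL PRIMARY KEY,
        Notes   nvarchar(max) NULL        -- converted from the deprecated ntext
    );

    -- In-Memory OLTP tables can be joined with disk-based tables
    SELECT h.OrderID, h.OrderDate, n.Notes
    FROM dbo.OrderHot AS h
    JOIN dbo.OrderNotes AS n ON n.OrderID = h.OrderID;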

    Please feel free to send more questions, but I most likely will NOT be able to answer every one in a blog post. There will be an update to the paper, with more internals details, for CTP2, and some of the questions will be answered there. Finally, the ultimate plan is a book, like my SQL Server Concurrency book, all about Hekaton… oops, I mean In-Memory OLTP. Hopefully, the book will contain best practices suggestions gathered from actual research with In-Memory OLTP implementations, plus a big Q&A section to answer questions that weren’t answered elsewhere in the book.

    Thanks!

    ~Kalen

  • Geek City: Document more trace flags?

     

    It’s been over 5 years since I last blogged about trace flags, so it seems a post on the topic is in order.

    If a trace flag is undocumented, it means it is unsupported and not guaranteed. It can go away at any time, even from one service pack to the next. If a trace flag doesn’t work the way the person who told you about it said it should work, there is no help for that. It is undocumented.

    Trace flags change SQL Server’s behavior. They are toggles that can be turned on or off. When they are on, they force SQL Server to do things differently. Sometimes just a little bit differently, sometimes a LOT differently. Some are very harmless, such as 3604, which allows certain undocumented DBCC commands to return their output to the client. You need to use this flag if you’re going to use the DBCC PAGE command, which I talk about a lot in my writings and in my SQL Server Internals classes.
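
    For example, a typical sequence looks something like this (the database name and page number here are just placeholders):

    DBCC TRACEON (3604);    -- return DBCC output to the client session
    GO
    -- DBCC PAGE (database, file#, page#, printopt); printopt 3 includes per-row detail
    DBCC PAGE ('AdventureWorks2008', 1, 200, 3);
    GO
    DBCC TRACEOFF (3604);
    GO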

    However, there are other trace flags that change SQL Server behavior a LOT. The SQL Server developers at Microsoft create trace flags for lots of reasons, usually to force some sort of non-standard behavior. This might be so the developers can test the feature during development, or so that the support engineers can turn a feature off, for example to confirm whether that feature was causing a problem. These are usually not changes that you would want to make to your own SQL Server in a normal environment.

    But with so many undocumented trace flags, the question frequently arises “Why aren’t MORE trace flags documented?”

    Here are some of the reasons:

    -- Once a trace flag is officially and publicly documented, people will start to use it. And people will also start to misuse it. Microsoft needs to consider whether dealing with the misuse will be worse than the problems due to the lack of that particular trace flag.

    -- Before a trace flag is documented, it must be thoroughly tested. FULLY. Across all SKUs, all security environments, all performance conditions. And that is not easy. A trace flag added for one particular situation or one rare bug is usually not tested to the full extent that documented features are. So it will stay undocumented.

    -- Once a trace flag is added, it must be supported into the future, and it can’t be removed without officially deprecating it at least one full version in advance. Some trace flags are known to have limited potential usefulness; documenting them would force them to be maintained even when they’re no longer useful.

    -- Because most of the undocumented trace flags are created for a limited set of circumstances, they are not tested in conjunction with other trace flags. In order to be fully tested, each trace flag would have to be tested in combination with every other trace flag. And if a particular combination of trace flags was not pretty, which of the trace flags involved should be left undocumented? It would be really hard to tell, even after spending all the time on all the testing.

    The bar for Microsoft actually documenting a trace flag is therefore quite high. And sometimes if a feature introduced by a trace flag is really needed, it will just become part of the product, or be added using a different interface, such as a configuration option or database property.

    But Microsoft does document new trace flags. If you look at the history of Books Online to see the number of documented trace flags, you can see the increase:

    -- SQL Server 2005:     10 trace flags

    -- SQL Server 2008:     13 trace flags

    -- SQL Server 2008 R2:  15 trace flags

    -- SQL Server 2012:     19 trace flags

     

    So if you read about an undocumented trace flag, and you think it might be useful for you, be very careful when testing it. And if your careful testing shows it IS useful, let Microsoft know.
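
    When you do test one, you can limit the scope to your own session first and check what is currently enabled; a minimal sketch (3604 is used here only as a harmless example):

    DBCC TRACEON (3604);        -- session scope; add -1 as a second argument for a global flag
    DBCC TRACESTATUS (3604);    -- confirm the status of this specific flag
    DBCC TRACESTATUS (-1);      -- list all globally enabled trace flags
    DBCC TRACEOFF (3604);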

    But if your testing shows it isn’t useful, or is actually harmful, you don’t need to let Microsoft know. They probably already know. After all, it’s undocumented.

     

    ~Kalen

  • Geek City: More statistics info available!

     

    I just started using a new DMV (one that’s actually an ‘F’ not a ‘V’, as in Function) that gives us more info about distribution statistics. It returns info about the last statistics update date (which is also available with the STATS_DATE() function). It also provides the number of rows sampled when the statistics were last updated. This is available through DBCC SHOW_STATISTICS, and in a blog post a couple of months ago, I showed you how to get that output into a table for your own querying. One of the coolest things this new DMV shows is the row modification counter. A count of rows changed used to be available in sysindexes, but in SQL 2005 and later, SQL Server keeps track of changes to each column that has statistics on it. These values were not visible before, but now they are! This new object also reports any filter definition and includes rows for all statistics, whether index statistics or column statistics.

    As a table-valued function, sys.dm_db_stats_properties can be used with the CROSS APPLY operator to give information for all statistics in a database, or you can filter to return just user objects and not any system objects, as shown in the code here:

    SELECT sp.object_id, object_name(sp.object_id) AS object_name, sp.stats_id, name AS stats_name,
           filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100;

    The new DMV was easy to miss because it wasn’t added until SQL Server 2008 R2 Service Pack 2 and SQL Server 2012 Service Pack 1. But since you should always be running with the latest service pack, you should have this great DMV if you’re on any version after SQL Server 2008.

    Let me know if you find this DMV useful, and what you mainly use it for.

    Have fun!

    ~Kalen

  • Geek City: A Grammar Geek in the Cloud

    Those of you who know me well know that I am usually a stickler for spelling, grammar and proper word usage. I may have even lost a few friends because of what some people see as an obsession. I am not infallible, and I do make typos, but I like to correct them as soon as I discover them (or as soon as they are brought to my attention).

    So now I will admit that I made a mistake in my usage of the word ‘premise’ when talking about Cloud vs. non-Cloud databases. I was using the term ‘on-premise’ as the opposite of ‘in the Cloud’, and my friend Cindy corrected me last week in no uncertain terms. So I went online and quickly found this post that proved she was right and I was wrong:

    http://fbhalper.wordpress.com/2009/06/11/premise-vs-premises-in-the-cloud/ 

    I did a bit more searching and found that I am not the only one making this mistake. A Research VP at Gartner actually makes the same mistake in a published post:

    http://blogs.gartner.com/chris-wolf/2010/07/14/on-premise-microsoft-azure-an-inevitable-milestone-in-azure%e2%80%99s-evolution/

    I will admit that I find the word ‘on-premises’ awkward to say, so when I’m talking casually I might end up leaving off the last syllable.

    But anytime I’m writing, I promise to use premise vs. premises appropriately.

    And I expect the same from everyone else.

    :-)

    ~Kalen

  • Geek City: Join With Me!

    I remember that one of the most surprising changes in SQL Server 2005 was how the graphical plans showed the use of a nonclustered index to seek, and the plan included something that looked like a JOIN to find rows in the base table. Here’s an example. Although I used SQL Server 2008 R2, the graphical plan will be pretty similar to what it looked like back in SQL Server 2005. My code will make a copy of a table in the AdventureWorks2008 database, and then build an index on one of the columns.

    USE AdventureWorks2008;
    GO
    IF object_id('dbo.Sales') IS NOT NULL
        DROP TABLE dbo.Sales;
    GO
    SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;
    GO
    CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);
    GO

    Now look at the estimated graphical execution plan for this query, which searches for a particular value of the SalesPersonID column:

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 280;
    GO

    You should see something like this:

    [image: query plan with an Index Seek joined to a RID Lookup]

    It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. You’ll have a problem if you think of a JOIN as only finding matching rows between two tables, but that problem is solved once you realize that a JOIN can be used to find matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on SalesPersonID, and a set of rows in the table. The index seek finds all the index rows with a SalesPersonID value of 280, but those index rows only contain an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the SalesPersonID value, SQL Server must find the rows in the dbo.Sales table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.

    If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs; they contain the clustered key value to be used as a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look it up in the clustered index to find the desired row. So I’ll create a clustered index on SalesOrderNumber:

    CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);
    GO

    Now when I run the same query as above, I get the following plan:

    [image: query plan with an Index Seek joined to a Key Lookup]

    The only difference in the two plans above is the icon for the lookup into the base table.  One is looking up into a heap using a RowID, which is an address containing the File ID, the page number and the slot or row number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?

    [images: the RID Lookup and Key Lookup icons]

    The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.

    But wait, there’s more…

    Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced.

    I’m going to build another index, on OrderDate:

    CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);
    GO

    This index will be used if I look for all orders placed on July 1, 2002. The query plan for the following query is shown below:


    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01';

    [image: query plan using the index on OrderDate]

    We saw that the index on SalesPersonID is useful when looking for the value 280, and the index on OrderDate is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions?

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01'
    AND SalesPersonID = 280

    Here’s the plan:

    [image: index intersection plan showing two joins]

    We see two JOINs! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions, and those two index row sets are joined together, using the clustered key value as the join column. Only after SQL Server finds the rows that meet both conditions does it take the common clustered index key and use it to find the row in the underlying table.

    But wait, there’s more!

    The next two queries each use a table scan, because their filter conditions are not selective enough. There are way more rows with a SalesPersonID value of 289 than there were for a value of 280, and there are more rows with an OrderDate of March 1, 2004 than with an OrderDate of July 1, 2002.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01' ;
    GO

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 289;
    GO

    Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:

    [image: clustered index scan plan]

    However, if you look at the plan for a query that uses both filter conditions, we don’t get a clustered index scan.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;

    [image: index intersection plan for the combined filters]

    Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks on non-restrictive filters is that the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, and therefore the number of lookups into the base table is quite small and affordable.

    So, what can you learn? Just because you think a column might not be selective enough to build a nonclustered index on, don’t discount it. If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought!

    So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.
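
    One simple way to do that testing, sketched here, is to compare the logical reads (and the actual plans) with and without hints that force particular candidate indexes:

    SET STATISTICS IO ON;

    -- Let the optimizer choose (index intersection in this case) ...
    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;

    -- ... and compare the 'logical reads' for dbo.Sales when a single index is forced
    SELECT * FROM dbo.Sales WITH (INDEX (Sales_OrderDate_index))
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;

    SET STATISTICS IO OFF;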

    Have fun!

    ~Kalen

  • Geek City: Accessing Distribution Statistics

    Distribution statistics are one of the most important sources of information that the Query Optimizer uses to determine a good query plan. In this post, I’m not going to tell you everything about distribution statistics. I’m just going to show you a few tricks for getting access to the statistics.

    If you want a deeper understanding of what the statistics keep track of, and you don’t have any of my SQL Server Internals books handy, check out this whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 

    Microsoft does provide us a tool called DBCC SHOW_STATISTICS for examining the distribution statistics.

    Microsoft has gradually been making more and more of the old DBCC commands available as DMVs, even some undocumented ones. For example, one of my favorites, DBCC IND, has now been replaced in SQL Server 2012 by sys.dm_db_database_page_allocations.

    I have been wishing for several versions that Microsoft would make the DBCC SHOW_STATISTICS information available as a DMV. But it hasn’t happened yet, and I’m tired of waiting, so I decided to do something about it.

    My solution is not quite as easy to use as a DMV might be, but it allows you to get the information that DBCC SHOW_STATISTICS provides into a set of three tables that can then be saved into a more permanent location of your choice, and/or queried as desired.

    DBCC SHOW_STATISTICS returns three sets of information, with different columns in the output, so three different tables are needed. The DBCC SHOW_STATISTICS command can be called with an argument that specifies that you just want one of the three sets returned. The options are:

    WITH STAT_HEADER – returns basic info such as last update date, and number of rows in the table/index. Also reports number of steps returned for HISTOGRAM section.

    WITH DENSITY_VECTOR – returns density info for each left-based subset of columns in the index. For example, an index on (lastname, firstname, city) would have a density value for (lastname), for (lastname, firstname), and for (lastname, firstname, city). Each density value is a single number that depends on the number of distinct values: it is 1 divided by the number of distinct values in the column(s). For example, if there are only 2 possible values in the column, the density would be 0.5. Multiplying density by the number of rows in the STAT_HEADER section gives the average expected rowcount if a query were executed looking for an equality on the specified column(s), as illustrated in the sketch after this list.

    WITH HISTOGRAM – returns a set of ordered values from the first column of the index, creating a histogram. This histogram provides the optimizer with selectivity information for specific values or ranges of values in the first column of the index.
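
    To make the density arithmetic concrete, here is a small sketch using the same index that is queried later in this post; the first query computes the density by hand, and the second returns the stored value:

    -- All density for a column (set) is 1 / (number of distinct values),
    -- so density * table rows gives the average rows per distinct value
    SELECT 1.0 / COUNT(DISTINCT ProductID)            AS computed_density,
           COUNT(*) * 1.0 / COUNT(DISTINCT ProductID) AS avg_rows_per_value
    FROM Sales.SalesOrderDetail;

    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
        WITH DENSITY_VECTOR;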

    To collect this info, I will use one of my favorite tricks, which is to create a table in the master database with a name starting with sp_. (I’ve written about this trick several times, including in this earlier blog post.) Once I have the table(s) created, I can access them from any database. So here are the three tables:

    USE Master;
    GO

    IF  (SELECT object_id('sp_stat_header')) IS NOT NULL
      DROP TABLE sp_stat_header;
    GO
    CREATE TABLE sp_stat_header
    (   Name sysname,
        Updated datetime,
        Rows bigint,
        Rows_sampled bigint,
        Steps smallint,
        Density numeric (10,9),
        Average_key_length smallint,
        String_index char(3),
        Filter_expression nvarchar(1000),
        Unfiltered_rows bigint);
    GO


    IF  (SELECT object_id('sp_density_vector')) IS NOT NULL
      DROP TABLE sp_density_vector;
    GO
    CREATE TABLE sp_density_vector
    (  all_density numeric(10,8),
       average_length smallint,
       columns nvarchar(2126) );
    GO

    IF  (SELECT object_id('sp_histogram')) IS NOT NULL
      DROP TABLE sp_histogram;
    GO
    CREATE TABLE sp_histogram
    (   RANGE_HI_KEY sql_variant,
        RANGE_ROWS bigint,
        EQ_ROWS bigint,
        DISTINCT_RANGE_ROWS bigint,
        AVG_RANGE_ROWS bigint);
    GO

    The second trick is to use INSERT … EXEC to execute a DBCC statement and populate the tables. I will build the DBCC  command dynamically, after capturing the schema, table and index names in variables. You of course could take this code and turn it into a stored procedure, for which the schema, table and index names are passed as parameters. I’ll use as an example a table in the AdventureWorks2008 sample database, just so you can try running the code, and I can verify that it actually works!

    I will use the table Sales.SalesOrderDetail and the index IX_SalesOrderDetail_ProductID. So the object name (@oname) is SalesOrderDetail, the schema name (@sname) is Sales, and the index name (@iname) is IX_SalesOrderDetail_ProductID.

    SET NOCOUNT ON;
    USE AdventureWorks2008;
    GO
    DECLARE @oname sysname,  @iname sysname, @sname sysname

    SELECT @oname = 'SalesOrderDetail',  @sname = 'Sales', @iname = 'IX_SalesOrderDetail_ProductID';
     
    -- Update the object name to include the schema name, because that is the format the DBCC command expects
    SELECT @oname = @sname +'.' + @oname;

    TRUNCATE TABLE sp_stat_header;
    INSERT INTO sp_stat_header
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH STAT_HEADER');

    TRUNCATE TABLE sp_density_vector;
    INSERT INTO sp_density_vector
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH DENSITY_VECTOR');

    TRUNCATE TABLE sp_histogram;
    INSERT INTO sp_histogram
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH HISTOGRAM');

    So now you can look at the values collected and filter or query in any way, or use SELECT INTO to save them into another table, so the sp_ tables can be used the next time you want to capture distribution statistics information.

    SELECT * FROM sp_stat_header;

    SELECT * FROM sp_density_vector;

    SELECT * FROM sp_histogram;
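
    And as mentioned earlier, the capture steps could be wrapped in a stored procedure that takes the schema, table and index names as parameters; here is a minimal sketch (the procedure name capture_showstats is just an illustration, and it is created in the current database):

    CREATE PROCEDURE dbo.capture_showstats
        @sname sysname, @oname sysname, @iname sysname
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Combine schema and table name, as the DBCC command expects
        DECLARE @fullname nvarchar(300) = @sname + N'.' + @oname;

        TRUNCATE TABLE sp_stat_header;
        INSERT INTO sp_stat_header
            EXEC ('DBCC SHOW_STATISTICS([' + @fullname + '],' + @iname + ') WITH STAT_HEADER');

        TRUNCATE TABLE sp_density_vector;
        INSERT INTO sp_density_vector
            EXEC ('DBCC SHOW_STATISTICS([' + @fullname + '],' + @iname + ') WITH DENSITY_VECTOR');

        TRUNCATE TABLE sp_histogram;
        INSERT INTO sp_histogram
            EXEC ('DBCC SHOW_STATISTICS([' + @fullname + '],' + @iname + ') WITH HISTOGRAM');
    END;
    GO

    -- Example call, matching the names used above
    EXEC dbo.capture_showstats 'Sales', 'SalesOrderDetail', 'IX_SalesOrderDetail_ProductID';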

      

    Let me know if you find this useful, and especially if you embellish it to create a procedure or an automated process of your own!

    Thanks!

    ~Kalen

  • Geek City: What Triggered This Post?

    I’d really like to get another post up onto my much neglected blog before the end of 2012. This will also start one of my New Year’s resolutions, which is to write at least one blog post a month. I’m going to tell you about a change in SQL Server that wasn’t announced in any “What’s New” list that I ever saw, perhaps because it was just a change in internal behavior, and nothing that required any change in user applications.

    Do you retest what you know is true for every new version? When I update my books, I do test all the scripts, but if there isn’t a script, I don’t retest every ‘fact’ that I have known for years is true. And sometimes, things change. And sometimes my reviewers notice those unreported changes, and sometimes they don’t.

    You might be aware of the fact that SQL Server can perform UPDATE operations in two different ways. The UPDATE can be performed as a two-step process: delete the old row and then insert a whole new row, or, the UPDATE can be performed (much more efficiently) as an update-in-place.  When the two-step UPDATE is performed, it is a LOT more work. Not only does SQL Server have to log the entire old row and the entire new row, but each nonclustered index is also modified twice, and each of those index changes also has to be logged. So it’s nice when an update-in-place is done, because only the bytes changed are logged, and only indexes on the updated columns are affected.

    Prior to SQL Server 7, there were actually four different ways that UPDATE could be done. The two-step UPDATE had some variations that could make it even slower in some cases! But that was a long time ago, so I’m not going to go into the details now. But I will say that back then, in order to get an update-in-place to occur, there was a big long list of prerequisites that had to be met and if you missed just one, you’d get one of the slower UPDATE operations.

    As of SQL Server 7, update-in-place became the default. The only time it doesn’t happen is when the row can’t stay in the same location (such as when you update a clustered index key column) or when SQL Server really needs the old and new versions of the row.

    In SQL 7, one of the places where SQL Server needed the old and new versions of the updated rows was when processing triggers. Triggers need the transaction log to get the contents for the DELETED and INSERTED pseudo-tables. And because triggers needed the entire old and new versions of the updated rows, the UPDATE was performed as a two-step operation: DELETE the old row, log the entire old row, then INSERT the new row with the new values, and log the entire new row.

    But as of 2005, we now have the version store, primarily used for SNAPSHOT isolation, but available for other uses as well. In SNAPSHOT isolation, the version store stores ‘old versions’ of rows that have been updated or deleted. I knew that the version store was also used for triggers, but it only occurred to me recently that maybe, because the old and new versions of the row no longer need to come from the log, UPDATEs might not always need to be performed internally as a two-step UPDATE.

    So I decided to test it out.

    -- DEMO: If there is an UPDATE trigger, are updates logged as DELETE + INSERT?
    -- First build a new database.

    USE master;
    GO
    IF (SELECT db_id('TestTrigger')) IS NOT NULL
        DROP DATABASE TestTrigger;
    GO
    CREATE DATABASE TestTrigger;
    GO
    ALTER DATABASE TestTrigger SET RECOVERY SIMPLE;
    GO
    SELECT db_id('TestTrigger');
    GO

    USE TestTrigger;
    GO

    -- Just for a warmup, look at the function fn_dblog, which works in the current database

    SELECT * FROM fn_dblog(null, null);
    GO

    -- Create a new table to work with
    IF (SELECT object_id('objects')) IS NOT NULL
        DROP TABLE objects;
    GO
    SELECT TOP 100 * INTO objects FROM sys.objects;
    GO

    -- Create a clustered index on the table
    CREATE CLUSTERED INDEX objects_clustered on objects(name);
    GO

    -- First examine an update we know is NOT done in place,
    -- i.e. updating a clustered key value

    UPDATE objects SET name = 'newrowsets' WHERE name = 'sysrowsets';
    GO

    -- Look at last 10 rows; notice a LOP_DELETE_ROWS and LOP_INSERT_ROWS
    -- The AllocUnitName column shows the object affected is the clustered index on dbo.objects
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Now examine an update we know is  done in place,
    -- i.e. updating an unindexed column on a table with no triggers
    UPDATE objects SET parent_object_id = 1 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW on the dbo.objects allocation unit
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Create an update trigger
    -- Will the update be done with the simple LOP_MODIFY_ROW or with LOP_DELETE_ROWS and LOP_INSERT_ROWS?
    CREATE TRIGGER trg_update_objects ON objects FOR UPDATE
    as
    SELECT * FROM DELETED; SELECT * FROM INSERTED;
    RETURN;
    GO

    -- Now perform update again
    UPDATE objects SET parent_object_id = 10 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW
    SELECT * FROM fn_dblog(null, null);
    GO

    Since the database is in SIMPLE recovery model, you can issue a CHECKPOINT before each UPDATE if you want to reduce the number of rows in the log to make it easier to examine.
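
    For example, a sketch of that trick applied to one more update:

    -- In SIMPLE recovery, a CHECKPOINT truncates the inactive part of the log,
    -- so fn_dblog returns only a handful of rows afterwards
    CHECKPOINT;
    GO
    UPDATE objects SET parent_object_id = 20 WHERE name = 'sysfiles1';
    GO
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO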

    So it seems that I need to update my course and some of my writings. There might also be special cases that still require a two-step UPDATE in the presence of triggers, but it seems like a two-step UPDATE is not ALWAYS required anymore. That is very good news!

    I hope you all have a wonder-filled and joyous New Year!

    ~Kalen

  • Did You Know? I’m delivering my first SQL Server 2012 class this week!

    This  is actually just a very short post to get my name back up in the list on the right side of SQLBlog, because Adam’s configuration removes people who haven’t posted in 3 months. I’ve been extremely busy trying to get my  new book finished, and then get my course updated for its first delivery this week. I’ll be teaching it again in Norway at the end of the month. 

    It’s a great class this week, and a couple of questions have inspired longer posts, which hopefully I’ll be able to get to next week.

    Hopefully I’ll also have an update on the timeline for the new book by next week.

    My schedule is always available if you are interested in learning SQL Server 2012 Internals with me!

    http://schedule.KalenDelaney.com

    Thanks

    ~Kalen

  • Did You Know? Turning Off Locking

    Back when the lock manager was rewritten for SQL Server 7, and row level locking was added to the product, the procedure sp_indexoption gave us the ability to turn OFF either row or page locks or both, for one particular index, or for a table and all the indexes on it. I must admit, I rarely found this option to be the solution to any sticky problems, although while doing some quick research prior to writing this post, I did find that Microsoft recommended turning off locking options on a couple of tables  when working with SAP:  http://msdn.microsoft.com/en-us/library/aa226172(v=sql.70).aspx

    In my experience, it is MUCH more common to have people turn off row or page locks when they shouldn’t, and by inspecting the sys.indexes catalog view, you can tell if that has happened.  (Prior to SQL Server 2005,  you could use the INDEXPROPERTY function.) 
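
    For example, a quick check for the current database might look like this (a sketch):

    -- Find heaps and indexes where row or page locking has been disallowed
    SELECT object_name(i.object_id) AS table_name,
           i.name                   AS index_name,
           i.type_desc,
           i.allow_row_locks,
           i.allow_page_locks
    FROM sys.indexes AS i
    WHERE i.allow_row_locks = 0
       OR i.allow_page_locks = 0;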

    I recently got email from a reader of my books who pointed out to me an error I had made in my SQL Server 2005 book, where I had claimed that sp_indexoption could not be used to disallow row or page locks on heaps. If you check the BOL, it clearly says that you can supply either an index name or a table name to sp_indexoption, so that was just an oversight in my writing. 

    But then along comes ALTER INDEX, and the BOL now says you should avoid using sp_indexoption, as it will be removed in a future release, and you should use ALTER INDEX instead. So the reader sent me a follow-up question, asking how we can disallow row or page locking on a heap using ALTER INDEX.

    It turns out that we can specify the ALL option instead of an index name when using ALTER INDEX, and this has the same effect as using sp_indexoption on a heap.  You can see it in this short example.

    First create a heap with a nonclustered index:

    use tempdb;
    GO
    IF object_id('testlocks') IS NOT NULL
       DROP TABLE testlocks;
    GO

    CREATE TABLE testlocks
    ( a int);
    CREATE INDEX testlocks_index ON testlocks(a);
    GO

    Now look at the lock properties:

    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- ---------------
    1               1                HEAP
    1               1                NONCLUSTERED

    As shown, the default is that both row and page locks are allowed.

    Now use the old sp_indexoption to disallow row locks and then check the lock properties again:

    EXEC sp_indexoption testlocks, disallowrowlocks, 1;
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    Results:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    0               1                HEAP
    0               1                NONCLUSTERED

    Notice that sp_indexoption turned off row locking for both the heap and the nonclustered index.

    Now use ALTER INDEX to allow row locks and inspect again:

    ALTER INDEX ALL ON testlocks
    SET  (ALLOW_ROW_LOCKS = ON );
    GO
    SELECT allow_row_locks, allow_page_locks, type_desc FROM sys.indexes
    WHERE object_id = object_id('testlocks');
    GO

    RESULTS:

    allow_row_locks allow_page_locks type_desc
    --------------- ---------------- -------------
    1               1                HEAP
    1               1                NONCLUSTERED

    Keep in mind that I’m not recommending that you turn off either row or page locking. And if you decide to change the types of locks allowed, make sure you test your application thoroughly under your maximum expected load to make sure you haven’t made things worse. And remember that you can’t turn off TABLE locking, only row and page locking.

    Have fun!

    ~Kalen

  • Geek City: Build a Big Table with a Columnstore Index

    I was looking all over to find a big table to use for my columnstore examples in my SQL Server 2012 book, and the only one I found was from someone at Microsoft and was not publicly available. When I demonstrate code in my writings, I like to have the code available for anyone to use, so that was no use.

    Finally I realized I was just going to have to do it myself. I actually based the script on some work by the awesome Benjamin Nevarez (blog | twitter), but I needed to make a couple of changes to the table structure, and then the loop for adding the data took some tweaking to make sure that uniqueness was possible, in case you need to test how the metadata changes when a columnstore index is built on a table with a unique clustered index compared to a nonunique clustered index compared to a heap.

    I have just finished the chapter on indexes, and decided to make this script available. The initial table data is based on the data in the Microsoft AdventureWorksDW2012 sample database that you can download here. (Note that the DW version is the first one under “Other Available Downloads”; it’s not the one under “Recommended Download”.)

    Here is the section of the script that populates most of the FactInternetSalesBIG table. (The original 60K rows were just copied from the original FactInternetSales table.) Note that all the statements have to be run in a single batch because the local variable’s scope is the batch, and because the GO 9 applies to the single batch that precedes it. Of course, you can change the number of iterations to end up with a different size table. Mine ends up at just over 30 million rows. In addition to modifying the SalesOrderNumber value on each iteration, I also changed the value of the Revision column to indicate which pass through the insert loop was being executed.



    -- Copy the new big table into itself 9 times
    DECLARE @RevisionNumber nchar(2);
    SELECT @RevisionNumber = RevisionNumber + 1 FROM RevisionNumberValue;
    SELECT @RevisionNumber as RevisionNumber;
    INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)
    SELECT ProductKey
    ,OrderDateKey
    ,DueDateKey
    ,ShipDateKey
    ,CustomerKey
    ,PromotionKey
    ,CurrencyKey
    ,SalesTerritoryKey
    ,SalesOrderNumber + @RevisionNumber
    ,SalesOrderLineNumber
    ,@RevisionNumber
    ,OrderQuantity
    ,UnitPrice
    ,ExtendedAmount
    ,UnitPriceDiscountPct
    ,DiscountAmount
    ,ProductStandardCost
    ,TotalProductCost
    ,SalesAmount
    ,TaxAmt
    ,Freight
    ,CarrierTrackingNumber
    ,CustomerPONumber
    ,OrderDate
    ,DueDate
    ,ShipDate
    FROM dbo.FactInternetSalesBig;
    UPDATE RevisionNumberValue SET RevisionNumber = RevisionNumber + 1;
    GO 9
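
    Note that the loop assumes a one-row helper table named RevisionNumberValue that carries the current pass number across batches (a local variable can’t survive the GO); it is presumably defined in the full downloadable script, but a minimal sketch of what it might look like is:

    -- One row holding the current pass number; the loop reads it into @RevisionNumber
    -- and increments it at the end of each batch
    CREATE TABLE RevisionNumberValue (RevisionNumber int NOT NULL);
    INSERT INTO RevisionNumberValue VALUES (1);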

    Make sure you have enough log space. My log grew from 1GB to just over 8GB when running the script to build the 30 million row table and columnstore index, with no clustered index.

    The downloadable script has commented options to build either a clustered or nonclustered index before you build the columnstore index. I suggest building the clustered index before the columnstore index, because building the clustered index will rebuild any existing nonclustered indexes, including my columnstore index. Building the columnstore index can take a while; on my system it took about 10 minutes. (Building the original 30+ million row table took even longer.)
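
    A sketch of that ordering (the index names and column list here are illustrative; the downloadable script has the exact definitions):

    -- Build the clustered index first ...
    CREATE CLUSTERED INDEX clus_FactInternetSalesBig
        ON dbo.FactInternetSalesBig (OrderDateKey);

    -- ... and only then the columnstore index, so the long columnstore build
    -- isn't thrown away by a later clustered index build
    CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactInternetSalesBig
        ON dbo.FactInternetSalesBig
           (ProductKey, OrderDateKey, CustomerKey, PromotionKey,
            SalesTerritoryKey, OrderQuantity, SalesAmount);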

    After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!

    Have fun!

    ~Kalen

  • Follow-up Answers for my Australia Classes

    I was out of the country for the last two weeks of March, delivering classes in Brisbane and Sydney, which were organized by WardyIT.  It was a great visit and there were 24 terrific students!

    As is sometimes (perhaps often?)  the case, there were questions posed that I couldn’t answer during class, so here are a couple of follow-up answers.

    1. I brought up the fact that SQL Server 2012 generates a warning message when there are ‘too many’ Virtual Log Files (VLFs) in a database. (It turns out the message will be generated when a database has more than 10,000 VLFs, but the error says that the database has more than 1000 VLFs! Which is, of course, technically true, but the message is a little misleading.) A student reported hearing that the Best Practices Analyzer also had a warning about the number of VLFs, but didn’t know what that limit was, and asked if I knew. I didn’t then, but I do now. A message will be generated by BPA when there are 1000 or more VLFs. The BPA message is actually focused on the performance problems that can arise when there are too many VLFs. The boot message is more concerned with the impact of too many VLFs on database restore, rollback, backup and recovery.

    2. When discussing locks, I briefly mention the BU lock that is acquired by request (with a hint) during bulk load operations, for the purpose of allowing multiple processes to load into the same table concurrently. Someone asked whether any SSIS components used the BU lock, and I thought it was probably true, but since I don’t work with SSIS, I couldn’t comment any deeper. I was able to discover, thanks to SQL Server MVPs Paul White and Arthur Zubarev, that it definitely is possible for both SQL Destinations and OLE DB Destinations. The destination table must be a heap, and you can specify TABLOCK as an option for the FastLoadOptions property, as shown:

    [image: FastLoadOptions property with TABLOCK specified]
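
    On the T-SQL side, the same BU lock can be requested with a TABLOCK hint on the bulk load itself; a minimal sketch (the table name and file path are hypothetical):

    -- Each concurrent BULK INSERT into the same heap requests a BU lock via TABLOCK,
    -- which is what allows the loads to run in parallel
    BULK INSERT dbo.StagingHeap
    FROM 'C:\loads\file1.dat'
    WITH (TABLOCK);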

    3. I very briefly mention Extended Events on the very last day, and mention that in SQL Server 2008 one of the drawbacks is that there is no GUI option to set up an XEvents session, but that has changed in SQL Server 2012. I was asked if the 2012 Management Studio could connect to a SQL Server 2008 instance and set up an XEvents session on that instance. My friend and fellow MVP Ben Miller was able to verify that when connecting from the SQL 2012 GUI to a 2008 instance, the option for creating an XEvents session does not even show up.  You could try scripting a session created for a 2012 instance and running it on a 2008 instance, but there are no guarantees.

    So now I’ll get back to working on my new book …

    ~Kalen

  • Geek City: Growing Rows with Snapshot Isolation

    I just finished a wonderful week in Stockholm, teaching a class for Cornerstone Education. We had 19 SQL Server enthusiasts, all eager to find out everything they could about SQL Server Internals. One question came up on Thursday that I wasn’t sure of the answer to. I jokingly told the student who asked it to consider it a homework exercise, but then I was so interested in the answer that I tried to figure it out myself Thursday evening. In this post, I’ll tell you what I did to try to answer the question.

    I am writing this on an airplane, flying from Frankfurt to San Francisco.

    When a database is enabled for either ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT, SQL Server adds 14 bytes of overhead to each row. These 14 bytes contain the XSN (transaction sequence number) at the time the row was modified, and a pointer to the previous version of the row (if there is one), which is stored in a part of the tempdb database called the version store. In class, we were discussing the fact that rows involved in row versioning have the 14 bytes added the first time the row is modified after one of the snapshot isolation levels is enabled. If your pages are very full, sometimes this can mean that adding the 14 extra bytes will cause one or more of the affected rows to not fit on the original page, so the page may need to be split.

    The student asked if we could avoid the problem of adding the extra 14 bytes on the fly for each row by rebuilding the table right after enabling one of the snapshot isolation levels. In other words, does rebuilding count as modifying the rows?

    I tested it by creating a copy of one of the small tables in the pubs sample database that you can download from Microsoft, here. You, of course, could run a similar test of your own with any table. I first made sure that both snapshot isolation level settings were off. No one can be in the database when you set READ_COMMITTED_SNAPSHOT on or off, and it’s best to make sure there are no current transactions when you set ALLOW_SNAPSHOT_ISOLATION on or off.

    ALTER DATABASE pubs SET read_committed_snapshot OFF;
    ALTER DATABASE pubs SET allow_snapshot_isolation OFF;
    GO

    Now make a copy of the authors table.

    USE pubs;
    GO
    IF exists (SELECT * FROM sys.tables WHERE name = 'SIauthors')
                DROP TABLE SIauthors;
    GO
    SELECT * INTO SIauthors from authors;
    GO

    Now I want to look at the new authors table to see what an actual row looks like on the data page. DBCC IND will show me which pages belong to the table.

    DBCC IND (pubs, SIauthors, -1);

    Sometimes I get two rows for data pages (PageType = 1) back, but only one of them has data.

    [image: DBCC IND output for SIauthors]

    DBCC PAGE, with traceflag 3604 on, will show me the rows on the data page.

    DBCC TRACEON(3604);
    DBCC PAGE(pubs, 1, 245, 1);

    Before enabling snapshot isolation, the rows end with the bytes for the name of the city. In the output below, for the first row on my page, the city is MENLO PARK.

    [image: DBCC PAGE output; the first row ends with MENLO PARK]

    So let’s look at how things change if we enable one of the snapshot isolation levels and rebuild the table.

    -- Make sure no other connections are in the pubs database before running this ALTER.
    -- (The connection where you run the command can be in the database.) After changing the database, verify the setting.

    ALTER DATABASE pubs SET read_committed_snapshot ON;

    SELECT is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'pubs';

    Now rebuild the table and find out the new page number(s).

    ALTER TABLE SIauthors REBUILD;
    GO

    DBCC IND (pubs, SIauthors, -1);
    GO

    Because the table has been rebuilt, there will be new page numbers. Use DBCC PAGE to look at the new page, and you’ll see the 14 extra bytes at the end of the rows.

    [image: DBCC PAGE output showing 14 extra bytes at the end of each row]

    So I concluded that rebuilding the table would add the extra 14 bytes to each row and that’s what I told the class on Friday. However, as I started preparing to write this post, I decided to do a few more tests, one of which was building a clustered index on the table and then rebuilding the clustered index instead of the table itself. (Note that rebuilding a table is a new feature in SQL Server 2008.) However, you should know that the clustered index includes the table, so we’re still rebuilding the table when we rebuild the clustered index. 

    I found that an ALTER INDEX REBUILD, although it moved the table to new pages, did not add the 14 bytes. Performing an ALTER TABLE REBUILD, if the table had a clustered index, also did not add the extra bytes. It only seemed to happen when the table was a heap.
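
    If you want to check the effect without dumping pages, one option (a sketch) is to compare the average record size before and after the rebuild:

    -- avg_record_size_in_bytes should grow by roughly 14 bytes for rows
    -- that have picked up the versioning information
    SELECT index_id, index_level, record_count, avg_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats
         (db_id('pubs'), object_id('SIauthors'), NULL, NULL, 'DETAILED');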

    Obviously, more research is required. But I’ll leave that as a homework exercise for you!

    Have fun!

    ~Kalen

  • Did You Know? Query Options Setting in SSMS

    Hi folks

    I’m back from an amazing week at PASS and an awesome reception for SQL Server MVP Deep Dives Volume 2.

    I’m starting to plan and write my next book, on SQL Server 2012 Internals, so blogging will be kept to a minimum, not that I’ve had a lot of time to blog lately.

    However, while working on the final quiz for my Concurrency class through SSWUG, I noticed something peculiar about the Query Options | Advanced dialog in Management Studio. You have the option of choosing a default ISOLATION LEVEL and a DEADLOCK PRIORITY, and neither of those options has been updated since before SQL Server 2005. The ISOLATION LEVEL choices do NOT include SNAPSHOT, which was introduced in SQL Server 2005, and DEADLOCK_PRIORITY only shows Low and Normal. The value High and the numbers –10 through 10 were added back in SQL Server 2005!

    [image: the Query Options | Advanced dialog in SSMS]

    So my first thought was that nobody had updated this dialog since then, but then I noticed that there was an option you can see under “Results” in the left pane, to give you Multiserver results, and Multiserver queries weren’t added until SQL Server 2008.

    So what do you think? Should Microsoft update this dialog to give you all the possibilities?
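
    In the meantime, of course, you can always set the newer values directly in T-SQL; for example:

    -- Accepted by SQL Server 2005 and later, even though the dialog doesn't offer them
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;  -- requires ALLOW_SNAPSHOT_ISOLATION ON in the database
    SET DEADLOCK_PRIORITY HIGH;                -- or any value from -10 to 10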

    ~Kalen

  • Did You Know? I’ll be teaching my Internals Class online!

    In my most recent commentary for SQL Server Magazine, online here:
    http://www.sqlmag.com/article/training-and-certification2/sql-server-training-options-140429, I talked about different options for making training content available online.

    As an addendum to that, I’d like to mention that one of my training partners (Benchmark Learning in Minneapolis) has decided to offer my complete 5-day deep dives Internals class online, in early November.  We’re doing it as a split week class also, so your brain can have a rest after the first two days.

    You can get the rest of the details by clicking on the link on my schedule page.

    http://schedule.kalendelaney.com

    Although I probably won’t do what Paul did in his series of remotely delivered Australian user group presentations, I probably will at least  be barefoot!

    ~Kalen
