
Kalen Delaney

  • Geek City: A row with no fixed length columns

    Today I will return to my roots and post about the bits and bytes of internal storage. I received a request a while back to help with deciphering the DBCC PAGE output for a row with no fixed length columns. The person had read the section in my book (SQL Server 2008 Internals) on row storage, but was still having trouble. He presented this table with one row:

     

    USE testdb;

    CREATE TABLE varchk(name varchar(10))
    INSERT INTO varchk VALUES ('kalen')

    He used DBCC IND to find the page number:

    DBCC IND (testdb, varchk, 1)

    and then took the file and page number returned and passed them to DBCC PAGE:

    DBCC TRACEON(3604)
    DBCC PAGE (testdb, 1, 188, 1)

    He saw these bytes for the row:

    30000400 01000001 0010006b 616c656e †0..........kalen

    These were his questions:

    How to find the variable-length column offset?
    How to find the end of the fixed-length columns and the start of the NULL bitmap?
    How to find the offset of each variable-length column?

     

    And here was my reply:

    Here is your row:

    30000400 01000001 0010006b 616c656e, which is exactly 16 bytes long.

    3000 = Status Bits

    0400 = 2 byte integer for length of fixed length portion, reverse bytes because it is a single integer = 0004 = 4 bytes for everything up to last fixed length column

    0100 = 2 byte integer for number of columns, reverse bytes because it is a single integer = 0001 = 1 column

    00 (shaded) = Null bitmap

    0100 = 2 byte integer for number of variable length columns, reverse bytes because it is a single integer = 0001 = 1 column

    1000 (shaded) = 2 byte integer for the position where the 1st variable length column ends, reverse bytes because it is a single integer = 0010 = 16, which is the byte position in the row where the variable length column ends

    6b616c656e = variable length data

    6b = k

    61 = a

    6c = l

    65 = e

    6e = n
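
    If you want to double-check that last part, you can have SQL Server decode the hex for you; this is just a quick verification, not part of the original question:

    SELECT CONVERT(varchar(10), 0x6B616C656E) AS decoded_value;
    -- returns 'kalen'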

    Your question said you were looking for the offset of the variable length column. Please reread the section (in the book) that describes the storage of rows with variable length columns.  We are storing the ending POSITION, not the offset, and I say this:

    A data row that has any variable-length columns has a column offset array in the data row with a 2-byte entry for each non-NULL variable-length column, indicating the position within the row where the column ends. (The terms offset and position aren’t exactly interchangeable. Offset is 0-based, and position is 1-based. A byte at an offset of 7 is in the eighth byte position in the row.)

    I know that’s pretty geeky, but for anyone else struggling with understanding row storage, maybe all you needed was just one more example!

     

    Have fun!

    ~Kalen

  • Geek City: Did You Know … that not everything you know is actually true!

    Software changes: new versions not only add new features, they also change the internal behavior of old features, and not all the changes are documented! And how often, after a software upgrade, do you go through and test everything you knew was true, to make sure it is STILL true after the upgrade? I write books about the SQL Server software, and although I do verify all the code in my books against the new version, I admit that I don’t always check every single fact mentioned in the entire book to make sure it is still true.

    There are a number of such changes that I’ve discovered over the last year or so, and I’ll be telling you about them in a couple of blog posts. Plus, there are things that I read on other people’s blogs, where the author states something is true that I know is no longer true. Here’s an example:

    It used to be true, a very long time ago, that you could not roll back a TRUNCATE TABLE operation. That changed so long ago, I can’t even remember what version it was in. But in all current versions, you can roll back a TRUNCATE TABLE, and it is very easy to prove that. All you have to do is question whether or not the “fact” that “you cannot roll back a TRUNCATE TABLE” is actually true. But if you think it’s true, and never question it, you’ll never know.

    Here’s the example I use in my classes to show that you CAN roll back a TRUNCATE TABLE:

    USE testdb -- or any test database of your choice
    GO

    IF object_id('smallrows') IS NOT NULL
        DROP TABLE smallrows;
    GO

    CREATE TABLE smallrows
    (
        a int IDENTITY PRIMARY KEY,
        b char(10)
    )
    GO

    INSERT INTO smallrows VALUES
            ('row 1'),
            ('row 2'),
            ('row 3'),
            ('row 4'),
            ('row 5');
    GO
    SELECT * FROM smallrows;
    GO


    -- TRUNCATE TABLE can be rolled back
    BEGIN TRAN
    GO
    TRUNCATE TABLE smallrows;
    GO
    SELECT * FROM smallrows;
    GO
    ROLLBACK TRAN;
    GO
    SELECT * FROM smallrows;
    GO

    In my classes, right after I talk about DELETE and TRUNCATE operations, I talk about UPDATE. You might be aware that an UPDATE can be performed in a couple of different ways. SQL Server can do something called an “update-in-place”, where it just changes the old bytes to the new bytes and logs one simple update operation. Or, the UPDATE can be performed as two separate operations: DELETE the entire old row (and update all the indexes and log the DELETE plus all the index changes) and then INSERT an entire new row (and update all the indexes and log the INSERT plus all the index changes). Obviously, update-in-place is preferred, but there are some conditions that must be met in order for SQL Server to perform an update-in-place. And one of the conditions I had on my list for years turned out not to be true any more when I tested it several months ago.

    The main reason that SQL Server might not do an update-in-place is because you are updating the clustered key value. And since the clustered key value determines where the row goes in the table, changing that value will change the row location. It cannot be done in-place (even when the row doesn’t actually have to move, as you’ll see in the example). Another reason for not doing an update-in-place used to be that the table had an UPDATE trigger. Prior to SQL Server 2005, the contents of the special ‘inserted’ and ‘deleted’ tables that were available inside a trigger were generated from the transaction log. So in order for these rows to be available, the entire new row and old row had to be logged. But in SQL Server 2005, with the introduction of the version store, mainly used to support row versioning for snapshot isolation, SQL Server started using the version store technology to get the old and new versions of updated rows if there was a trigger that needed them.

    So did having an update trigger still preclude update-in-place?

    I decided to test it.

    This first block creates a table with no clustered index and no trigger.


    -- no clustered index, no trigger
    IF object_id('test1') IS NOT NULL DROP TABLE test1;
    GO
    CREATE TABLE test1
    (a int,
      b char(200) );
    GO
    INSERT INTO test1 values (1, replicate('a',200));
    GO
    SELECT * FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test1%';
    GO

    You should see 5 rows in the log, with the last one being an operation LOP_INSERT_ROWS, which indicates the single row inserted.

    Now update the row:


    UPDATE test1 SET a=2;
    GO
    SELECT [Current LSN], Operation, Context, [Log Record Length] as Length, AllocUnitName
    FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test1%';
    GO

    You’ll see something like this:

    [Screenshot: fn_dblog output for test1 after the UPDATE]

    There is one additional row, with operation LOP_MODIFY_ROW, for the update-in-place. Notice the length of that log record is 100 bytes.

    In the INSERT log record, the length is 304 bytes.

    Now run the same test on a table with a clustered index on column a.


    --  clustered index, no trigger
    IF object_id('test2') IS NOT NULL DROP TABLE test2;
    GO
    CREATE TABLE test2
    (a int primary key clustered,
      b char(200) );
    GO
    INSERT INTO test2 values (1, replicate('a',200));
    GO
    SELECT * FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test2%';
    GO
    UPDATE test2 SET a=2;
    GO
    SELECT [Current LSN], Operation, Context, [Log Record Length] as Length, AllocUnitName
    FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test2%';
    GO

    This time, after the UPDATE,  you’ll see these log records:

    [Screenshot: fn_dblog output for test2 after the UPDATE]

    Notice there is no LOP_MODIFY_ROW for the clustered index. There is a LOP_DELETE_ROWS and a LOP_INSERT_ROWS, and their lengths are much longer than the length of the LOP_MODIFY_ROW. The entire old row and entire new row are being logged. This may not seem like a big deal, but imagine that instead of one row you are updating thousands or even millions of rows. Also note that there is just the single row in the table, so the row doesn’t have to move when updated. There’s only one place for it to be! But SQL Server only knows that the UPDATE is changing a clustered key value, so it is performed as the two-step operation.
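
    If you want to put numbers on the difference, you can total the log bytes by operation. This is just a quick sketch against the same fn_dblog output used above:

    SELECT Operation, COUNT(*) AS log_records, SUM([Log Record Length]) AS total_bytes
    FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test2%'
    GROUP BY Operation;
    GO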

    Finally, I’ll do the test one more time, on a table with no clustered index, but with an UPDATE trigger.


    -- no clustered index, update trigger
    IF object_id('test3') IS NOT NULL DROP TABLE test3;
    GO
    CREATE TABLE test3
    (a int,
      b char(200) );
    GO
    CREATE TRIGGER upd_test3 on test3 for UPDATE
    AS
      SELECT * FROM inserted;
      SELECT * FROM deleted;
    RETURN
    GO
    INSERT INTO test3 values (1, replicate('a',200));
    GO
    SELECT * FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test3%';
    GO
    UPDATE test3 SET a=2;
    GO
    SELECT [Current LSN], Operation, Context, [Log Record Length] as Length, AllocUnitName
    FROM fn_dblog(null, null)
    WHERE AllocUnitName like '%test3%';
    GO

    Again, you’ll see only the one LOP_MODIFY_ROW, even though both the inserted and deleted rows were accessed in the trigger.

    [Screenshot: fn_dblog output for test3 after the UPDATE, showing the LOP_MODIFY_ROW record]

    So it was obvious that I had to update my course material.

    In my class two weeks ago, I found out another old ‘fact’ that is no longer true, but that will have to wait until next time. And of course, a whole new version is coming out in just a few months. I wonder what old facts will no longer be facts?

    Have fun!

    ~Kalen

    p.s. Right after I published this, I noticed I had already written up the update-in-place with triggers, just about a year ago. So if you missed it then, you know about it now!

  • Geek City: SQL Server 2014 In-Memory OLTP (“Hekaton”) Whitepaper for CTP2

    Last week at the PASS Summit in Charlotte, NC, the update of my whitepaper for CTP2 was released.

    The manager supervising the paper at Microsoft told me that David DeWitt himself said some very nice things about the technical quality of the paper, which was one of the most ego enhancing compliments I have ever gotten! Unfortunately, Dr. DeWitt said those things at his “After-the-keynote” session, not in the keynote that was recorded, so I only have my manager’s word for it. But I’ll take what I can get.

    If you’d like to read the update, which has information about the new “range” indexes, natively compiled procedures, checkpoint files, recovery and garbage collection in addition to all the content from the CTP1 paper, you can get it here:

    SQL_Server_2014_In-Memory_OLTP_TDM_White_Paper.pdf

    There will be more enhancements made for RTM, including some graphics for the index maintenance operations on range indexes, plus best practice suggestions.  These enhancements may be in the form of a third paper, but they might be in book form. I’ll let you know!

    Thanks

    ~Kalen

  • Did You Know? There is a PASS Conference in Stockholm!

    And I’ll be there!

    For the first time EVER, I will not be speaking at or attending the US PASS Summit. However, I will be speaking at the PASS SQL Rally in Stockholm November 4-6. I am so excited!

    I’ll be giving a pre-con you can read about here:
    http://www.sqlpass.org/sqlrally/2013/nordic/Agenda/PreconferenceSeminars.aspx#

    And then I’ll be speaking about SQL Server Hekaton (In-Memory Database) in a session on Tuesday.

    I hope to see many of you there.

    ~Kalen

  • Geek City: How old are my statistics?

    This post is basically to answer a question asked in class this week: How can we get the last statistics update date for ALL user tables in a database?

    After working on the query for a while, I realized that the new metadata function I posted about here can give you that info easily:

    SELECT object_name(sp.object_id) AS object_name, name AS stats_name, sp.stats_id,
        last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100;

    But for those of you not yet running SQL Server 2008R2 Service Pack 2 or SQL Server 2012 Service Pack 1 (as long as it’s 2005 or later), I’ll show you the other query I had worked out:

    SELECT schema_name(schema_id) AS SchemaName,  object_name(o.object_id) AS ObjectName,
        i.name AS IndexName, index_id, o.type,
        STATS_DATE(o.object_id, index_id) AS statistics_update_date
    FROM sys.indexes i join sys.objects o
           on i.object_id = o.object_id
    WHERE o.object_id > 100 AND index_id > 0
      AND is_ms_shipped = 0;

    There are a few slight differences in the output between the two queries, in addition to the fact that the query using sys.dm_db_stats_properties() adds the extra columns for the number of rows sampled, the column modification counter, etc. The function does not return the schema_id or the type of object (e.g. is it a table or an indexed view). Both those pieces of information can be determined by joining with the sys.objects view, but I have left that as an exercise for the reader/student.
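
    For anyone who wants a head start on that exercise, here is one possible way to pick up the schema name and object type; treat it as a sketch rather than the only answer:

    SELECT schema_name(o.schema_id) AS SchemaName, o.type_desc,
        object_name(sp.object_id) AS object_name, s.name AS stats_name, sp.stats_id,
        sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
    FROM sys.stats AS s
    JOIN sys.objects AS o ON o.object_id = s.object_id
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE o.is_ms_shipped = 0;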

    :-)

    I hope you find this useful!

    ~Kalen

  • Geek City: sp_cacheobjects for SQL Server 2012

    In a post about 4 1/2 years ago, I gave you my version of a replacement for the old pre-2005 pseudotable syscacheobjects. I called it sp_cacheobjects and created it as a view in the master database. With the sp_ prefix, the view can be accessed from any database.

    When testing this on SQL Server 2012, I noticed that I almost always got a lot more rows back than I was expecting. Even when I added a WHERE clause to limit the database to only the database I was working in, I STILL got way too many rows back. It turns out that in SQL Server 2012, SQL Server is frequently running background queries checking for both filetable and fulltext activities, and these background queries, which run in all databases, are cached like any other queries. To get the same kinds of results from this view that I got in SQL Server 2008, I needed to add a few extra filters, so I am including the modified version of my view here.

     

    -- Create a view to show most of the same information as SQL Server 2000's syscacheobjects
    -- This script has been updated for SQL Server 2012 to remove plans dealing with filetable and fulltext activities,
    -- and to also not return queries from system databases.

    -- by Kalen Delaney, 2012

    -- Feel free to remove those filters from the WHERE clause at the bottom
    USE master
    GO
    IF EXISTS (SELECT 1 FROM sys.views WHERE name = 'sp_cacheobjects')
        DROP VIEW sp_cacheobjects;
    GO
    CREATE VIEW sp_cacheobjects(bucketid, cacheobjtype, objtype, objid, dbid, dbidexec, uid, refcounts, 
                            usecounts, pagesused, setopts, langid, date_first, dateformat, status, lasttime, maxexectime, avgexectime, lastreads,
                            lastwrites, sqlbytes, sql)
    AS

                SELECT            pvt.bucketid, CONVERT(nvarchar(19), pvt.cacheobjtype) as cacheobjtype, pvt.objtype,
                                        CONVERT(int, pvt.objectid)as object_id, CONVERT(smallint, pvt.dbid) as dbid,
                                        CONVERT(smallint, pvt.dbid_execute) as execute_dbid, 
                                        CONVERT(smallint, pvt.user_id) as user_id,
                                        pvt.refcounts, pvt.usecounts, pvt.size_in_bytes / 8192 as size_in_bytes,
                                        CONVERT(int, pvt.set_options) as setopts, CONVERT(smallint, pvt.language_id) as langid,
                                        CONVERT(smallint, pvt.date_format) as date_format, CONVERT(int, pvt.status) as status,
                                        CONVERT(bigint, 0), CONVERT(bigint, 0), CONVERT(bigint, 0), 
                                        CONVERT(bigint, 0), CONVERT(bigint, 0),
                                        CONVERT(int, LEN(CONVERT(nvarchar(max), fgs.text)) * 2), CONVERT(nvarchar(3900), fgs.text)

                FROM (SELECT ecp.*, epa.attribute, epa.value
                            FROM sys.dm_exec_cached_plans ecp
                    OUTER APPLY sys.dm_exec_plan_attributes(ecp.plan_handle) epa) as ecpa
                       PIVOT (MAX(ecpa.value) for ecpa.attribute IN ([set_options],[objectid],[dbid],
                              [dbid_execute],[user_id],[language_id],[date_format],[status])) as pvt
                           OUTER APPLY sys.dm_exec_sql_text(pvt.plan_handle) fgs
             WHERE cacheobjtype like 'Compiled%'
             AND text NOT LIKE '%filetable%'
             AND text NOT LIKE '%fulltext%'
             AND pvt.dbid between 5 and 32766;
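
    Once the view exists in master, you can query it from any database. Here is a quick example; the database name is just a placeholder for one of your own:

    SELECT objtype, usecounts, dbid, sql
    FROM sp_cacheobjects
    WHERE dbid = db_id('testdb')
    ORDER BY usecounts DESC;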

     

    I hope you find this useful!

    ~Kalen

  • Did You Know? What settings to always change

     

    A week ago, I taught my SQL Server 2012 Internals class to a great group of very interactive students. Even though a dozen of them were taking the class remotely, there were still lots of really great questions and lots of discussion.

    One of the students asked if I could summarize all the settings that I recommended changing from the default, right out of the box. I said I’d try to put a list together by the end of the week, but I didn’t make it. So I said I would put it together and blog it.

    I think it sounded during the week like there were more changes than there really are. Going back through my notes, I only found three settings, all of them instance-wide configuration settings, that I recommend always changing. Of course, depending on your hardware, your workload and your data, you may make more changes. But the short list here contains the options that I always make sure are changed from the default on any system I am working on.  So here they are:

    1. Remote admin connections

    This option doesn’t apply to all remote administrative connections, despite the name, but only to connections made using the DAC (Dedicated Administrator Connection). This option has a value of either 0 or 1, and 0 is the default. I recommend changing it to 1, which allows someone working at a machine other than the machine where your SQL Server is installed to make a DAC connection. It might be too late to change it to 1 once you realize you need it! This value also needs to be set to 1 when you are connecting to a clustered SQL Server, which is always considered a remote connection.

    2.  Optimize for ad hoc workloads

    This also is a two-valued option, with a default of 0. I recommend changing it to 1.  There may be some very edge-case scenarios for leaving it at 0, but they’re pretty rare so you’re better off changing it to 1. In most cases, this can save you lots of memory, because single-use ad hoc query plans will now only use 300 bytes of memory instead of a minimum of 16K. Some plans use more, a lot more. I’ve seen SQL Server instances with tens of thousands of single-use ad hoc plans. I’ll let you do the math.
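
    If you’re curious how much memory single-use ad hoc plans are taking on your own instance, a query like this (just a sketch) will do the math for you:

    SELECT COUNT(*) AS single_use_adhoc_plans,
        SUM(CONVERT(bigint, size_in_bytes)) / 1024 / 1024 AS total_MB
    FROM sys.dm_exec_cached_plans
    WHERE objtype = 'Adhoc' AND usecounts = 1;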

    3. Show advanced options

    By default, only about 16 of the configuration options are viewable and settable with sp_configure. If you want to see all your options, including ‘optimize for ad hoc workloads’, you need this option set to 1. The default is 0. 
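
    Putting the three together, here is a minimal script that makes all three changes with sp_configure. It’s just a sketch; run it as a sysadmin and adjust it to your own standards:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'remote admin connections', 1;
    EXEC sp_configure 'optimize for ad hoc workloads', 1;
    RECONFIGURE;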

     

    As a bonus, I’ll give you another list. These are options that usually should be left at the default value, so you should verify regularly that no one has changed them.  Two of them are instance-wide options, and two are database options.

    1. Auto create statistics and auto update statistics

    These are database options that have the default value of ON, and should be left that way. There may be cases where you want to update statistics more often than the automatic updating provides for, but that doesn’t mean you should turn the automatic updating off. And there may be a few edge cases where you want to turn off the automatic update of statistics. Make sure you have a very good reason if you do so. But before you do, try turning on auto update statistics async to see if that helps whatever problems are leading you to consider turning automatic updates off. Also, take a look at the procedure sp_autostats. It allows you to turn off automatic updates just for particular tables or indexes.

    2.  Max degree of parallelism

    I’m not going to tell you what you should set this configuration option value to. There are a lot of recommendations out there, on other people’s blogs. But I’m just going to say make sure it is not set to 1 if you have more than one processor available to your SQL Server.  The default is 0, which means ALL SQL Server’s processors are available for parallel queries. In some (if not most?) cases, leaving it at the default is ok, but again, don’t set it to 1.  If you find particular queries do not perform well when run in parallel, you can use the MAXDOP hint for those queries. But don’t turn all parallelism off across the entire instance.
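
    For example, a query-level hint looks like this (the table name here is just a placeholder):

    SELECT COUNT(*)
    FROM dbo.SomeLargeTable
    OPTION (MAXDOP 1);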

    3. C2 audit mode and common criteria compliance enabled

    Keep these configuration options set to 0 unless your business is mandated to have one of these options on. C2 audit mode has been deprecated, but it’s still around for now. If you’re mandated to have these options enabled, you should know it; otherwise keep them set to 0. You WILL notice performance degradation if you enable them.

    4. Autoshrink

    This is a database option that should never have been invented. Pretend it doesn’t exist; if someone has set it to ON in one of your databases, set it back to OFF.

     

    So these are the options that I specifically call out in my class as having general best practice values. I talk about other options as well, but most of the others have an ‘it depends’ answer for what values you should use. And what ‘it depends’ on is what I spend a lot of my class talking about.

    Feel free to let me know if there are options on your list… that you always change, or always make sure are unchanged!

     

    Thanks

    ~Kalen

  • T-SQL Tuesday: What kind of Bookmark are you using?

    [T-SQL Tuesday logo]

    I’m glad there is no minimum length requirement for T-SQL Tuesday blog posts, because this one will be short. I was in the classroom for almost 11 hours today, and I need to be back tomorrow morning at 7:30.

    Way long ago, back in SQL 2000 (or was it earlier?), when a query indicated that SQL Server was going to use a nonclustered index to get row pointers, and then look up those rows in the underlying table, the plan just had a very linear look to it. The operator that indicated going from the nonclustered leaf to the data row was called a ‘Bookmark Lookup’, and it looked like a simple, single operator. Those of us who did troubleshooting of query plans knew that it could hide a multitude of sins, but to many people it looked very innocuous.

    Then in the next version, that simple, single Bookmark Lookup Operator was replaced by something that looked like a JOIN! When I first saw query plans showing a JOIN when doing a nonclustered index lookup, I was almost distraught, but it turns out that was only because it was new and different. The more I thought about it, the more I realized it was a Good Thing.

    Some people might get confused because they think of a JOIN as an operation that finds matching rows between two tables. But in fact, a JOIN can be used to find matches between any two sets of rows. And in the case of a nonclustered index lookup, SQL Server is finding rows in the leaf level of a nonclustered index (the ones that meet your filter condition(s)), and then finding the matching rows in the underlying table. Internally, this is a JOIN operation.

    But when we look at the plans, there are two different operators used to show the actual lookup into the underlying table.

    There is a RID Lookup, used when the table is a heap. The nonclustered index contains Row ID, or RID, values (composed of a File ID, a Page ID and a Slot/Row number on the page). This RID is then used to ‘match’ with rows in the underlying table that have the same RID.

    There is also a KEY Lookup, used when the table has a clustered index.  The leaf level of a nonclustered index contains pointers that are the key values for the rows being pointed to. So to find the matching rows in the table, SQL Server takes the clustered key value from the nonclustered index, and then searches for that value in the clustered index, following the clustered index from the root down to its leaf.

    Below are the two icons used for these operations. If I were to give you a quiz, and ask which operator indicates we are finding a row in a table directly using a RID value, and which indicates we are finding a row in a table using a clustered index key, which would you say is which?  Take a moment to think about it.

    [the two lookup operator icons, side by side]

    I don’t know about you, but when I just try to figure out these two icons, I think the one on the left looks like it should be the KEY Lookup, and the one on the right should be the RID Lookup.

    But if you check the page in Books Online, or if you check your own query plans, you see that they are backwards!
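
    If you want to see both operators in your own plans, here is a minimal sketch; all the object names are my own invention, and with only a few rows in the table the FORCESEEK hint is needed to keep the optimizer from simply scanning:

    CREATE TABLE dbo.LookupDemo (id int, filler char(200));
    CREATE INDEX LookupDemo_id ON dbo.LookupDemo(id);
    INSERT INTO dbo.LookupDemo VALUES (1, 'a'), (2, 'b'), (3, 'c');
    GO
    -- Heap plus nonclustered index: the plan shows a RID Lookup
    SELECT * FROM dbo.LookupDemo WITH (FORCESEEK) WHERE id = 2;
    GO
    -- Add a clustered index, and the same query shows a KEY Lookup
    CREATE CLUSTERED INDEX LookupDemo_clust ON dbo.LookupDemo(filler);
    SELECT * FROM dbo.LookupDemo WITH (FORCESEEK) WHERE id = 2;
    GO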

    Here is a plan showing a RID Lookup:

    [query plan screenshot showing an Index Seek joined to a RID Lookup]

     

    And here is a plan showing a KEY Lookup:

    [query plan screenshot showing an Index Seek joined to a Key Lookup]

    Fortunately, the graphical query plan tells you what kind of operation it’s performing, so you can just ignore the picture and read the words. But what’s the point of a graphical plan in that case?

    So are they really backwards? Did someone just make a mistake and link in the wrong file when compiling the SQL Server graphical query plan code? Or am I misinterpreting this?

    I still think graphical query plans are one of the best things ever added to the product, so I’m really not complaining, but I’m just sayin….

     

    Have fun, and Happy T-SQL Tuesday!

    ~Kalen

  • Geek City: SQL Server 2014 In-Memory OLTP (“Hekaton”) Whitepaper

    Tech Ed isn’t even over yet, and my new Hekaton Whitepaper is already generating questions I can’t answer yet!

    Here are the two questions that were in my inbox this morning:

    1. The whitepaper says “There are other wait types, such as waiting for a log write to complete.”
    Does this mean that writing to the transaction log  will be the main bottleneck of the system?

    Hey folks, the product isn’t even publicly available. How can I, or anyone, know what your main bottleneck will be? You do know that there is ALWAYS a bottleneck, right? Something has to be the limiting factor. But I can’t even begin to make a guess yet as to what kind of limiting factors there will be in Hekaton, now called SQL Server In-Memory OLTP. The paper is about the internals, and that’s what I researched. I did not do any performance testing. But, as I did mention in the paper, log writing is much more efficient for Hekaton tables than for disk-based tables. So even if it IS the limiting factor, it will not be as limiting as it could potentially be for non-Hekaton tables.

     

    2.  The whitepaper says “Note that none of the LOB data types are allowed.” 
    We have some 75 columns  which are still ntext, so  it seems we are dead in the water even if we convert to nvarchar(max)?

    As you read in the paper, rows are stored completely differently, and not on the disk-based 8K pages. So there is no mechanism for special pages for LOB or row-overflow data. Also, be aware that there is extra overhead for managing and accessing LOB data, and to make In-Memory OLTP as efficient as possible, those extra-large rows needing extra work are avoided. And as in my previous answer, without seeing your complete design (no, please don’t send it to me :-) ) there is no way I can tell just how ‘dead in the water’ you might be. I tend to feel there would be a way to redesign your tables and separate the LOB types (which should be converted to something other than the deprecated ntext soon, anyway) into their own tables. As you read in the paper, In-Memory OLTP tables can be joined with disk-based tables.

    Please feel free to send more questions, but I most likely will NOT be able to answer every one in a blog post. There will be an update to the paper, with more internals details, for CTP2, and some of the questions will be answered there. Finally, the ultimate plan is a book, like my SQL Server Concurrency book, all about Hekaton… oops, I mean In-Memory OLTP. Hopefully, the book will contain best practice suggestions gathered from actual research with In-Memory OLTP implementations, plus a big Q&A section to answer questions that weren’t answered elsewhere in the book.

    Thanks!

    ~Kalen

  • Geek City: Document more trace flags?

     

    It’s been over 5 years since I last blogged about trace flags, so it seems a post on the topic is in order.

    If a trace flag is undocumented, it means it is unsupported and not guaranteed. It can go away at any time, even from one service pack to the next. If a trace flag doesn’t work the way the person who told you about it said it should work, there is no help for that. It is undocumented.

    Trace flags change SQL Server’s behavior. They are toggles that can be turned on or off. When they are on, they force SQL Server to do things differently. Sometimes just a little bit differently, sometimes a LOT differently. Some are very harmless, such as 3604, which allows certain undocumented DBCC commands to return output to the client. You need this flag if you’re going to use the DBCC PAGE command, which I talk about a lot in my writings and in my SQL Server Internals classes.
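
    If you’re curious which trace flags are already enabled on an instance or in your session, the documented DBCC TRACESTATUS command will tell you. Here’s a quick sketch:

    DBCC TRACEON(3604);     -- enable 3604 for this session (send DBCC output to the client)
    DBCC TRACESTATUS();     -- show trace flags enabled for this session
    DBCC TRACESTATUS(-1);   -- show trace flags enabled globally
    DBCC TRACEOFF(3604);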

    However, there are other trace flags that change SQL Server behavior a LOT. The SQL Server developers at Microsoft create trace flags for lots of reasons, usually to force some sort of non-standard behavior. This might be so the developers can test the feature during development, or so that the support engineers can turn a feature off, for example to confirm that the feature was causing a problem. These are usually not changes that you would want to make to your own SQL Server in a normal environment.

    But with so many undocumented trace flags, the question frequently arises “Why aren’t MORE trace flags documented?”

    Here are some of the reasons:

    -- Once a trace flag is officially and publicly documented, people will start to use it. And people will also start to misuse it. Microsoft needs to consider whether dealing with the misuse will be worse than the problems due to the lack of that particular trace flag.

    -- Before a trace flag is documented, it must be thoroughly tested. FULLY. Across all SKUs, all security environments, all performance conditions. And that is not easy. A trace flag added for one particular situation or one rare bug is usually not tested to the full extent that documented features are. So it will stay undocumented.

    -- Once a trace flag is added, it must be supported into the future, and it can’t be removed without officially deprecating it, at least one full version in advance. Some trace flags are known to have limited potential usefulness; documenting them would force them to be maintained even when they’re no longer useful.

    -- Because most of the undocumented trace flags are created for a limited set of circumstances, they are not tested in conjunction with other trace flags. In order to be fully tested, each trace flag would have to be tested in combination with every other trace flag. And if a particular combination of trace flags turned out not to be pretty, which of the trace flags involved should be left undocumented? It would be really hard to tell, even after spending all that time on all the testing.

    The bar for Microsoft actually documenting a trace flag is therefore quite high. And sometimes, if a feature introduced by a trace flag is really needed, it will just become part of the product, or be added using a different interface, such as a configuration option or database property.

    But Microsoft does document new trace flags. If you look at the history of Books Online to see the number of documented trace flags, you can see the increase:

    --  SQL Server 2005:               10 trace flags

    -- SQL Server 2008:                13 trace flags

    -- SQL Server 2008 R2:           15 trace flags

    -- SQL Server 2012:                19 trace flags

     

    So if you read about an undocumented trace flag, and you think it might be useful for you, be very careful when testing it. And if your careful testing shows it IS useful, let Microsoft know.

    But if your testing shows it isn’t useful, or is actually harmful, you don’t need to let Microsoft know. They probably already know. After all, it’s undocumented.

     

    ~Kalen

  • Geek City: More statistics info available!

     

    I just started using a new DMV (one that’s actually an ‘F’ not a ‘V’, as in Function) that gives us more info about distribution statistics. It returns info about the last statistics update date (which is also available with a function STATS_DATE()). It also provides the number of rows sampled when the statistics were last updated. This is available through DBCC SHOW_STATISTICS, and in a blog post a couple of months ago, I showed you how to get that output into a table for your own querying. One of the coolest things this new DMV shows is the row modification counter, which used to be available in sysindexes as a count of rows changed, but in SQL 2005 and later, SQL Server keeps track of changes to each column that has statistics on it. These values were not visible before, but now they are! This new object also reports any filter definition and includes rows for all statistics, whether index statistics or column statistics.

    As a table valued function, sys.dm_db_stats_properties can be used with the CROSS APPLY operator to give information for all statistics in a database, or you can filter to just return  user objects and not any system objects, as shown in the code here:

    SELECT
       sp.object_id, object_name(sp.object_id) AS object_name, sp.stats_id, name AS stats_name,
        filter_definition, last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100;
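
    One handy use is spotting statistics that may be getting stale. The query below is just an illustration; the 20% threshold is my own arbitrary choice, not a documented rule:

    SELECT object_name(sp.object_id) AS object_name, s.name AS stats_name,
        sp.last_updated, sp.rows, sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE sp.object_id > 100
      AND sp.modification_counter > 0.2 * sp.rows
    ORDER BY sp.modification_counter DESC;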

    The new DMV was easy to miss because it wasn’t added until SQL Server 2008 R2 Service Pack 2 and SQL Server 2012 Service Pack 1. But since you should always be running with the latest service pack, you should have this great DMV if you’re on any version after SQL Server 2008.

    Let me know if you find this DMV useful, and what you mainly use it for.

    Have fun!

    ~Kalen

  • Geek City: A Grammar Geek in the Cloud

    Those of you who know me well know that I am usually a stickler for spelling, grammar and proper word usage. I may have even lost a few friends because of what some people see as obsession. I am not infallible, and I do make typos, but I like to try to correct them if possible as soon as I discover them (or as soon as they are brought to my attention.)

    So now I will admit that I made a mistake in my usage of the word ‘premise’ when talking about Cloud vs. non-Cloud databases. I was using the term on-premise as the opposite of ‘in the Cloud’, and my friend Cindy corrected me last week in no uncertain terms. So I went online and quickly found this post that proved she was right and I was wrong:

    http://fbhalper.wordpress.com/2009/06/11/premise-vs-premises-in-the-cloud/ 

    I did a bit more searching and found that I am not the only one making this mistake. A Research VP at Gartner actually makes the same mistake in a published post:

    http://blogs.gartner.com/chris-wolf/2010/07/14/on-premise-microsoft-azure-an-inevitable-milestone-in-azure%e2%80%99s-evolution/

    I will admit that I find the word ‘on-premises’ awkward to say, so when I’m talking casually I might end up leaving off the last syllable.

    But anytime I’m writing, I promise to use premise vs. premises appropriately.

    And I expect the same from everyone else.

    :-)

    ~Kalen

  • Geek City: Join With Me!

    I remember that one of the most surprising changes in SQL Server 2005 was how the graphical plans showed the use of a nonclustered index to seek, and the plan included something that looked like a JOIN to find rows in the base table. Here’s an example. Although I used SQL Server 2008 R2, the graphical plan will be pretty similar to what it looked like back in SQL Server 2005. My code will make a copy of a table in the AdventureWorks2008 database, and then build an index on one of the columns.

    USE AdventureWorks2008;
    GO
    IF object_id('dbo.Sales') IS NOT NULL
        DROP TABLE dbo.Sales;
    GO
    SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;
    GO
    CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);
    GO

    Now look at the estimated graphical execution plan for this query, that searches for a particular value for the SalesPersonID column:

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 280;
    GO

    You should see something like this:

    [query plan screenshot showing an Index Seek joined to a RID Lookup]

    It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. You’ll have a problem if you think of a JOIN as only finding matching rows between two tables, but the problem goes away if you realize that a JOIN can be used when finding matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on SalesPersonID, and a set of rows in the table. The index seek finds all the index rows with a SalesPersonID value of 280, but those index rows only contain an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the SalesPersonID value, SQL Server must find the rows in the dbo.Sales table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.

    If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs; they contain the clustered key value, which is used as a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look it up in the clustered index to find the desired row. So I create a clustered index on SalesOrderNumber:

    CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);
    GO

    Now when I run the same query as above, I get the following plan:

    [query plan screenshot showing an Index Seek joined to a Key Lookup]

    The only difference in the two plans above is the icon for the lookup into the base table.  One is looking up into a heap using a RowID, which is an address containing the File ID, the page number and the slot or row number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?

    [the two lookup operator icons, side by side]

    The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.

    But wait, there’s more…

    Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced.

    I’m going to build another index on SalesOrderDate:

    CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);
    GO

    This index will be used if I look for all orders placed on July 1, 2002. The query plan for the following query is shown below:


    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01';

    [query plan screenshot showing a seek on Sales_OrderDate_index with a lookup into the table]

    We saw that the index on SalesPersonID is useful when looking for the value 280, and the index on SalesOrderDate is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions?

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2002-07-01'
    AND SalesPersonID = 280

    Here’s the plan:

    [query plan screenshot showing seeks on both nonclustered indexes joined together, then a lookup into the table]

    We see two JOINs! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions, and those two index row sets are joined together, using the clustered key value as the join column. Then, only after SQL Server finds the rows that meet both conditions, it takes the common clustered index key and uses it to find the row in the underlying table.

    But wait, there’s more!

    Both of these queries use a table scan, because the filter conditions are not selective enough. There are way more rows with a SalesPersonID value of 289 than there were for a value of 280, and there are more rows with an OrderDate of March 1, 2004 than there are with an OrderDate of July 1, 2002.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01' ;
    GO

    SELECT * FROM dbo.Sales
    WHERE SalesPersonID = 289;
    GO

    Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:

    [query plan screenshot showing a clustered index scan]

    However, if you look at the plan for a query that uses both filter conditions, we don’t get a clustered index scan.

    SELECT * FROM dbo.Sales
    WHERE OrderDate = '2004-03-01'
    AND SalesPersonID = 289;

    [query plan screenshot showing the index intersection plan with both index seeks]

    Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks on non-restrictive filters is because the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, and therefore the number of lookups into the base table is quite small and affordable.

    So, what can you learn? Just because you think a column might not be selective enough to build a nonclustered index on, don’t discount it. If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought!

    So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.

    Have fun!

    ~Kalen

  • Geek City: Accessing Distribution Statistics

    Distribution statistics are one of the most important sources of information that the Query Optimizer uses to determine a good query plan. In this post, I’m not going to tell you everything about distribution statistics. I’m just going to show you a few tricks for getting access to the statistics.

    If you want a deeper understanding of what the statistics keep track of, and you don’t have any of my SQL Server Internals books handy, check out this whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 

    Microsoft does provide us a tool called DBCC SHOW_STATISTICS for examining the distribution statistics.

    Microsoft has gradually been making more and more of the old DBCC commands available as DMVs, even some undocumented ones. For example, one of my favorites, DBCC IND, has now been replaced in SQL Server 2012 by sys.dm_db_database_page_allocations.

    I have been wishing for several versions that Microsoft would make the DBCC SHOW_STATISTICS information available as a DMV. But it hasn’t happened yet, and I’m tired of waiting, so I decided to do something about it.

    My solution is not quite as easy to use as a DMV might be, but it allows you to get the information that DBCC SHOW_STATISTICS provides into a set of three tables that can then be saved into a more permanent location of your choice, and/or queried as desired.

    DBCC SHOW_STATISTICS returns three sets of information, with different columns in the output, so three different tables are needed. The DBCC SHOW_STATISTICS command can be called with an argument that specifies that you just want one of the three sets returned. The options are

    WITH STAT_HEADER – returns basic info such as last update date, and number of rows in the table/index. Also reports number of steps returned for HISTOGRAM section.

    WITH DENSITY_VECTOR – returns density info for each left-based subset of columns in the index. For example, an index on (lastname, firstname, city) would have a density value for (lastname), for (lastname, firstname), and for (lastname, firstname, city). Each density value is a single number, equal to 1 divided by the number of distinct values in that column set. For example, if there are only 2 possible values in the column, the density would be 0.5. Multiplying density by the number of rows in the STAT_HEADER section gives the average expected rowcount if a query was executed looking for an equality match on the specified column(s).

    WITH HISTOGRAM – returns a set of ordered values from the first column of the index, creating a histogram. This histogram provides the optimizer with selectivity information for specific values or ranges of values in the first column of the index.

    To collect this info, I will use one of my favorite tricks, which is to create a table in the master database with a name starting with sp_. (I’ve written about this trick several times, including in this earlier blog post.) Once I have the table(s) created, I can access them from any database. So here are the three tables:

    USE Master;
    GO

    IF  (SELECT object_id('sp_stat_header')) IS NOT NULL
      DROP TABLE sp_stat_header;
    GO
    CREATE TABLE sp_stat_header
    (   Name sysname,
        Updated datetime,
        Rows bigint,
        Rows_sampled bigint,
        Steps smallint,
        Density numeric (10,9),
        Average_key_length smallint,
        String_index char(3),
        Filter_expression nvarchar(1000),
        Unfiltered_rows bigint);
    GO


    IF  (SELECT object_id('sp_density_vector')) IS NOT NULL
      DROP TABLE sp_density_vector;
    GO
    CREATE TABLE sp_density_vector
    (  all_density numeric(10,8),
       average_length smallint,
       columns nvarchar(2126) );
    GO

    IF  (SELECT object_id('sp_histogram')) IS NOT NULL
      DROP TABLE sp_histogram;
    GO
    CREATE TABLE sp_histogram
    (   RANGE_HI_KEY sql_variant,
        RANGE_ROWS bigint,
        EQ_ROWS bigint,
        DISTINCT_RANGE_ROWS bigint,
        AVG_RANGE_ROWS bigint);
    GO

    The second trick is to use INSERT … EXEC to execute a DBCC statement and populate the tables. I will build the DBCC  command dynamically, after capturing the schema, table and index names in variables. You of course could take this code and turn it into a stored procedure, for which the schema, table and index names are passed as parameters. I’ll use as an example a table in the AdventureWorks2008 sample database, just so you can try running the code, and I can verify that it actually works!

    I will use the table Sales.SalesOrderDetail and the index IX_SalesOrderDetail_ProductID. So the object name (@oname) is SalesOrderDetail, the schema name (@sname) is Sales, and the index name (@iname) is IX_SalesOrderDetail_ProductID.

    SET NOCOUNT ON;
    USE AdventureWorks2008;
    GO
    DECLARE @oname sysname,  @iname sysname, @sname sysname

    SELECT @oname = 'SalesOrderDetail',  @sname = 'Sales', @iname = 'IX_SalesOrderDetail_ProductID';
     
    -- Update the object name to include the schema name, because that is the format the DBCC command expects
    SELECT @oname = @sname +'.' + @oname;

    TRUNCATE TABLE sp_stat_header;
    INSERT INTO sp_stat_header
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH STAT_HEADER');

    TRUNCATE TABLE sp_density_vector;
    INSERT INTO sp_density_vector
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH DENSITY_VECTOR');

    TRUNCATE TABLE sp_histogram;
    INSERT INTO sp_histogram
        EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH HISTOGRAM');

    So now you can look at the values collected and filter or query in any way, or use SELECT INTO to save them into another table, so the sp_ tables can be used the next time you want to capture distribution statistics information.

    SELECT * FROM sp_stat_header;

    SELECT * FROM sp_density_vector;

    SELECT * FROM sp_histogram;
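
    As a follow-up, the density arithmetic described earlier can be reproduced from the captured tables. This is just a sketch, using the single-column density row for ProductID from the example index:

    SELECT h.Rows * d.all_density AS estimated_rows_for_an_equality_search
    FROM sp_stat_header AS h
    CROSS JOIN sp_density_vector AS d
    WHERE d.columns = 'ProductID';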

      

    Let me know if you find this useful, and especially if you embellish it to create a procedure or an automated process of your own!

    Thanks!

    ~Kalen

  • Geek City: What Triggered This Post?

    I’d really like to get another post up onto my much neglected blog before the end of 2012. This will also start one of my New Year’s resolutions, which is to write at least one blog post a month. I’m going to tell you about a change in SQL Server that wasn’t announced in any “What’s New” list that I ever saw, perhaps because it was just a change in internal behavior, and nothing that required any change in user applications.

    Do you retest what you know is true for every new version? When I update my books, I do test all the scripts, but if there isn’t a script, I don’t retest every ‘fact’ that I have known for years is true. And sometimes, things change. And sometimes my reviewers notice those unreported changes, and sometimes they don’t.

    You might be aware of the fact that SQL Server can perform UPDATE operations in two different ways. The UPDATE can be performed as a two-step process: delete the old row and then insert a whole new row, or, the UPDATE can be performed (much more efficiently) as an update-in-place.  When the two-step UPDATE is performed, it is a LOT more work. Not only does SQL Server have to log the entire old row and the entire new row, but each nonclustered index is also modified twice, and each of those index changes also has to be logged. So it’s nice when an update-in-place is done, because only the bytes changed are logged, and only indexes on the updated columns are affected.

    Prior to SQL Server 7, there were actually four different ways that UPDATE could be done. The two-step UPDATE had some variations that could make it even slower in some cases! But that was a long time ago, so I’m not going to go into the details now. But I will say that back then, in order to get an update-in-place to occur, there was a big long list of prerequisites that had to be met and if you missed just one, you’d get one of the slower UPDATE operations.

    As of SQL Server 7, update-in-place became the default. The only time it doesn’t happen is when the row can’t stay in the same location (such as when you update a clustered index key column) or when SQL Server really needs the old and new versions of the row.

    In SQL 7, one of the places that SQL Server needed the old and new versions of the updated rows was when processing triggers. Triggers need the transaction log to get the contents for the DELETED and INSERTED pseudo-tables. And because triggers needed the entire old and new versions of the updated rows, the UPDATE was performed as a two-step operation: DELETE the old row, log the entire old row, and then INSERT the new row with the new values, and log the entire new row.

    But as of 2005, we now have the version store, primarily used for SNAPSHOT isolation, but available for other uses as well. In SNAPSHOT isolation, the version store stores ‘old versions’ of rows that have been updated or deleted. I knew that the version store was also used for triggers, but it only occurred to me just recently that maybe, because the old and new versions of the row were no longer needed from the log, perhaps UPDATEs did not always need to be performed internally as a two-step UPDATE.

    So I decided to test it out.

    -- DEMO: If there is an UPDATE trigger, are updates logged as DELETE + INSERT?
    -- First build a new database.

    USE master;
    GO
    IF (SELECT db_id('TestTrigger')) IS NOT NULL
        DROP DATABASE TestTrigger;
    GO
    CREATE DATABASE TestTrigger;
    GO
    ALTER DATABASE TestTrigger SET RECOVERY SIMPLE;
    GO
    SELECT db_id('TestTrigger');
    GO

    USE TestTrigger;
    GO

    -- Just for a warmup, look at the function fn_dblog, which works in the current database

    SELECT * FROM fn_dblog(null, null);
    GO

    -- Create a new table to work with
    IF (SELECT object_id('objects')) IS NOT NULL
        DROP TABLE objects;
    GO
    SELECT TOP 100 * INTO objects FROM sys.objects;
    GO

    -- Create a clustered index on the table
    CREATE CLUSTERED INDEX objects_clustered on objects(name);
    GO

    -- First examine an update we know is NOT done in place,
    -- i.e. updating a clustered key value

    UPDATE objects SET name = 'newrowsets' WHERE name = 'sysrowsets';
    GO

    -- Look at last 10 rows; notice a LOP_DELETE_ROWS and LOP_INSERT_ROWS
    -- The AllocUnitName column shows the object affected is the clustered index on dbo.objects
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Now examine an update we know is  done in place,
    -- i.e. updating an unindexed column on a table with no triggers
    UPDATE objects SET parent_object_id = 1 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW on the dbo.objects allocation unit
    SELECT Operation, [Transaction ID], AllocUnitName FROM fn_dblog(null, null);
    GO

    -- Create an update trigger
    -- Will the update be done with the simple LOP_MODIFY_ROW or with the LOP_DELETE_ROWS and LOP_INSERT_ROWS?
    CREATE TRIGGER trg_update_objects ON objects FOR UPDATE
    as
    SELECT * FROM DELETED; SELECT * FROM INSERTED;
    RETURN;
    GO

    -- Now perform update again
    UPDATE objects SET parent_object_id = 10 WHERE name = 'sysfiles1';
    GO

    -- Look at last 3 rows; notice a LOP_MODIFY_ROW
    SELECT * FROM fn_dblog(null, null);
    GO

    Since the database is in SIMPLE recovery model, you can issue a CHECKPOINT before each UPDATE if you want to reduce the number of rows in the log to make it easier to examine.
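
    For example (just a convenience for the demo, not a requirement):

    CHECKPOINT;
    GO
    SELECT COUNT(*) FROM fn_dblog(null, null);  -- the log should now contain only a handful of rows
    GO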

    So it seems that I need to update my course and some of my writings. There might also be special cases that still require a two-step UPDATE in the presence of triggers, but it seems like a two-step UPDATE is not ALWAYS required anymore. That is very good news!

    I hope you all have a wonder-filled and joyous New Year!

    ~Kalen
