THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Kalen Delaney

  • Did You Know: Early Spring Warm Up in Florida!

    No, this is not an early response to the next T-SQL Tuesday topic. Rob Farley will be hosting that one, and wants us to write about relationships.

    So getting warm doesn't refer to a nice cozy relationship by the fire, while the snow is relentless in Chicago and the world is frozen in the Northeast. Not to mention Scandinavia, where I'll be going in April. Of course, Rob is from the Southern Hemisphere, so the seasons are backwards and he's not thinking of a warm fire in any case.

    You can come to Southern Florida the first week of Spring to take my SQL Server Internals class! I am so looking forward to this. I'll be staying for the weekend after the class and driving down to Key West on the Overseas Highway, something I've wanted to do ever since I saw Jamie Lee Curtis dangling from a helicopter above it, in True Lies.

    Information about the class can be found here: http://www.eventbrite.com/event/433310042

    I hope to see you there!

    ~Kalen


  • Geek City: More About Row Overflow Columns

    In my earlier post about the maximum number of row-overflow columns, I just got a comment from someone wanting to know WHY they couldn't get the number I reported… that is, 308 long varchar columns, of which 307 will be stored off page in row-overflow pages and one will be stored on the in-row data page. There are many reasons you might not be able to get the maximum, but 308 is the most! If you can only get 304, it might be because of other columns in the row that take up extra space. You can get the maximum number when all you have in the row is these row overflow columns. The byte usage is described in the referenced post.

    Here is the script I used to get 308 columns. Change the 308 to 309, and you'll see that the table creation will succeed, but the INSERT of 309 values will fail.

    USE testdb;
    GO

    DECLARE @create varchar(max);
    DECLARE @tabname sysname;
    DECLARE @numcols int;
    DECLARE @col int;


    -- You can change the total number of columns
    SELECT @numcols = 308; -- change this number to specify a different number of columns

    -- Generate the table name
    SELECT @tabname = 'wide' + CONVERT(varchar, @numcols);

    -- Start building the CREATE TABLE statement
    SELECT @create = 'CREATE TABLE ' + @tabname +
          ' (col1 varchar(8000) default replicate(''a'', 8000), ';

    -- Start the loop to add columns
    SELECT @col = 2;
    WHILE @col < @numcols BEGIN

           SELECT @create = @create + 'col' + CONVERT(varchar, @col) +
                 ' varchar(8000) default replicate(''a'', 8000),';
           SELECT @col = @col + 1;
    END;

    -- Add the last column to the CREATE TABLE script
    SELECT @create = @create + 'col' + CONVERT(varchar, @col) +
                 ' varchar(8000) default replicate(''a'', 8000));';


    -- PRINT @create  -- Uncomment to display the create statement if desired

    -- Execute the CREATE TABLE statement
    EXECUTE (@create) ;

    EXEC ('INSERT INTO '+ @tabname + ' DEFAULT VALUES'); -- insert 1000 bytes into each column


    --  EXEC ('SELECT * FROM '+ @tabname);  -- Uncomment to SELECT the row, if desired

    -- Verify that the table exists
    SELECT name FROM sys.tables
    WHERE name LIKE 'wide%' ;

    -- Look at the count of columns
    SELECT OBJECT_NAME(object_id) as table_name, COUNT(*) column_count
    FROM sys.columns
    WHERE OBJECT_NAME(object_id)  LIKE  'wide%'
    GROUP BY OBJECT_NAME(object_id);

    --EXECUTE ('DROP TABLE ' + @tabname); -- uncomment to drop the table at the end of the script, if desired

    Have fun!

    ~Kalen


  • Did You Know: My Online Seminar is Scheduled!

    I mentioned here that I had recorded my first seminar with SSWUG.

    It has been scheduled for February 12th, starting 9AM Pacific Time, and is over 3 hours of content. I will be available on live chat during the seminar to answer questions.

    Registration info is here:

    http://www.vconferenceonline.com/shows/workshops/indexing.asp

    There will be a drawing at the end for copies of my books: SQL Server 2008 Internals and SQL Server MVP Deep Dives.

    I hope to see you there!

    Thanks,

    ~Kalen


  • Did You Know? Three events that brought me here

    I was tagged by Paul Randal, to tell you about three events that brought me to this point in my life today. It's interesting timing, because one of my teachers was just talking about a book he had just read, called "The Tipping Point", which dealt with people in our lives who had made things happen for us.  It seemed to me there was some sort of cosmic connection, to I started thinking a lot about how I would address Paul's question.  However, the more I started thinking, the harder it got. Every major decision in my life was brought about by previous decisions and previous events.

    And then Tuesday I was told by editor at SQL Server Magazine that there had been some major confusion and my monthly commentary was due this Wednesday (Jan 20) not next (Jan 27). So I decided to turn my answer to Paul's question in to my commentary, since I had already spent so much time thinking about it.

    I mainly considered 'events' to be decisions I had made, rather than things that just happened to me. One of those decisions was a very spur-of-the-moment one, another was a very heart-wrenching and difficult personal decision, and the third was a professional decision that many people thought I was crazy for making.

    You can read the read of the post here:

    http://www.sqlmag.com/Articles/Index.cfm?ArticleID=103475

    So now you know --

    ~Kalen


  • T-SQL Tuesday #002: Non-aggregated columns in a GROUP BY query

    It's time for the second T-SQL Tuesday, once again managed by Adam Machanic. Since my first T-SQL Tuesday post dealt with my long years with the SQL Server product, I decided to continue in that vein.  Adam asked for us to write something about a puzzle we had to deal with, involving TSQL. I decided to write about one of the most puzzling behaviors of the TSQL language, that fortunately went away many versions ago….

    When I started working with Sybase, over 22 years ago, I was hired as a Technical Support Analyst. The company was new and small, and didn't have a lot of internal training procedures in place, so I was basically just given the manuals to read for a couple of weeks before being put on the phones with customers.

    My only knowledge of SQL and relational database systems was from a one semester course I had taken at UC Berkeley, taught by Michael Stonebraker. But, at least that was something, better than nothing.  I remember reading the TSQL manual, practically cover to cover, and encountering a very strange feature of queries using GROUP BY. I didn't completely understand it, but it seemed very strange. But I just kept reading, because I would have to face my first customer call… very soon. I remember that very first call, because I actually was able to provide some support. The caller was puzzled by the behavior he was getting from a certain TSQL query, the problem description sounded very much like something like the mysterious GROUP BY behavior I had read about just a day or two before.  And I was able to solve his problem that day, on my very first call.

    As good as my memory for SQL Server trivia is, I don't remember the exact query that the customer was calling about. (I do remember the customer though, as we went on to work together on many more interesting problems, and I have recently reconnected with him on LinkedIn.) Although I don't remember the exact query, I remember the behavior. It had to do with a TSQL 'extension' to the standard, that allowed you to use columns in a SELECT list containing aggregates, that were not aggregates and not part of the GROUP BY. Whew…

    So let's see what I am talking about. I'm going to use the pubs sample database, which is the only one we had 22 years ago, and all sample queries in all the documentation were written to use it. I know that data very well.  There is a table called titles, and each book in the titles table has an attribute called type, as well as title, price, and a few others. The following query is typical of the use of aggregation and GROUP BY. It takes all the rows with the same value for type, and produces one row of output for each distinct value of type. The average is the average per set of rows, one value per group of rows with the same type.

    USE pubs;
    GO
    SELECT type, AVG(price) as avg_price
    FROM titles
    GROUP BY type;

    RESULTS:
    type         avg_price
    ------------ ---------------------
    business     13.73
    mod_cook     11.49
    popular_comp 21.475
    psychology   13.504
    trad_cook    15.9633
    UNDECIDED    NULL

    Way back in SQL Server 3 and 4 (and 4.2 and 4.21) we could add additional columns to the SELECT,  so I could add the title column, or the first 10 characters of the title, to the query.  We would get the results shown after the query.

    SELECT
      SUBSTRING(title, 1, 10) as name,
        type, avg(price) as avg_price,
        count(*) as number
    FROM titles
    WHERE  type like '%cook%'
    GROUP BY type

    RESULTS:
    name       type         avg_price   number        
    ---------- ------------ ----------  ------
    Silicon Va mod_cook     11.49       2             
    The Gourme mod_cook     11.49       2              
    Onions, Le trad_cook    15.96       3              
    Fifty Year trad_cook    15.96       3              
    Sushi, Any trad_cook    15.96       3

    So first, you might wonder what it means to add the extra column. Once the rows are grouped by type, adding some additional column basically undoes the grouping. Now I get individual data values instead of aggregates (summaries).  Except that the avg_price and number do seem to be summaries. It might almost start to make sense, until we add an additional WHERE filter:

    SELECT
    SUBSTRING(title, 1, 10) as name,
    type, avg(price) as avg_price,
    count(*) as number
    FROM titles
    WHERE  type like '%cook%'
      AND title LIKE 'S%'
    GROUP BY type

    RESULTS:
    name       type         avg_price   number     
    ---------- ------------ ----------- -----------
    Silicon Va mod_cook     19.99       1          
    The Gourme mod_cook     19.99       1          
    Onions, Le trad_cook    14.99       1          
    Fifty Year trad_cook    14.99       1          
    Sushi, Any trad_cook    14.99       1
      

    Both the average and the count changed when we added the additional condition, but the data that comes back does not all satisfy the condition… there are titles that do NOT start with 'S'! So what's going on?

    It turned out that once we broke the ANSI rules, and added a column to the SELECT list that was not part of an aggregate function and was not contained in the GROUP BY, then our WHERE clause changes its meaning. In this case, the WHERE was only used to determine what rows were used to compute the aggregates. This is why the values for avg_price and number changed with the addition of the second condition. If we wanted to control which rows come back, we needed to specify the condition in a HAVING clause.

    SELECT
    SUBSTRING(title, 1, 10) as name,
    type, avg(price) as avg_price,
    count(*) as number
    FROM titles
    WHERE  type like '%cook%'
    GROUP BY type
    HAVING title LIKE 'S%'

    RESULTS:
    name       type         avg_price   number          
    ---------- ------------ ------------ -------
    Silicon Va mod_cook     11.49        2          
    Sushi, Any trad_cook    15.96        3

    This sort of seems like the results I might expect. If this behavior is puzzling to you, imagine trying to explain it to customers and students new to SQL. It was actually my good friend Tibor Karaszi who ran all these queries for me on a SQL Server 4.21 that he had available, so I could verify the results. After being totally baffled by the behavior, he said: “I’m glad we don’t have to handle such topics in the newsgroups anymore!”

    Even though we no longer can add unaggregated, ungrouped columns to the SELECT list of an aggregate query, that doesn't mean the puzzles are all gone. One problem that I continued to encounter when teaching beginning TSQL classes, was WHY this query isn't legal:

    SELECT title, type, count(*) as number
    FROM titles;

    SQL Server 2008 gives you a somewhat helpful error message:

    Msg 8120, Level 16, State 1, Line 1
    Column 'titles.title' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Just a few versions ago, the message only reported that the titles.title column was invalid, with no explanation. In Books Online, the explanation was found in the section on GROUP BY. But if someone worte the preceding query, they didn't even use a GROUP BY and may have no idea there was any relationship of this query to grouping operations. It made for a very puzzling time, to try to figure out this query didn't work as expected.

    Hopefully now there are fewer puzzling behaviors… but we'll just have to wait and see what everyone else blogs about.

    Have fun (and THANKS Tibor!)

    ~Kalen

    (posted Monday evening on the US West Coast, but already Tuesday UTC time.)


  • Did You Know? I'm doing online seminars!

    I filmed my first seminar at Stephen Wynkoop's incredible studio in Tucson last Tuesday. It was a terrific experience; Stephen just made it seem so easy, and he has the perfect setup for doing the filming.

    The seminar "Index Internals and Best Practices" was supposed to be 2.5 - 3 hours long, but ended up being over 3.5 hours! The tentative broadcast date is February 12, and I'll be live on chat to answer questions during the presentation. A DVD of the seminar will also be available.

    I'll post more specifics as they are available, but in the meantime, you can check out this interview I did with Stephen right after the filming:

    http://www.sswug.org/media/

    Thanks!

    ~Kalen


  • Did You Know? Yet Another Gift From Manning

    Here's another one-day-only offer, for Monday, December 28th:

    $15 off any purchase

    This includes MEAP (Manning Early Access Program), ebooks or print books from Manning's catalog:
    http://www.manning.com/catalog/by/subject/

    And of course it includes the SQL Server MVP Deep Dives book! Again, here's the Manning site for that book:
    http://www.manning.com/nielsen/

    - Use code dotd1228 http://bit.ly/83ShZv

    Happy New Year!

    ~Kalen


  • Did You Know? A Holiday Gift From Manning

    Christmas Day Only!

    Dec 25: Season's Savings from Manning - 50% off everything!

    This includes MEAP (Manning Early Access Program), ebooks or print books

    And of course it includes the SQL Server MVP Deep Dives book! Here's the Manning site for that book:
    http://www.manning.com/nielsen/

    Use code dotd1225 http://bit.ly/8Hi00H to get your 50% discount.

    Ho Ho Ho

    ~Kalen


  • Geek City: Exploring the Transaction Log Structure

    As I mentioned last October, my PASS preconference seminar was a whole day about the transaction log. I told the attendees all kinds of things about the structure of the log, and told them about one of the best tools for exploring the structure: the command DBCC LOGINFO. This command is officially undocumented, but numerous bloggers have discussed it in some detail, including here. I'm not going to go into all the details here, but I'll be writing some related blog posts over the next month or so.  To understand DBCC LOGINFO you need to know about VLFs (Virtual Log Files) and one of the best places to read about VLFs is in Paul Randal's TechNet article. The DBCC LOGINFO command returns one row per VLF including the following columns:

    FileId

    If you only have one physical log file, this value will be the same in every row. If you have multiple log files, you can use this value to explore the order that the SQL Server will fill up VLFs in multiple files. (More details in a future post.)

    FileSize 

    This value is in bytes. If you have your log file growth set to a percentage (which, unfortunately is the default), you should notice this value getting progressively larger as more VLFs are added to your log file.

    StartOffset

    This value is also in bytes, and is used as the sort column for the output. That is, the output from DBCC LOGINFO (if you only have single physical file) will be ordered by StartOffset. Notice that the first VLF always starts at offset 8192, which is the number of bytes in a page. The first page of the physical file is the file header, and does not store log records.

    FSeqNo

    The File Sequence Number indicates the order of usage of the VLFs. The row with the highest FSeqNo value is the VLF where current log records are being written. Notice that the FSeqNo values are not always consistent with the StartOffset, as VLFs are reusable after the log has been truncate. I'll discuss this value more in a future post, or you can get some info from Paul's article on how and when a VLF is reused. A value of 0 for the FSeqNo means that this VLF has never been used at all yet.

    Status

    Status has two possible values: 0 and 2. I'll be discussing these values in a lot more detail later. For now, the simple definition is that a value of 2 means the VLF is not reusable and a value of 0 means it is.


    Parity

    Parity has two possible values: 64 and 128. Every time a VLF is reused, the parity value is switched. Paul Randal discusses the parity value in more detail in this article.
     

    CreateLSN

    This value indicates when the VLF was created. If the CreateLSN is 0, it means the VLF was original equipment, created when the database was created. You can use this value to see how many VLFs were added each time the log grew. VLFs with the same CreateLSN value were created at the same time.

    The whole point of describing all these columns was just to provide a lead-in to my little holiday gift to you. Here is a table that can be created in the master database, and then populated with the output of DBCC LOGINFO. The name of the table starts with sp_, which means you can use it from any database.  Once you have captured all the rows from DBCC LOGINFO, you can query them using any TSQL queries.

    Here the table creation script:

    USE master
    GO
    IF EXISTS  (SELECT 1 FROM sys.tables
                WHERE name = 'sp_LOGINFO')
        DROP TABLE sp_loginfo;
    GO
    CREATE TABLE sp_LOGINFO
    (FileId tinyint,
    FileSize bigint,
    StartOffset bigint,
    FSeqNo int,
    Status tinyint,
    Parity tinyint,
    CreateLSN numeric(25,0) );
    GO

    And here is how to populate it. I will use the AdventureWorks database for the example:

    USE AdventureWorks;
    GO

    TRUNCATE TABLE sp_LOGINFO;
    INSERT INTO sp_LOGINFO
       EXEC ('DBCC LOGINFO');

    You can select the VLFs from most recently used to the least:


    SELECT * FROM sp_LOGINFO
    ORDER BY FSeqNo DESC;

    You can count how many VLFs have each status value:

    SELECT status, COUNT(*) FROM sp_LOGINFO
    GROUP BY status;

    If you don't have a lot of VLFs, you can try to generate more by running SELECT INTO repeatedly. Make sure you are in FULL recovery to get the most log growth:

    SELECT * INTO dbo.Orders FROM AdventureWorks.Sales.SalesOrderDetail;
    DROP TABLE dbo.Orders;
    GO 5


    TRUNCATE TABLE sp_LOGINFO;
    INSERT INTO sp_LOGINFO
       EXEC ('DBCC LOGINFO');


    SELECT status, COUNT(*) FROM sp_LOGINFO
    GROUP BY status WITH ROLLUP;

    You may be able to figure out other queries that provide useful or interesting information for you. I'll refer back to this table in my future discussions on DBCC LOGINFO and VLFs.

    Enjoy!

    ~Kalen


  • T-SQL Tuesday #001: My Datetime FAQ

    I am rising to Adam's challenge for his very first T-SQL Tuesday, and blogging about datetime issues. Since I just finished a 9-hour teaching day in Stockholm, and I'm incredibly jetlagged, I decided to keep this simple. I'm very much looking forward to what everyone else is going to say!

    For as long as I've been providing TSQL support, starting over 22 years ago while working for Sybase,  there have been confusions about datetime data storage and retrieval. I have seen the same questions over and over for all of those years, so I am listing the all-time most common datetime questions here. My FAQ stands for "Frequently ANSWERED Questions"! To keep the post simple, I am not going into complete detail for every answer. If you want more details you can read one of my books.  I am also just listing the tip of the iceberg in terms of common datetime questions.

    1. I have my dates stored like this: "March 4, 1948" but SQL Server is returning them like this: "1948-03-04 13:17:32.120".  How can I get the dates returned in the manner I stored them?

    This question actually can't be answered as written, because it is NOT true that a date could be stored as "March 4, 1948".  A character string could be stored that way, but not a date. Internally, datetime data is stored in a format you never see displayed, no matter how you enter the datetime value. It is stored as two integers, the first being the number of days before or after the base date of "January 1 1900" and the second integer being the number of milliseconds after midnight.

    You can actually see these two parts if you convert a datetime value to a binary string of 8 hexadecimal bytes. The first 4 hexadecimal bytes are the number of days before or after the base date, and the second 4 bytes are the number of clock ticks after midnight. You can then convert these 4-byte hexadecimal strings to integers.

    You can see the component parts of the current date and time by using the system function getdate(). I first store the current date and time in a local variable so I can be sure I'm using the same value for both computations.

    DECLARE @today datetime
    SELECT @today = getdate()

    SELECT @today
    SELECT CONVERT (varbinary(8), @today)
    SELECT CONVERT (int, SUBSTRING (CONVERT (varbinary(8), @today), 1, 4))
    SELECT CONVERT (int, SUBSTRING (CONVERT (varbinary(8), @today), 5, 4))

    The way a datetime value is displayed depends on several factors, including the tool you are using and your machine's regional settings. If you want to control the way a datetime value is displayed, you can convert it to a character string. 

    2. Why doesn't this query find any rows:  SELECT * FROM orders WHERE orderdate = 'March 4, 2009'

    The most like reason is because datetime data always includes a time portion as well as a date portion. Unless your orders were entered at a time of midnight (the default), they won't match the datetime value shown. (This answer frequently leads to the next question.)

    3. Why can't I store just a date or just a time, why do I always need both?

    The simple answer, prior to SQL Server 2008 is "just because". That is the way SQL Server works. Microsoft has been promising individual date and time datatypes for several versions now, and they finally showed up in SQL Server 2008, as well as a couple of other new datetime datatypes, plus the ability to specify a precision for fractional part a second.

    4.  Why doesn't this query show me how old I am? SELECT datediff(yy, @my_birthdate, getdate() )

    Arithmetic with datetime values works only with whatever datepart is specified as the first parameter and does no rounding. Since my first parameter is yy, for year, the datediff simply subtracts the year component of my birth year from the year component of today's date. If I haven't had my birthday yet, it will show me how old I will be on my birthday this year, which will not be my current age. And since birthday is almost at the end of the year, for most of the year I will get an incorrect number.

    5. Why can't I store dates earlier than 1753 using the datetime datatype?

    This was done for historical reasons, and started with the original Sybase specification for the datetime datatype. In what we sometimes refer to as the western world, we have had two calendars in modern time: the Julian and the Gregorian calendars. These calendars were a number of days apart (depending on which century you look at), so when a culture that used the Julian calendar moved to the Gregorian calendar, they dropped between 10 to 13 days from the calendar. Great Britain made this shift in 1752, and in that year, September 2nd was followed by September 14th. Sybase decided not to stored dates earlier than 1753 because the date arithmetic functions would be ambiguous. However, other countries made the change at other times, and in Turkey the calendar was not shifted until 1927.

    The new date and datetime2 times in SQL Server 2008 do allow us to store dates back to 0001-01-01.

    For one of the best articles on the web about datetime issues in SQL Server, see Tibor Karaszi's website:

    http://www.karaszi.com/SQLServer/info_datetime.asp

    I love the calendars in Europe, especially Germany and Sweden, because the first part of the word is my name!

    ~Kalen


  • Geek City: How Many Rows?

    In my last post, I told you I'd post something technical before I left for Sweden. I almost made it. I wrote this in the Seattle airport, but wasn't able to post it until I got to Stockholm.

    For as long as I've been working with SQL Server, I've been hearing/reading questions about how to quickly get a count of the number of rows in a table. We've always had the stored procedure sp_spaceused which would give us a count, but we've always be warned that the count was not guaranteed to be completely accurate. Way back in Sybase versions 3 and 4, that was an enormous understatement. Working in Tech Support, we recommended that people not rely on this value at all, and internally, we referred to the procedure as sp_space_useless.

    Things have improved over the years, and in many cases, the sp_spaceused procedure, or a direct SELECT from sysindexes could give a reasonably accurate count, but still, there has never been a guarantee. Starting in SQL Server 2005, sysindexes became only a compatibility view, and the recommended system object was the catalog view sys.partitions. Sys.partitions keeps track of rows per partition, so the following query would return multiple rows for a partitioned table:

    SELECT rows  FROM sys.partitions
    WHERE object_id = object_id('name of table')
    AND index_id < 2;

    If you want one value, rather than one per partition, you could simply ask for the SUM:

    SELECT SUM(rows)  FROM sys.partitions
    WHERE object_id = object_id('name of table')
    AND index_id < 2;

    But even in SQL Server 2008, it is not guaranteed that you can get an exact count of rows without actually counting. The only guaranteed way to get an accurate value is to SELECT from the table:

    SELECT count(*) FROM [name of table];

    But what does it mean to get an accurate count?

    And when would the count not be accurate when selecting from sys.partitions? Microsoft doesn't reveal all the cases in which you might not get an accurate value, all they will admit is that the count is not guaranteed. You can actually reproduce one case where you might not want to trust the count, by examining the count in one connection while another connection is changing the number of rows.

    Start by creating a copy of a table in the AdventureWorks2008 database:

    USE AdventureWorks2008;
    GO
    IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Header')
        DROP TABLE Header
    GO
    SELECT * INTO Header
    FROM Sales.SalesOrderHeader;
    GO

    You should have 31,465 rows in the new table.

    Next, get ready to start a transaction to add more rows to the table:

    -- Insert more rows:

    USE AdventureWorks2008;
    GO

    BEGIN TRAN
    INSERT INTO Header
    SELECT  [RevisionNumber]
          ,[OrderDate]
          ,[DueDate]
          ,[ShipDate]
          ,[Status]
          ,[OnlineOrderFlag]
          ,[SalesOrderNumber]
          ,[PurchaseOrderNumber]
          ,[AccountNumber]
          ,[CustomerID]
          ,[SalesPersonID]
          ,[TerritoryID]
          ,[BillToAddressID]
          ,[ShipToAddressID]
          ,[ShipMethodID]
          ,[CreditCardID]
          ,[CreditCardApprovalCode]
          ,[CurrencyRateID]
          ,[SubTotal]
          ,[TaxAmt]
          ,[Freight]
          ,[TotalDue]
          ,[Comment]
          ,[rowguid]
          ,[ModifiedDate]
      FROM [Sales].[SalesOrderHeader];
    GO 10

    ROLLBACK TRAN;
    GO

    In another window, you can run a query to inspect the count:

    USE AdventureWorks2008;
    GO
    SELECT rows  FROM sys.partitions
    WHERE object_id = object_id('Header');

    So now start the INSERT, and while it is executing, run the SELECT repeatedly.  You should see the count increasing as more rows are added, and then you can see it DECREASING as the ROLLBACK happens, and the values returned are not always an even multiple of 31,465.

    So which of these counts are accurate? 

    As we've been told, to get a true count, you'll need to use COUNT(*):

    SELECT count(*) FROM Header;

    However, this query will BLOCK when trying to get a value before the transaction is completely rolled back, at least if you're in the default READ COMMITTED isolation level. Selecting from sys.partitions will NOT block.

    If your transactions are short, you won't get long term blocking, but if you have lots and lots of transactions inserting new rows continuously, what would an 'accurate' count really mean? The value returned one second would be completely different from the value returned the next second.

    So you need to decide what 'accurate' really means, and maybe you'll decide that sys.partitions is accurate enough.

    Inaccurately yours,

    ~Kalen


  • Did You Know: You can Learn SQL Internals in Sweden!

    The class next week was full, but we just had two cancellations today. So now you have a chance to take my SQL Server Internals and Tuning class next week, at the wonderful Cornerstone Training Center, right outside of Stockholm.

    (I promise I'll have another geeky post this week, before I leave for Sweden on Friday. Plus, I'll try to contribute to Adam's Blog Party next Tuesday.)

    ~Kalen


  • Geek City: Intent to Update

    In the last couple of months, I've told you about Intent Locks and UPDATE locks. I want to just provide a wrap up to those posts that talks about both of these aspects of locking.

    You can think of both of these locking aspects as although SQL Server to indicate an intention. However, Intent Locks have to do with the unit of locking and UPDATE locks have to do with the type of lock. 

    Here's how you might think about the 'intention':

    If you get individual rows locks, you might have the intention of getting more row locks on the same page or table, so you might eventually want to lock the entire page or table. To make sure the page or table is available when you're ready to lock it, SQL Server acquires an Intent lock on the larger units. If you have a row lock, you'll get Intent locks on the page and the table containing that row. Intent locks can go with any type of lock: we can have Intent-Shared (IS), Intent-Exclusive (IX) and even Intent-Update (IU).

    UPDATE locks are a way of SQL Server stating your intention to change the type of lock. If you are searching for data to modify, you might intend to eventually get an X lock. So you get U locks while you're searching (instead of S locks)  and then X locks when you find the data to modify.

    You might also want to note that although you can get an IU lock on a page, at the table level an IU lock becomes IX. You can see that in the output I showed you in my UPDATE lock posting, which I repeat here:

    image

    Session 52 is requesting a U lock on a key, and it already holds an IU lock on the page that contains the key, but an IX lock on the object.

    I'm thinking that soon I'll write a post about interpreting some of the other information that shows up in the sys.dm_tran_locks view, for example the entity and the resource_description.  But not today…

    Have fun!

    ~Kalen


  • Geek City: UPDATE Locks

    UPDATE locks are not really a separate kind of lock, but rather are a hybrid of SHARED and EXCLUSIVE locks. And contrary to what you might think, UPDATE locks are not just acquired for UPDATE operations.  A transaction acquires this kind of lock when SQL Server executes a data modification operation but first needs to perform a search to find the resource to modify.

    While SQL Server is searching, it shouldn’t need to acquire an EXCLUSIVE lock; it only needs the EXCLUSIVE lock when the data to be changed is found. Normally, if a SQL Server process was just searching for data, it would acquire a SHARED lock on each resource it encounters and then determines whether it has found the data it is searching for. However, if SQL Server started out with a SHARED lock while searching for the data to modify, there are potential problems. A situation could occur where two processes were both searching for the same resource to modify (for example, the same customer row in the Customers table), using different access paths, and they could both reach the desired resource at the same time. If they both were acquiring SHARED locks on the data they were examining, they could both lock the resource they wanted to change, but before they made the modification they would have to convert their lock to an EXCLUSIVE lock. Since the other process would have a SHARED lock, no EXCLUSIVE lock could be granted. Each process would have a SHARED lock, and each would try to change it to an EXCLUSIVE lock, but neither could proceed because of the presence of the other. This is a deadlock situation, called a ‘conversion deadlock’. 

    UPDATE locks are really a deadlock avoidance mechanism. If SQL Server uses UPDATE locks, a deadlock will NOT occur. If a SQL Server process begins a search operation with the intention of eventually modifying data, it acquires UPDATE locks until it finds the data to modify. UPDATE locks are compatible with SHARED locks, but are not compatible with EXCLUSIVE locks or other UPDATE locks. So if two processes were searching for the same data resource, the first one to reach it would acquire an UPDATE lock, and then the second process could not get any lock and would wait for the first process to be done. Since the first process was not blocked, it could convert its UPDATE lock to an EXCLUSIVE lock, make the data modification, and finish its transaction and release its locks. Then the second process could make its change.

    In the sys.dm_tran_locks view, a request_mode value of ‘U’ indicates an UPDATE lock.

    So let's look at UPDATE locks:

    [I am using the old sample database pubs. If you want to try this exact code, you can download pubs from here.]

    -- Close all existing connections and start a new one

    -- Step 1:
    USE pubs;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;


    BEGIN TRAN;
    UPDATE authors
    SET contract = 0
    WHERE au_lname = 'Ringer' ;

    ---- Step 2: Open a second connection window uncomment once,
    --  
    so the ROLLBACK is still commented. Execute…

    --USE pubs;
    --SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;

    --BEGIN TRAN;
    --UPDATE  authors
    --SET city = 'Provo'
    --WHERE  state = 'UT';

    -- You should be blocked.

    ---- ROLLBACK TRAN;

    -- Step 3: Go back to the first connection window and run the following
    SELECT request_session_id AS session_id, DB_NAME(resource_database_id) AS [database],
       request_mode AS mode, resource_type as [type], 
       resource_associated_entity_id AS entity,
       resource_description,  request_status AS status
    FROM sys.dm_tran_locks;
    COMMIT TRAN;

    You should see output similar to this:

    image

    Note the U lock with the status of WAIT for a KEY, with the same resource description as a KEY lock that the first connection has  been GRANTed. Now COMMIT or ROLLBACK the first connection,  and you should see the second connection will get the X lock on the KEY it is waiting for, plus the X lock on the other KEY.

    image

    I mentioned that UPDATE locks indicated an 'intention of eventually modifying data', so you might think that UPDATE locks are similar to INTENT locks. Not really… UPDATE locks indicate an intention to change the lock mode, whereas INTENT locks indicate an intention to change the lock granularity.

    Wishing you maximum concurrency,

    ~Kalen


  • Did You Know? I'll be teaching a public course in Florida!

    The Boston class was so successful, both for me and for all the attendees, that I have been persuaded to do another public class on the East Coast.  This one is being arranged by Scott Klein, of the South Florida SQL Server Users' Group, and will be held March 22 - 26 in Fort Lauderdale.  You can find more information and registration details here.

    Even though I had a great time in Boston in October (the trees were turning glorious colors, but we did get snow one day). something just seems really appealing about Florida in March. The Boston class sold out, and we expect this one will also. I am hoping my husband can join me at the end of the week, and then we can take a few days to enjoy the area. I've never been to the Everglades, and after seeing True Lies 50 times (it seems that many, anyway)  in hotel rooms, I'd love to drive that bridge out to the Keys.

    ~Kalen


More Posts Next page »

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement