THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Linchi Shea

  • Finding Disk Partition Offsets Made Easy

    There were discussions on disk misalignment on this site. See my previous post on “Performance Impact of Disk Misalignment”, and Kevin Kline’s blog on “How to Improve Application and Database Performance up to 40% in One Easy Step

     

    But thanks to Jimmy May's PASS 2008 presentation on the I/O performance impact of disk partition misalignment, I have been hearing a lot about this topic yet again. Even though I thought the issue has long since been settled, it turns out that there is a huge room for improvement in practice.

     

    Now, in practice, if you need to check whether your disk drives are properly aligned, it can be a labor intensive effort if you have to resort to diskpar.exe or diskpart.exe to get the offset of each disk drive in your environment unless you only have a few servers. If you do have a large number of servers, and want to scan the disk drives on your servers for their partition alignment, you may find the attached utility useful.

     

    In the attached zip file, you’ll find an executable—find_partition_offset.exe—that takes a server name on the command line and gives you the offsets of the partitions on a given server, and a Perl script—find_partition_offset.pl—that is a wrapper around find_partition_offset.exe. The Perl script takes the name of a text file that has a list of server names, one on each line, and loops through these servers to report the partition offsets. You can execute the Perl script as follows:

     

    cmd>perl find_partition_offset.pl servers.txt

     

    For your convenience, I’ve included the C# source code for find_partition_offset.exe.


  • Where is HKLM\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer?

    For a SQL Server 2000 default instance, the registry keys and values under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer are absolutely critical to its functioning. So if someone tells you that the whole registry subtree underneath there does not exist on his box, but yet the SQL Server 2000 default instance works just fine on that box, what would be your response?

    One thing you may do is to execute the following T-SQL command to check if the keys are indeed missing:

    EXEC master.dbo.xp_regread 'HKEY_LOCAL_MACHINE','SOFTWARE\Microsoft\MSSQLServer\MSSQLServer\CurrentVersion', 'CurrentVersion'

    Now, you may just get the current version number returned from this statement. To further verify that the keys are there, you may launch Regedit.exe from your workstation and connect to the registry on the box. You then navigate to the MSSQLServer key, and not surprisingly, you see all the SQL Server related registry keys and values right where they are supposed to be.

    Case closed, right? Well, the person still claims that he can't see the registry keys, and even send you a Regedit screenshot to prove it. So you follow the exact steps of the person to reproduce the behavior. And surprisingly, you don’t see the registry keys, even though you are a local administrator.

    What's going on?

    Well, this is the scenario of running a 32-bit SQL Server 2000 default instance on Windows Server x64 edition. If you use a 64-bit tool (such as a 64-bit regedit.exe on the 64-bit Windows), you won’t see the keys because they are not stored there. In fact, the 32-bit SQL Server registry keys and values are actually stored under HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\MSSQLServer.

    However, Windows provides a transparent registry ‘redirection’ for 32-bit tools. So, if you happen to use a 32-bit tool—such as regedit.exe from your 32-bit workstation and xp_regread from a 32-bit SQL Server 2000 Instance—to view the registry, you would see the keys underneath HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\MSSQLServer just where you expect them to be.


  • Performance Impact: Mismatched Network Duplex Setting can Sink Your OLTP Throughput

    This is probably known to a lot of people. But I think it’s worth repeating.

     

    I was running some OLTP tests against a pretty powerful server. And everything appeared to be right on the client driver side and on the server side. But I was getting rather miserable throughput numbers. For this class of servers and for this type of OLTP workloads, I’d expect the transaction throughput per second (tps) to be well over 5000 tps. Instead, I was getting about 800 tps.

     

    No matter how hard I tried to drive the workload against the server, the throughput just refused to budge. And the server resource consumption didn’t change much either as I increased the number of users going against the server.

     

    In addition, for a given load level, when the average response time was expected to be in the 3ms range, the actual average response time was about 4~5ms. Not exactly what was expected, but not five or six times worse off either. However, upon a closer examination of individual transaction response times, I found that sometimes a transaction would take longer than 200ms to complete, and occasionally a transaction might even take well over 1000ms to complete. These long-running transactions (relatively speaking anyway) were what killed the overall transaction throughput.

     

    In the end, it was discovered that the root of the problem was a mismatched duplex setting on the server. While the network card on the client driver machine was set to full duplex, the network card on the server was set to half duplex. After a simple change of the duplex setting on the server to full duplex, the transaction throughput for the exactly same workloads went up from 800 tps to more than 5000 tps.

     

    Now in the real world, you may not see such a dramatic impact because you may not be consistently running small transactions at an extremely high frequency. For instance, if you are running large reporting queries, you may not feel any difference at all.

     


  • Large Query Performance and Query Parallelism

    [2008/08/25] This post has been modified significantly to correct some inaccurate statements because I mis-read Joe Chang's post.

    Joe Chang posted some interesting results using the TPC-H queries with the scale factor set to 10. I happened to have done something similar, and naturally noticed a rather significant difference between his results and mine. [2008/08/25: Okay the difference is not as significant as I had thought.]

    My results were obtained on SQL Server 2005 Enterprise x64 Edition running on Windows Server 2003 Enterprise x64 Edition. The test server, whose make and model will remain undisclosed,  had four 2.93GHz quad-core Intel Tigerton sockets (Xeon X7350) with 64GB of RAM with 20GB allocated to the SQL Server buffer pool. The TPC-H scale factor was set to 10, and the data were generated with DBGEN from tpc.org.

    My results are as follows:

    Query    MAXDOP 1    MAXDOP 4   MAXDOP 8     MAXDOP 16
    tpcH Query1 64845 17080 9373 4865
    tpcH Query2 571 180 110 97
    tpcH Query3 17687 3272 1792 1334
    tpcH Query4 23079 3859 2293 1633
    tpcH Query5 34255 5040 2646 1935
    tpcH Query6 2250 595 332 204
    tpcH Query7 18299 3562 1754 1513
    tpcH Query8 9636 3245 2042 1809
    tpcH Query9 50470 14094 7559 4460
    tpcH Query10 19363 4432 2289 1630
    tpcH Query11 4558 674 424 390
    tpcH Query12 10522 3815 2718 982
    tpcH Query13 26823 7577 4425 2121
    tpcH Query14 2356 624 357 227
    tpcH Query15 1705 1421 280 170
    tpcH Query16 7960 3461 2155 1681
    tpcH Query17 1070 325 220 276
    tpcH Query18 68366 19719 10174 5829
    tpcH Query19 2330 711 441 843
    tpcH Query20 968 314 221 695
    tpcH Query21 80858 14565 8149 6072
    tpcH Query22 9418 1762 986 664
             
    Total (second) 457 110 61 39

    All the numbers in the above table are query elapsed times in milliseconds except the very last line highlighted in yellow, which are in seconds. The last line contains the total elapsed time for Query1 through Query22 for each MAXDOP setting. All the results were obtained with all the database pages cached in the buffer pool. The same TPC-H query stream (i.e. Query 1 through Query 22) was repeated for 10 times with all the results from the first run thrown out, and the max elapsed time and the min elapsed time for each query for each of the subsequent runs were also thrown out before the elapsed times were averaged.

    So what's the difference between Joe's results and mine?

    Although there are some differences in the elapsed times in terms of the absolute numbers, the differences are minor enough to be no cause for alarm, especially when you take into consideration the platform differences and probably some differences in the test setup.

    However, the deline in the elapsed times in my test results is more significant than it is in Joe's results when MAXDOP is changed from 1 through 16. The difference is not as dramatic as I had throught, and initially commented on in the original version of this post (because I was reading the wrong line for whatever reason).

    Given that TPC-H has been around for a long time and all DBMS vendors have tried very hard to optimize their products to performance well on TPC-H, I do not expect SQL Server to be an exception, and am rather pleased to see that it scales very well with MAXDOP on these queries.

    I'm a bit embarrassed that I didn't read Joe's results correctly first time around, but felt that it's time well spent to be able to cross check independent tests results from realistic envrionments.


  • MSIZAP is Your Friend

    If you ask for my opinion on Windows Installer, I’d have to think what polite words to use. But almost every Microsoft product uses it, and that includes SQL Server. So we are stuck with it and have to live with it for now and for the forseeable future.

     

    One of major problems with Windows Installer is that its install packages can be so complex that it sometimes confuses the heck out of the Installer. I don’t know about you, but I have run into enough install/patch problems, and wasted countless number of hours when a SQL Server edition, version, or hotfix simply refuses to install and spits out strange error messages that turn out to have nothing to do with the real problems. Trying to fix these installer problems is like performing brain surgery on yourself—the chance of success is just not very high, at least judging from my own experience and even with help from the Microsoft support engineers.

     

    Now, there are times when you can afford to obliterate an installed product, or whatever remains in its mangled state, and those are times you may find msizap.exe extremely useful and can save you tons of grief.

     

    What msizap.exe can do for you is to wipe the slate clean by removing the file and registry settings that make up the Windows Installer configuration information for an installed product. Note that it removes the file and registry settings, not the files and registry keys themselves. If there are files of the product left in the file system, and registry keys/values left in the registry, you would have to delete them yourself.

     

    Now, msizap.exe can be a very dangerous tool to use because it basically pulls the carpet from underneath a product. And you have to be really careful not to apply it to a wrong product. In addition, msizap.exe is not exactly intuitive to use as you need to deal with the product codes, which are GUID strings.

     

    Fortunately, Microsoft has a more user friendly tool on top of msizap.exe. This more user friendly tool is called Windows Installer CleanUp Utility. You can find it easily via Google or in this KB article. When you launch this Installer CleanUp utility, it’ll automatically search the msi database, and list all the product names and their corresponding product codes for you to select. Then, you can simply select the product(s) whose msi configuration information you want to zap. And that’s all you need to do! Once the slate is wipted clean, the chance of the product install goes through becomes very high.

     

    The msizap.exe utility has saved me in several occasions, and I'm sure you’d be thankful that you have msizap.exe around should you run into a similar situation.


  • Performance Impact: Bookmark Lookup is Expensive - Even in Memory

    It’s well known that bookmark lookup (aka key lookup in case of a clustered index) is not cheap, especially when it comes to retrieving a lot of data. So I’m not going to rehash the pros and cons of bookmark lookup or why bookmark lookup is expensive. But I’ve noticed that when it comes to discussing bookmark lookup, all the literature seems to being focusing on their implications on storage I/Os. There is nothing wrong with that. In practice, it ultimately does come down to the fact that when reading a large amount of data, random I/Os are much more expensive than sequential I/Os, and bookmark lookup tends to incur random I/Os.

     

    What I’d like to highlight in this post is that even if all or most of the pages are in memory already, bookmark lookup is still very expensive compared to scan.

     

    To show this is the case, I piggybacked on the test setup described in my last post. Let me briefly recap the test setup. A single table was used, and its definition is as follws:

    CREATE TABLE test (
          i           int NOT NULL,
          j           int NOT NULL,   
          dt          datetime,
          filler      char(5000) NOT NULL
    )

    Two indexes were created on the table:

     

    CREATE CLUSTERED INDEX cix_test ON test(i);
    CREATE INDEX cx_test ON test(j);

    And the table was populated with 2,000,000 rows with the following INSERT statement in a loop (local variable @i going from 1 to 2,000,000):

    INSERT test(i, j, dt, filler)
    SELECT @i,
           CASE WEHN @i % 2 = 0 THEN @i ELSE 2000000 - @i END,
           getdate(),
           'abc'

    Then, the following two queries were separately run multiple times on a server with 32GB of RAM (25GB of which was allocated to the SQL2005 instance buffer pool):

     

    SELECT max(dt) FROM test;
    SELECT max(dt) FROM test WITH (index(cx_test));

     

    After the second run, all the pages were cached in memory (the table was about 16GB in size). Notice that with the first query SQL Server used clustered index scan to produce the result, whereas with the second query the index hint forced SQL Server to scan the nonclustered index and then use bookmark lookup to produce the result.

     

    I’ll look at the storage I/O implication of these two access paths in a separate post, but for now what do you think might be the difference in the elapsed time of these two query-processing methods when all the pages were cached in memory?

     

    The following chart shows the difference:

    For this test, the bookmark lookup method took almost three times as long as did the clustered index scan method.

     

    Why such a huge difference? When all the pages were cached in the buffer pool, we can’t explain the query elapsed time difference with the difference in storage I/O efficiency. We can, however, explain the difference with the difference in the number of pages SQL Server engine must visit in the buffer pool.

     

    The output from SET STATISTICS IO ON for each method is as follows:

     

    Clustered Index Scan:

    Scan count 1, logical reads 2008931, physical reads 0, read-ahead reads 0 …

     

    Bookmark Lookup:

    Scan count 1, logical reads 8130363, physical reads 0, read-ahead reads 0 …

     

    Clearly, the access method with bookmark lookup was not efficient, visiting four times as many pages as did the access method with clustered index scan.

     

    Also note that compared with the size of the table, the size of the nonclustered index was insignificant. There were about 5,000 pages in the leaf pages of the nonclustered index, whereas there were 2,000,000 leaf pages in the clustered index. So the cost of processing the query with the nonclustered index was dominated by the bookmark or key lookups.

     

    Of course, what I’ve discussed in this post is probably more academic than practical as you can’t expect all the pages being cached when processing large reporting queries in most real environments, and therefore the real dominate factor is not how many pages SQL Server needs to traverse in memory, but how many pages SQL Server has to bring into memory and how these pages are brought into memory.

     

    It's not my intention to leave you an impression that I'm picking on the bookmark lookup operation because it's just a bad method to retrieve data. That's not the case at all because for some queries, it is an excellent method. But that's not the focus of this post, and there are plenty of discussions on the advantages of the bookmark lookup operation you can find elsewhere.

     

    Test Environment

    SQL Server 2005 Enterprise x64 Edition ran on Windows Server 2003 Enterprise x64 Edition with Service Pack 2. The SQL2005 build was 9.0.3239 (i.e. SQL Server 2005 CU7) with 25GB allocated to the buffer pool. The server was a HP BL680 G5 including four 2.4GHz quad-core Xeon E7340 processors (aka Tigerton) with 2x4MB L2 cache and 32GB of RAM.

     


  • Performance Impact: Some Data Points on Read-Ahead

    In the next series of posts, I'll focus on SQL Server I/O, revisiting some common issues and taking a closer look at some others. In each post and as always, I'll make the case with specific data points from my tests. For the first two posts in this series, let me check out the read-ahead technique used by SQL server.

    Read-ahead is an important I/O optimization technique used by SQL Server.  Intuitively, if SQL Server can correctly forecast the need for more pages and read these pages ahead of time when they are needed for query processing in memory, your query is expected to perform better. Bob Dorr in his classic SQL Server 2000 I/O Basics whitepaper has an excellent description on how SQL Server read-ahead works. Every SQL Server professional with any interest at all in the storage engine should read that paper.

    But just how important is the read-ahead technique in query processing? Let's look at some results from an extremely simple test.

    First of all, a single table was used in the test, and here is the definition:

    CREATE TABLE test (
          i           int NOT NULL,
          j           int NOT NULL,   
          dt          datetime,
          filler      char(5000) NOT NULL
    )

    I then created a clustered index on the i column (for this test column j was not used and you can ignore it):

    CREATE CLUSTERED INDEX cix_test ON test(i);

    And the table was populated with 2,000,0000 rows with the following INSERT statement in a loop with @i going from 1 to 2,000,000:

    INSERT test(i, j, dt, filler)
    SELECT @i, CASE WEHN @i % 2 = 0 THEN @i ELSE 2000000 - @i END, getdate(), 'abc'

    With the statistics updated, I then ran the following SELECT query in two test scenarios:

    DBCC DROPCLEANBUFFERS;
    go
    SELECT max(dt) FROM test;

    The two test scenarios are as follows:

    Test Scenario Description
    Read-ahead enabled This is the default SQL Server behavior.
    Read-ahead disabled Read-ahead was disabled with trace flag 652: DBCC TRACEON(-1, 652)

    First, let's look at the elapsed time of the above SELECT query in the two scenarios (the numbers reported in the following table are averages over several test runs):

    Test Scenario Elapsed Time of SELECT (second)
    Read-ahead enabled 80
    Read-ahead disabled 210

    In both scenarios, the SELECT query was correctly processed with a clustered index scan. When read-ahead was disabled, the clustered index scan took almost three times as long as did the clustered index scan when read-ahead was not disabled. For this query, the performance difference was astounding.

    I don't have access to the source code that controls read-ahead so I can't tell exactly why and how it made such a huge difference from the code logic perspective. However, from Bob Dorr's description in the SQL Server 2000 I/O Basics whitepaper, it's rather clear that SQL Server read-ahead is quite aggressive in exploiting the performance capacity of the storage I/O subsystem.  So we can turn to observing the storage I/O behavior for explanation. The following table summarizes the values of the key I/O counters observed during the test runs:

    I/O Counter Value When Read-ahead is Enabled Value When Read-ahead is Disabled
    Bytes/Read > 350KB/read ~ 8KB/read
    Megabytes/sec 180 ~ 200MB/sec ~ 80MB/sec
    SQL Server Readahead ~20,000 readahead pages/sec 0

    Clearly, with read-ahead, SQL Server was able to take advantage of large sized I/Os (e.g. ~350KB per read). Large-sized I/Os are generally much more efficient than smaller-sized I/Os, especially when you actually need all the data read from the storage as was the case with the test query. From the table above, it's evident that the read throughput was significantly higher when read-ahead was enabled than it was when read-ahead was disabled. In other words, without read-ahead, SQL Server was not pushing the storage I/O subsystem hard enough, contributing to a significantly longer query elapsed time.

    That is, the table was about 16,000MB in size. At ~200MB/sec, it would take about 80 seconds to read 16,000MB, and at ~80MB/sec, it would take about 200 seconds to read the same amount of data. And these numbers (i.e. 80 seconds and 200 seconds) match nicely the recorded query elaped times which are reported in the second table above.

    In the next post, I'll check out the impact of disabling read-ahead on bookmark or key lookups when a nonclustered index is used.


  • SQL Server 2008 Data Compression: Impact of Data Distribution

    In this post, I continue to explore the implications of SQL Server 2008 data compression. It is particularly worth highlighting the fact that SQL Server 2008 data compression is performed at the page level instead of the table level. In other words, when SQL Server 2008 goes about compressing data, it does it one page at a time, and its compression algorithm considers only the data on that page.

    The page level approach to data compression has important implications, one of which is that what kind of compression ratio you get on a table depends on how the data in the table is distributed. That is, it is possible that with the exactly same set of data in the table, you can get radically different compression ratios if data is shuffled around in the table.

    To drive this point home, let me show you an extreme example with the following table:

    create table customer (

         cid   int identity,

         filler char(100)   NOT NULL,

    )

    With this table, I tested two scenarios:

    Scenario 1: Clustered index was created on column cid:

    create clustered index cus_cx on customer(cid);

    Scenario 2: Clustered index was created on column filler

    create clustered index cus_cx on customer(filler);

    Again, with the exactly same set of data in the table, in each scenario I compressed the data, and compared the table size before the compression and after the compression.  The data compression was done with ALTER TABLE statement as follows:

    ALTER TABLE customer REBUILD

                WITH (DATA_COMPRESSION = PAGE, MAXDOP=8);

    The following chart shows the table size before and after compression for each test scenario:

     

    Clearly, when the table was clustered on the column cid, the data compression ratio was very low (slightly greater than 1), whereas when the table was clustered on the column filler, the data compression ratio was very good at ~10 (the compressed table was about 10 times smaller).

    Why?

    First we need to know exactly what data was in the table, i.e. we need to know the characteristics of the data. Well, the data was populated with the following script:

    declare @i int

    set @i = 1

    begin tran

    while @i <= 8000000

    begin

          insert customer(filler)

          select left(cast(@i % 80 as varchar(8)) + '00000000', 7) + replicate('a', 92)

     

          if (@i % 10000) = 0

          begin

               commit tran

               begin tran

          end

          select @i = @i + 1

    end

    commit tran

    With this script, the first 80 rows would have the cid values going from 1 through 80, and the filler column would have 80 distinct values. With the cid values going from 81 through 160, the filler column would have another 80 distinct values, but as a set they were identical to the first 80 values. This pattern repeated every 80 rows because of the expression @i % 80 in the script.

    Note that the table row was sized such that each page would contain ~80 rows. This means that if there was a clustered index on column cid, every page would have 80 distinct values in the filler column. When SQL Server tried to compress such a page, there wasn’t much to compress. Now, when the clustered index was created on the filler column instead, every page was filled with duplicate values in the filler column, making the page extremely conducive to data compression.

    Again, this post is to help highlight the behavior of SQL Server 2008 data compression. The behavior presented here is neither good nor bad. It’s just a natural consequence of the page-level algorithm. Oracle’s data compression also adopts a page/block level algorithm, and therefore would see the same behavior.

    Note that if all the values are distinct, no matter how you shuffle the data you won't improve its compression ratio.


  • Dedicated Admin Connection

    Dedicated Admin Connection (DAC) is one of the highly touted features introduced in SQL Server 2005. Now that it has been around for a few years, I wonder how it is actually being used or whether it's being used at all in the real