THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Michael Zilberstein

  • Partitioning by GUID

    Recently I had to use GUID (Global Unique Identifier) as primary key in order to ensure cross-server uniqueness. In many cases surrogate key (some sort of concatenation between server id and intra-server identity) would give a better performance but in my case due to application requirements surrogate key wasn't an option.

    Next step is partitioning by GUID column. Say, you want 3 partitions. Ho do you build partition function? Sure, you can perform a calculation and find the exact 2 values that will divide all the GUID range to 3 equal parts. I needed at least 20 partitions, so I used an approximation. If you write down long hexadecimal number, you'll receive something like A09F76... I decided that partitioning by first 2 tabs is enough. Example for decimal: let's divide values from 1 to 10000 to 3 partitions - what would be the ranges? The exact answer is: 1 - 3333, 3334 - 6666, 6667-10000. Using approximation by 2 first digits will give the following ranges: 1 - 3300, 3301 - 6600, 6601 - 10000. 2 orders of magnitude (first 2 digits) are good enough approximation. In hexadecimal first 2 digits (00 - FF) represent 0-255 in decimal. I'm not accustomed to hexadecimal calculations, so first I divide decimal range to 3 parts: 1 - 85, 86 - 170, 171 - 255. Next step is converting it to hexadecimal: 01 - 55, 56 - AA, AB - FF.

    OK, done with values, how do I translate them into GUID? For example, 01 in hexadecimal - what is the GUID representation of it (taking into account that those should be first 2 digits - not the last ones). Took me awhile to understand. First I tried "01000000-0000-0000-0000-000000000000" - surprisingly all the rows (more then a million) came to the first partition. OK, another try: "00000001-0000-0000-0000-000000000000". Still the same. The catch is that GUID should be read in a byte groups order from right to left while inside every byte group reading order is from left to right. And every 2 tabs are 1 byte. So the right representation for 01 as first 2 digits in GUID is: "00000000-0000-0000-0000-010000000000".

    To finish with, partition function that partitions GUID to 20 range groups:

    CREATE PARTITION FUNCTION [pf_GUIDRangePartition](uniqueidentifier) AS RANGE RIGHT 
    FOR VALUES 
    (
      
    N'00000000-0000-0000-0000-0D0000000000',
      
    N'00000000-0000-0000-0000-1A0000000000',
      
    N'00000000-0000-0000-0000-270000000000',
      
    N'00000000-0000-0000-0000-340000000000',
      
    N'00000000-0000-0000-0000-410000000000',
      
    N'00000000-0000-0000-0000-4E0000000000',
      
    N'00000000-0000-0000-0000-5B0000000000',
      
    N'00000000-0000-0000-0000-680000000000',
      
    N'00000000-0000-0000-0000-750000000000',
      
    N'00000000-0000-0000-0000-820000000000',
      
    N'00000000-0000-0000-0000-8F0000000000',
      
    N'00000000-0000-0000-0000-9C0000000000',
      
    N'00000000-0000-0000-0000-A90000000000',
      
    N'00000000-0000-0000-0000-B60000000000',
      
    N'00000000-0000-0000-0000-C30000000000',
      
    N'00000000-0000-0000-0000-D00000000000',
      
    N'00000000-0000-0000-0000-DD0000000000',
      
    N'00000000-0000-0000-0000-EA0000000000',
      
    N'00000000-0000-0000-0000-F70000000000'
    ) ;
  • Scalability features I would like to have in SQL Server

    1. Active-Active cluster.

    Like RAC in Oracle. Maximum that commodity server can provide by now is 4 quad-core CPU machine. If application requires more, we can split the database (which is not always possible; and even when it is - such a solution often looks very artificial) or buy high-end server with more CPUs and much higher price line. The only thing that keeps SQL Server alive in the market for applications with high scalability requirements is totally unreasonable (IMHO) price that Oracle charges for RAC. It costs so much that even high-end machine is cheaper. For 4 quad-core CPU machine you'll pay 100K$ for SQL Server 2005 Enterprise Edition license (25K$ per license per CPU) and 240K$ for Oracle 11g Enterprise Edition (40K$ per license per every 2 cores). If you 2 machines - for Microsoft the price is doubled. For Oracle - if you want them in RAC - double the price for licensing plus add 20K$ addition for RAC per every pair of cores. Which gives us 200K$ vs 960K$ for two identical 4 quad-core CPU machines. Price is the only real answer Microsoft actually has to RAC - all the rest is pretty pathetic. Judge by yourself - here is the link to the whitepaper (PDF) on Microsoft site that compares SQL Server with RAC. And what will happen if tomorrow Oracle reduces price for RAC by 50%? BTW, the prices for Oracle are list prices but they're willing to negotiate. So the gap isn't that big.

    2. Indexes per partition

    SQL Server 2005 gave partial answer (or workaround - to be honest) - indexed views. But with huge limitation - since index on view couldn't be aligned with partition schema, sliding window scenario was impossible (index on view had to be dropped in order to perform partition switch or merge). In SQL Server 2008 the situation is better - now indexed view can be aligned with partition schema and support switch and merge operations on partitions. Nevertheless, indexed view is pretty costly when compared to the normal index. It also requires unique index which is not always possible.

    Another part of the answer in SQL Server 2008 are filtered indexes. They should be cheaper then indexed view and can be non-unique. But filtered indexes can't be created on computed columns (Microsoft's answer to Oracle's function-based indexes) or UDT columns. Close but still not there.

    3. Bitmap indexes and function-based indexes

    Those aren't pure scalability features but they can be very useful when ability to analyze, apply filters and run queries against large read-only partitions is required (this requirement is one of the two main reasons to keep huge amounts of historical data - second reason is legislation and accountability requirements).

    I'm sure, there're other features I forgot to mention but lack of the features in the above list causes me to curse more often then I would like to.


  • How do we know that Microsoft conducts performance tests?

    We know it from comments in system procedure's code. Here is what I've found while digging for some undocumented stuff about statistics:

    -- I don't like this copy&paste stuff, but here it boosts performance by 50%

    You can find it too - in sys.sp_table_statistics2_rowset procedure. It returns first 2 parts of DBCC SHOW_STATISTICS result (stat header and density vector) in table format. The procedure header is:

    EXEC sys.sp_table_statistics2_rowset
        @table_name = ?, --  sysname
        @table_schema = ?, --  sysname
        @table_catalog = ?, --  sysname
        @stat_name = ?, --  sysname
        @stat_schema = ?, --  sysname
        @stat_catalog = ? --  sysname

    If you supply only @table_name and @stat_name, you'll receive the same output as from undocumented (I think) syntax of DBCC SHOW_STATISTICS:

    DBCC SHOW_STATISTICS(@table_name, @stat_name) WITH STAT_HEADER JOIN DENSITY_VECTOR


  • How to find real database creation date

    Restoring database from backup or moving it to another server resets all the creation_date fields (in sys.databases, msdb..backupset etc'). So is there any way to find out the original creation date? Looks like there is one. It is stored in the boot page of the database (page 9) and you'll have to use DBCC PAGE in order to find it.

    DBCC TRACEON(3604)
    GO
    DBCC PAGE (YourDBName, 1, 9, 3)
    GO

    In the bottom of the page you'll find dbi_crdate field - it is the original creation date.

    Boot page fields are poorly documented (I'm being polite - actually, they aren't documented at all). And from the first glance looks like it contains interesting data. For example: dbi_firstSysIndexes, dbi_maxLogSpaceUsed. Here Paul Randal blogged about another column, dbi_dbccLastKnownGood, which is the last time DBCC CHECKDB ran without finding any corruptions in the database.


  • Have your database ever been backed up?

    Recently I've been asked by one of my clients to write a simple script that would find out whether particular database had ever been backed up (full backup). The reason for a requirement was semi-automatic application which would enable end users to manage backup activity including manually backup transaction log - not via Management Studio but via much more friendly UI designed for non-DBAs. As you know, for database running in Full Recovery Mode, transaction log can't be backed up before full backup is done at least for the first time. OK, I thought. Let's just check whether full backup exists for the database:

    IF EXISTS( SELECT 1 FROM msdb..backupset WHERE database_name = 'MyDB' AND [type] = 'D' ) ...

     Here comes the first catch. Imagine following scenario:

    1. Create database MyDB.

    2. Create full backup of 'MyDB'.

    3. Drop database 'MyDB' (without deleting backup history). 

    4. Create new database with the name 'MyDB' or restore from backup.

    What do we have now? Query will return true while actually backup we have belongs to another non-existing database. Try to backup transaction log of the new DB - you'll receive error.

    First try failed - let's give another one. Both sys.databases and msdb..backupset contain creation date of the database. In sys.databases column name is create_date, in msdb..backupset it is database_creation_date. So let's just compare database_creation_date from the latest full backup row with database's create_date. If they're equal - we have full backup. Otherwise backup belongs to the previous incarnation of our database while full backup of currently existing db had never been taken. Sure, it should also work for the normal scenario - without deleted and restored databases on the way. So first let's check whether our theory works after second step of the scenario described above. But what happened? How comes, our database creation times are different? 

    Here comes second catch. For some strange reason, database_creation_date column in msdb..backupset doesn't contain milliseconds! Minutes, seconds - everything is there but for milliseconds its always 000. While create_date in sys.databases contains full date including milliseconds. Don't know who and for what reason cut milliseconds from the backup history table, but that's the way it works now.

    To make long story short - it was the last barrier. After removing milliseconds part from database creation date in sys.databases, everything began to work. Here is the full script that will tell you whether some particular database had ever been backed up (full backup).

    DECLARE @DBName SYSNAME
    SET 
    @DBName 'YourDatabase'

    DECLARE @LatestBackupDate DATETIME@BackedUpDBCreationDate DATETIME
    DECLARE @DBCreationDateNoMilliseconds DATETIME

    -- In msdb..backupset table, database_creation_date column is in datatime format
    -- but without milliseconds, so we remove milliseconds from database creation date.
    SELECT @DBCreationDateNoMilliseconds =
        DATEADD(millisecond, -DATEPART(millisecondcreate_date), create_date)

    FROM sys.databases
    WHERE [name] @DBName
    IF @DBCreationDateNoMilliseconds IS NULL
      
    BEGIN
           PRINT 
    'Database ' @DBName ' doesn' CHAR(39) + 't exist on the server'
          
    RETURN
       END

    SELECT
      
    @LatestBackupDate backup_finish_date,
      
    @BackedUpDBCreationDate database_creation_date
    FROM msdb..backupset
    WHERE 
      
    backup_set_id (
          
    SELECT MAX(backup_set_id)
          
    FROM msdb..backupset
          
    WHERE database_name @DBName AND
          
    [type] 'D' )

    IF @LatestBackupDate IS NULL
      
    BEGIN
           PRINT 
    'Full backup had never been taken for database ' @DBName
          
    RETURN
       END

    IF 
    @BackedUpDBCreationDate @DBCreationDateNoMilliseconds
      
    PRINT 'Full backup exists for database ' @DBName '; Transaction log backups can be taken'
    ELSE
       PRINT 
    'Full backup exists for the current version of ' @DBName ' database  had never been taken. Existing backup refers to non-existing database with the same name'




  • 1001st way to find SQL Server service start time

    There're a lot of ways to discover when SQL Server service had been last restarted. That's not a pure curiosity issue since almost all the DMVs and performance counters reset at service restart. So in order to calculate averages (IOs per minute, Waits per minute) etc, one needs to know the starting point. There're a lot of ways to find it. One is to take a look in Error Log (if log hadn't been deleted by numerious DBCC ERRORLOG commands). Another way is to look at sys.dm_exec_requests for start_time or in master..sysprocesses for login_time of system processes like LAZY WRITER. Recently I've found one more way: look in sys.databases for create_date of tempdb database. Since tempdb is re-created from scratch every time SQL Server restarts, it works. BOL reference about create_date column also says:
    -- Date the database was created or renamed. For tempdb, this value changes every time the server restarts.

    P.S. Results you receive trying different ways can vary - half minute here or there - because service restart doesn't happen in a second. But when you need to measure time since startup, 30 seconds usually won't be serious factor.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement