
Merrill Aldrich

  • Update to Rules-Driven Maintenance

    Back in August I posted a first version of a rules-driven solution for backups, index and statistics maintenance, and integrity checks. The system in general has been working well and has saved my team a huge amount of time and effort. We are coming up on its first anniversary in production.

    Today I offer an update that contains a few enhancements, performance improvements and a bug fix.

    To recap, this is a system or framework to manage many small databases across many instances on many servers in a policy-based/automated way. The details about how to deploy and how to use the system are in the original post, but at a high level:

    1. First deploy the code across a collection of servers using the included PowerShell script DeployMaintenance.ps1. That will create an administrative database on every instance (or use one that you provide) and populate it with the required tables, procedures, etc. The deployment script will also make a handful of SQL Agent jobs that handle all the maintenance on all the databases.
    2. Visit the instances. If there is maintenance already in place, disable or remove it so the same work isn't performed twice (a T-SQL sketch for toggling Agent jobs follows this list). If necessary, change or override the default preferences for this solution in the admin database for each type of maintenance: the time of maintenance windows, whether to use features like differential backups or Litespeed, whether there are databases that should be treated as exceptions with different settings, and so on.
    3. Enable the new policy jobs in SQL Agent.
    4. Monitor to make sure things are running smoothly.
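
    For steps 2 and 3, here is a minimal T-SQL sketch of toggling SQL Agent jobs with msdb.dbo.sp_update_job. The job names are hypothetical placeholders; substitute the names of your existing maintenance jobs and of the policy jobs the deployment script creates.

    -- Disable an existing maintenance job so the same work isn't done twice
    -- (the job names here are placeholders, not the solution's actual job names)
    EXEC msdb.dbo.sp_update_job
        @job_name = N'Old Nightly Backup Job',
        @enabled  = 0;

    -- Enable one of the new policy jobs created by the deployment
    EXEC msdb.dbo.sp_update_job
        @job_name = N'Policy - Full Backups',
        @enabled  = 1;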

    The theory behind this project is to save work configuring servers (efficiency), to ensure that the maintenance is in fact deployed and working for ALL databases in the environment (quality), to ensure that it works in a truly consistent way everywhere (consistency), and to make it so that the servers don’t have to be touched by a DBA when changes happen, such as adding or dropping databases (resiliency).

    Changes

    The updates in version 1.1 include:

    • Better reporting of errors. The jobs now write log files to the default error log location for SQL Server.
    • Better recovery from errors. The jobs in some cases continue on error – for example, one failed backup will not stop the backup job and leave other databases without backups.
    • Performance enhancements for examining backup history and system counters (mainly % log used); see the counter query sketch after this list.
    • One bug fix for the log backup logic on a server that uses both log shipping and Litespeed.
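
    For reference, the kind of counter query involved looks something like this. It is only a sketch of the idea, not the solution's actual logic; Percent Log Used is exposed per database in sys.dm_os_performance_counters.

    -- Percent Log Used, per database, from the system performance counters
    SELECT  RTRIM(instance_name) AS database_name,
            cntr_value           AS percent_log_used
    FROM    sys.dm_os_performance_counters
    WHERE   counter_name = 'Percent Log Used'
        AND object_name LIKE '%:Databases%'
    ORDER BY cntr_value DESC;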

    Disclaimers

    The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even imagine using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: http://opensource.org/licenses/GPL-3.0. You may use and modify this code, but not sell it.

    This has been run on SQL Server versions from 2005 to 2008 R2. It probably works on 2012, but I have not tested it thoroughly.

    I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.

    There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.

    If you haven’t done index maintenance for a while and you flip this solution on, you might blow up your transaction log. Be sure that you have plenty of log space and plenty of log backup space, and watch the process for the first few runs. This is especially true if your environment combines Standard Edition of SQL Server with a preference for online index maintenance, because in that case the process falls back to reorganize (defrag), which produces a lot of log.
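
    One simple way to keep an eye on the log during those first runs (a suggestion only, not part of the solution):

    -- Log size and percent used for every database on the instance
    DBCC SQLPERF(LOGSPACE);

    -- And a quick look at what, if anything, is preventing log reuse
    SELECT name, log_reuse_wait_desc FROM sys.databases;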

    Lastly, this is a solution for the type of environment that has many small databases, not the type with a few big, performance-intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.

    I've posted updates to this solution. Please get the latest version here.

  • Updated Warehouse Re-Index Script

    As I talked about in my last post, I just went through a re-indexing project that took the partitioned fact rows from our warehouse and relocated them into new files. There are a lot of tables and indexes involved, so I have a PowerShell “helper” script to generate the actual T-SQL that moves the data. The idea is to find all the indexes that use a particular partition scheme in the database, and generate the CREATE INDEX statements that would recreate them on the new partition scheme. This script doesn’t do the re-indexing work; it just outputs the T-SQL that would do the work, so that the SQL can be verified and/or edited and run later, in the context of other modifications to the warehouse.

    (This is a much improved version of the work described in this even older post. It frankly wasn’t all that great back then.)

    The script was developed on PowerShell 2 and SQL Server 2012 client tools against SQL Server 2005, but I believe the work that the script does (the process of generating the T-SQL and the results) is compatible with any SQL Server Enterprise Edition instance, 2005 or later. It assumes you’ve loaded the SQL provider or SQLPS module into PowerShell.

     <#
     PowerShell Script to generate CREATE INDEX statements from existing indexes
     mapped to a new partition scheme.
    
     14 Jan 2013 Merrill Aldrich
     Free for your use, but without warranty. Use at your sole risk.
     #>
    
     $ErrorActionPreference = 'Stop'
    
     # Create a connection to the SQL Server instance
     Set-Location SQLSERVER:\SQL\<yourserver>\DEFAULT\
    
     # Names of existing and new partition schemes. New will replace existing in script output:
     $oldPartitionScheme = 'nameOfOldScheme'
     $newPartitionScheme = 'nameOfNewScheme'
    
     # Compose a query that will list out the tables that use a specific partition scheme
     $partitionedTables = @( Get-ChildItem .\Databases\<yourdatabase>\tables |
         Where-Object -filter { $_.PartitionScheme -eq $oldPartitionScheme } )
    
     # For each partitioned table, script out create statements for partitioned indexes
     $scripter = New-Object ('Microsoft.SqlServer.Management.Smo.Scripter') ( get-item ( get-location ) )
    
     # Set scripting options here as needed
     # $scripter.Options.<someoption> = <somevalue>
     $scripter.Options.ScriptDrops = $False
    
     $partitionedTables | foreach {
         # Note current table in output:
         "/***  Table $($_.Schema).$($_.Name)   ***/"
    
         $partitionedIndexes = ( $_.indexes |
             Where-Object -Filter { $_.IsPartitioned -eq $True -and $_.PartitionScheme -eq $oldPartitionScheme }
          )
    
         # Script create statements
    
         $partitionedIndexes | foreach {
             $indexCreateScr = $scripter.Script( $_ )
    
             # Change the partition scheme for every statement to the new partition scheme
             $indexCreateScr = $IndexCreateScr -replace $oldPartitionScheme, $newPartitionScheme
    
             # Change ALTER TABLE ADD CONSTRAINT to CREATE INDEX statements for PKs
             $indexCreateScr = $indexCreateScr -replace `
                 'ALTER TABLE (\[[^\]]*\])\.(\[[^\]]*\]) ADD  CONSTRAINT (\[[^\]]*\]) PRIMARY KEY CLUSTERED', `
                 ('/* $& */' + "`n" + 'CREATE UNIQUE CLUSTERED INDEX $3 ON $1.$2')
    
             # For ALTER TABLEs the DROP_EXISTING index option has to be added to the script
             # Find any "WITH (" clause not containing the DROP_EXISTING option, and add it
             $indexCreateScr = $IndexCreateScr -replace 'WITH \((?!.*DROP_EXISTING)', 'WITH ( DROP_EXISTING = ON, '
    
             # Change index create options, if necessary. Edit to suit:
             $indexCreateScr = $IndexCreateScr -replace 'PAD_INDEX = OFF, ', ''
             $indexCreateScr = $IndexCreateScr -replace 'STATISTICS_NORECOMPUTE = OFF, ', ''
             $indexCreateScr = $IndexCreateScr -replace 'IGNORE_DUP_KEY = OFF, ', ''
             $indexCreateScr = $IndexCreateScr -replace 'SORT_IN_TEMPDB = OFF, ', ''
    
             $indexCreateScr = $IndexCreateScr -replace 'ONLINE = OFF', 'ONLINE = ON'
             $indexCreateScr = $IndexCreateScr -replace 'DROP_EXISTING = OFF', 'DROP_EXISTING = ON'
    
             # Insert some line terminators for legibility
             $indexCreateScr = $IndexCreateScr -replace "ON \[$($newPartitionScheme)\]", "`nON [$($newPartitionScheme)]"
             $indexCreateScr = $IndexCreateScr -replace ', ', ", `n`t"
    
             # Output the revised script
             $indexCreateScr
             "GO`n"
         }
     }

    Most of this is pretty basic – put the tables into a collection, loop over them, and for each table go through each partitioned index, scripting them out. The only parts that were tricky to develop involve the indexes that support primary keys.

    The scripter from SMO wants to script out a PK as “ALTER TABLE … ADD CONSTRAINT,” and the problem with that is you can’t use it to recreate the existing index using the DROP_EXISTING option. But, in fact, in SQL Server it is perfectly valid to do a CREATE INDEX … WITH ( DROP_EXISTING = ON ) against the index that supports a primary key. It’s just that the scripter isn’t designed to deal with that idea (as far as I know).
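
    For example, something like the following is perfectly legal even when the index enforces a primary key; the constraint is preserved, and the index can land on a different partition scheme. The table, index and scheme names here are made up for illustration.

    -- Rebuild the index behind a primary key and move it to a new
    -- partition scheme in a single pass (hypothetical names)
    CREATE UNIQUE CLUSTERED INDEX PK_FactSales
    ON dbo.FactSales ( DateKey, SalesKey )
    WITH ( DROP_EXISTING = ON )
    ON psFactMonthly_New ( DateKey );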

    I searched around for some solution to this issue to no avail, but instead fell back on good old hacky find and replace. There are a few semi-hairy regex expressions in the script that locate instances of ALTER TABLE … ADD CONSTRAINT and substitute the equivalent CREATE INDEX statement, and also locate the WITH clauses that don’t have the DROP_EXISTING option, and add that. The gibberish parts of those expressions are mostly looking for object names and escaping square brackets.

    If it’s not clear what an expression like (\[[^\]]*\])\.(\[[^\]]*\]) means, drop me a comment and I’ll try to clarify. Conversely, if you are better at regex than I am, which is likely, any advice you might have to simplify this would be welcome!

    A side note about regex and PowerShell:

    If you are used to using regex in the find and replace dialog in SSMS / Visual Studio, it’s not the same flavor of regex that PowerShell uses, and there are some important differences:

    • Tagging expressions/grouping is accomplished with parentheses (), not curly braces {}.
    • Substitution is done with dollar signs ($1), not backslashes (\1).
    • AND, epically confusing: the dollar signs in regex are not the dollar signs in PowerShell. "$1" and '$1' in a -replace expression are different in important and confusing ways. In double quotes, PowerShell will look for a variable named $1 and try to expand it; in single quotes, the literal $1 reaches the regex engine as matching group #1 and does regex replacement.

    What makes this sad is that the regex one might use in editing T-SQL in SSMS can’t be moved over to PowerShell without a lot of changes.

  • Reindexing? Check your DOP.

    I had a long night last night of watching Perfmon counters while I coaxed our data warehouse fact data into new files. I learned something through this little project, perhaps dumb and obvious, but important: don’t assume that your re-indexing work, even the biggest flavor, is automatically I/O limited.

    Our systems are relatively small, but we do have a nice disk array. This server is just a 2 x 6-core machine with 64 GB of RAM and two HBAs, but it has a mix of SAN flash and RAID 1+0 FC disks for the SQL Server files. The operation is one I have done a few times: take the fact data, which is partitioned into monthly files, and move it into new files by mapping it onto a new partition scheme. We do this about annually to optimize indexes and re-tier the storage.

    Basically, I take all the existing tables and indexes, one at a time, and do

    CREATE INDEX foo ON dbo.bar ( col1, col2, col3 )
    WITH ( DROP_EXISTING = ON, <other options> )
    ON aNewPartitionScheme ( col1 )

    to move the data into new, pre-allocated files. The data involved is about 1 TB.

    Eight Cores Seems like Enough? Wrong.

    I thought, wrongly, that if I did this work with eight cores in play, the machine would be running about as fast as it was capable of going, figuring I was sure to be I/O limited. My reasoning was that “create with drop existing” is a really simple pass-through operation, just reading the existing index, which is already in order, and writing it into new pages. It parallelizes very well* because of the arrangement of the data. How could eight cores be kept busy enough with that work to saturate the I/O subsystem?

    Instead, the results surprised me: running with DOP = 8 I was getting, depending on the index, between 200 and 450 MB per second reads and writes to and from the disks. Changing the DOP to use all 12 cores on the machine, the throughput actually increased to 600 – 700 MB per second, making the maintenance significantly shorter. As it turned out, the work was CPU limited somehow, even with eight cores allowed. The eight cores would run at 100%, while the disk system didn’t.

    For this case I had the luxury of being the only one on the machine during the maintenance window, which meant I could set MAXDOP to any value I wanted without side effects. I set that to 12 for the duration of the window, to use all cores, and then set it back again.
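
    If you have the same luxury of an exclusive maintenance window, the server-wide setting can be flipped and then restored with sp_configure. The values here reflect my case; if other workloads share the server, the MAXDOP option in the WITH clause of the individual CREATE INDEX statements is the safer lever.

    -- Raise max degree of parallelism for the maintenance window
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism', 12;
    RECONFIGURE;

    -- ... run the index work ...

    -- Then put it back to whatever the prior value was (8 in this example)
    EXEC sp_configure 'max degree of parallelism', 8;
    RECONFIGURE;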

    Your mileage may vary, but with this new world of storage I learned to test for optimal DOP when I have any large-scale index operations.

    * on Enterprise Edition only, it’s worth noting.

  • Diversion: Sub-Second SQL Sudoku Solver

    I enjoy Sudoku as a way to relax on occasion but, being an IT guy, I suppose I am predictably meta: solving them is fun, but making a machine to solve them is even more fun. I created a clunky solver once before based on the idea of using as few T-SQL statements as possible. This year I decided to try to make a fast solver instead. It’s working fairly well and solves most puzzles in under a second on my laptop. It helps that I’ve been sick with a cold for the past two weeks, giving me the sort of “enforced relaxation” that drives me to puzzles.

    Here’s the basic algorithm:

    1. Create a table for the puzzle solution, and insert the given values from the puzzle definition.
    2. Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This is a bit like the “notes” feature of an electronic Sudoku app, where you can indicate possible values for unsolved cells.
    3. Loop while there is no solution
      1. Using some list of solving rules, “push” solved values from the cells in the candidates table to the solution table.
      2. Use a process of elimination to remove candidate values that the Sudoku rules make impossible.
    4. End Loop

    I started with that fundamental idea, but I found that the solver rules I could encode in T-SQL (at least with my level of math skill) covered most but not all puzzles. Some of the advanced ones would stall or reach an impasse where the available rules couldn’t make progress eliminating values. For those cases I added some branching logic to do a minimal level of trial and error. The adjusted algorithm became this more complicated version:

    1. Create a table for 1-n puzzle solutions with a branching identifier. Insert the given values from the puzzle as branch 1.
    2. Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This table also has a branch identifier, and the first set of candidates is also labeled as branch 1.
    3. Loop while there is no solution
      1. Using some list of solving rules, “push” solved values from any branches in the candidates table into the corresponding branch in the solution table
      2. If no cells are solved, then “branch” the solution by duplicating the solved values and the candidates into multiple versions/branches on some cell that has a minimal number of candidate values. Example: if we see no progress, and cell 3B might be a 2 or might be a 3, then make two versions, one using 2 and one using 3, and proceed with both.
      3. If a branch contains a “contradiction” – meaning an impossible combination of values – then it cannot be the solution to the puzzle, so remove it.
      4. Use a process of elimination to remove candidate values that the rules make impossible, from all branches.
    4. End Loop

    I found that getting started with this felt complicated, but I was able to distill the code down gradually to a few statements. Admittedly, they are dense, and take some time to understand, but in all it didn’t end up being very much code at all.

    Modeling the Puzzle

    First, how to represent the puzzle space in SQL Server? A Sudoku puzzle looks like a table, but in fact it isn’t like a database table at all, because position is so important to the logic of the puzzle – in SQL the order of rows and columns is, by definition, undefined. For example, there’s no “last row” in a table. Further, in creating the solver I tried to stay in the spirit of relations and sets, and not resort to techniques like string manipulation or stuffing multiple values into the same column.

    For me, it was simplest to think of the puzzle as several intersecting sets (“rows”, “columns,” “squares,” available digits) and make a table representing the sets instead of the physical layout of the Sudoku square.

    The schema looks like this: a Solution table has columns for Value, Row, Column, and Square. Value is the digit 1-9 in the cells of the puzzle, and Row, Column and Square represent the position of the cell in the larger Sudoku square. Rows are numbered, columns assigned letters A-I and the squares are labeled S1-S9, reading left to right and top to bottom in the physical puzzle. Unfortunately, Row, Column and Square are all keywords in T-SQL, so I abbreviated them as Rw, Cl, Sq to keep the square brackets at bay in the code. Lastly, a Branch column is appended to allow the solver to make versions of the puzzle:

    USE Sudoku
    GO
    
    CREATE TABLE dbo.Solution (
        Value tinyint NOT NULL,
        Rw tinyint NOT NULL,
        Cl char(1) NOT NULL,
        Sq  AS ( dbo.SquareFor(Rw,Cl) ),
        Branch int NOT NULL
    ) 

    Because the square is implied by the row and column location in the puzzle, the Sq value is computed using a UDF, as:

    CREATE FUNCTION dbo.SquareFor 
    (
        @Rw tinyint, @Cl char(1)
    )
    RETURNS char(2)
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @Result char(2)
    
        SELECT @Result = 
            CASE 
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'A' AND 'C' THEN 'S1'
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'D' AND 'F' THEN 'S2'
            WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'G' AND 'I' THEN 'S3'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'A' AND 'C' THEN 'S4'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'D' AND 'F' THEN 'S5'
            WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'G' AND 'I' THEN 'S6'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'A' AND 'C' THEN 'S7'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'D' AND 'F' THEN 'S8'
            WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'G' AND 'I' THEN 'S9'
            END
    
        RETURN @Result
    
    END

    [These code samples are just to explain the solution and won’t run in the order presented, but all the code in a runnable script is attached to this post.]

    The domains for allowed values, rows, columns and squares are very useful to define explicitly, as we’ll see later, so I have four reference tables containing the possible digits, row, column and square labels, all following this pattern:

    CREATE TABLE dbo.Cls(
        Label char(1) NOT NULL,
        CONSTRAINT PK_Cls PRIMARY KEY CLUSTERED ( Label ASC )
    ) ;
    
    INSERT dbo.Cls ( Label ) VALUES ( 'A' );
    INSERT dbo.Cls ( Label ) VALUES ( 'B' );
    INSERT dbo.Cls ( Label ) VALUES ( 'C' );
    INSERT dbo.Cls ( Label ) VALUES ( 'D' );
    INSERT dbo.Cls ( Label ) VALUES ( 'E' );
    INSERT dbo.Cls ( Label ) VALUES ( 'F' );
    INSERT dbo.Cls ( Label ) VALUES ( 'G' );
    INSERT dbo.Cls ( Label ) VALUES ( 'H' );
    INSERT dbo.Cls ( Label ) VALUES ( 'I' );
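
    For completeness, here is a plausible definition of the digits table following the same pattern (the attached script is the authoritative version; the constraint name here is my guess):

    CREATE TABLE dbo.Digits(
        Value tinyint NOT NULL,
        CONSTRAINT PK_Digits PRIMARY KEY CLUSTERED ( Value ASC )
    ) ;

    INSERT dbo.Digits ( Value ) VALUES ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 );
    -- (Single-row INSERTs, as with dbo.Cls above, work just as well on SQL Server 2005.)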
    

    Given those objects, the rules of the game can be encoded with some foreign keys and unique constraints. Only one of each digit allowed per row, column and square, and only those values defined in the domain tables:

    ALTER TABLE dbo.Solution WITH CHECK ADD CONSTRAINT FK_Solution_Vals FOREIGN KEY ( Value )
    REFERENCES dbo.Digits ( Value );
    
    ALTER TABLE dbo.Solution  WITH CHECK ADD CONSTRAINT FK_Solution_Cols FOREIGN KEY( Cl )
    REFERENCES dbo.Cls ( Label );
    
    ALTER TABLE dbo.Solution  WITH CHECK ADD CONSTRAINT FK_Solution_Rows FOREIGN KEY( Rw )
    REFERENCES dbo.Rws ( Label );
    CREATE UNIQUE NONCLUSTERED INDEX UniqueClValue ON dbo.Solution 
    (
        Value ASC,
        Cl ASC,
        Branch ASC
    );
    
    CREATE UNIQUE NONCLUSTERED INDEX UniqueRowValue ON dbo.Solution 
    (
        Value ASC,
        Rw ASC,
        Branch ASC
    );
    
    CREATE UNIQUE NONCLUSTERED INDEX UniqueSqValue ON dbo.Solution 
    (
        Value ASC,
        Sq ASC,
        Branch ASC
    );
    

    Note the addition of Branch in the indexes, so that we can store multiple versions of a puzzle, while still being constrained to the puzzle rules.

    Now, how to populate the Solution table with a puzzle? In the code sample attached to this post, there are several example puzzles prebuilt, but this is the method. There’s a default on the Solution table to make new inserts Branch 1:

    ALTER TABLE dbo.Solution ADD CONSTRAINT DF_Solution_Branch  DEFAULT (1) FOR Branch

    That allows setting up a puzzle by entering just the values and row and column position:

    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 1, 'A' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 1, 'I' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 2, 'D' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 2, 'E' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 2, 'H' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 2, 'I' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 3, 'B' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 3, 'D' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 3, 'I' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 4, 'C' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 4, 'F' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 4, 'G' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 5, 'B' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 5, 'C' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 5, 'G' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 5, 'H' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 6, 'C' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 6, 'D' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 6, 'G' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 7, 'A' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 7, 'F' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 7, 'H' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 8, 'A' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 8, 'B' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 8, 'E' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 8, 'F' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 9, 'A' );
    INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 9, 'I' );
    

    It’s even faster to enter the values and positions using SSMS > Edit Top 200 (gasp!) on the Solution table. I also found it helpful to back the puzzles up with SELECT INTO as I was working on the solver, because entering them is definitely tedious. In the code sample there are several example puzzles that can be loaded like this:

    TRUNCATE TABLE dbo.Solution;
    
    INSERT dbo.Solution ( Value, Rw, Cl ) 
    SELECT Value, Rw, Cl 
    FROM dbo.Sample5;
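
    And to snapshot a hand-entered puzzle into a new sample table of its own, something like this works (the destination table name is just an example):

    -- Preserve the current puzzle definition as a new sample table
    SELECT Value, Rw, Cl
    INTO dbo.Sample6
    FROM dbo.Solution
    WHERE Branch = 1;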
    

    One last bit of setup: even though a relational table doesn’t directly model the Sudoku square (hence this other way of modeling the puzzle), it’s incredibly helpful to be able to view the puzzle like a Sudoku square. In fact, I found working on this was impossible without rendering the puzzle in the paper layout. The paper display can be presented using a pivot:

    CREATE VIEW dbo.DisplaySolution AS 
    SELECT Branch, Rw,
        MIN( CASE  Cl WHEN 'A' THEN Value ELSE NULL END ) AS A,
        MIN( CASE  Cl WHEN 'B' THEN Value ELSE NULL END ) AS B,
        MIN( CASE  Cl WHEN 'C' THEN Value ELSE NULL END ) AS C,
        MIN( CASE  Cl WHEN 'D' THEN Value ELSE NULL END ) AS D,
        MIN( CASE  Cl WHEN 'E' THEN Value ELSE NULL END ) AS E,
        MIN( CASE  Cl WHEN 'F' THEN Value ELSE NULL END ) AS F,
        MIN( CASE  Cl WHEN 'G' THEN Value ELSE NULL END ) AS G,
        MIN( CASE  Cl WHEN 'H' THEN Value ELSE NULL END ) AS H,
        MIN( CASE  Cl WHEN 'I' THEN Value ELSE NULL END ) AS I
    FROM dbo.Solution 
    GROUP BY Branch, Rw
    GO
    
    SELECT 
        A, B, C, D, E, F, G, H, I 
    FROM dbo.DisplaySolution 
    ORDER BY Rw;
    

    Now we have all the setup required to make the solver work. Returning to the original algorithm above, we have a few discrete tasks that we can break out into procedures:

    1. Deriving a table of all the possible candidate values for empty cells in the solution
    2. Searching that candidates table for solved cells and pushing those into the solution
    3. Eliminating candidates
    4. Branching the puzzle into versions if we get stuck in a situation where the rules we have in #2 and #3 don’t work

    Candidates

    So, deriving all possible candidate values is fairly straightforward: if we cross join the rows, the columns and the digits 1-9, we get a derived table of every possibility for any puzzle. From that we can eliminate the candidate values from the cells that are already in our specific puzzle. We can also eliminate all the candidates that conflict with the values in the given puzzle, in the same row, column or square.

    In order to make this manageable, I used a “bag” analogy. Imagine a bag full of tiles, like Scrabble tiles, where there are nine 1s, nine 2s, nine 3s, and so on. From that bag we know that one 1 will go into the first Sudoku square, one into the second, one into the third, and so on. It’s possible to “label” the tiles ahead of time with the square (S1-S9) to which they will belong, even though we don’t know which specific cell they will ultimately occupy.

    So this code starts with a CTE called “bag” that has just this structure, made using a cross join of all digits and all squares. From the bag, we discard a bunch of the tiles for the already-occupied cells in the solution. Next, we take the remaining contents of the bag and use it to make a Candidates table, further discarding tiles that conflict with the already-solved cells in the solution by occupying the same row, column or square. It’s a great CTE exercise:

    CREATE PROCEDURE dbo.MakeCandidates
    AS
    BEGIN
    
        -- Make a table containing all possible "candidate" values that could go in 
        -- empty cells in the puzzle
        
        IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Candidates]') AND type in (N'U'))
        DROP TABLE dbo.Candidates;
    
        WITH Bag AS (
                SELECT Digits.Value, Squares.Label AS Sq, 1 AS Branch
                FROM dbo.Digits cross join dbo.Squares 
                EXCEPT SELECT Value, Sq, Branch FROM dbo.Solution
            ),
            
            AllCells AS (
                SELECT Rws.Label AS Rw, 
                    Cls.Label AS Cl,
                    ( SELECT dbo.SquareFor( Rws.Label, Cls.Label ) ) AS Sq
                FROM Rws CROSS JOIN Cls
            ),
            
            Placements AS ( 
                SELECT Bag.Value, allCells.Rw, allCells.Cl, allCells.Sq 
                FROM AllCells JOIN Bag ON allCells.Sq = bag.Sq
            ) 
            
            SELECT p.Value, p.Rw, p.Cl, p.Sq, 1 AS Branch 
            INTO dbo.Candidates
            FROM placements p
            WHERE NOT EXISTS( SELECT 1 FROM dbo.Solution 
                WHERE Solution.Rw = p.Rw AND Solution.Cl = p.Cl
                )
            AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
                WHERE Solution.Value = p.Value AND Solution.Rw = p.Rw 
                )
            AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
                WHERE Solution.Value = p.Value AND Solution.Cl = p.Cl 
                )
            AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
                WHERE Solution.Value = p.Value AND Solution.Sq = p.Sq 
                );
    END
    

    The result is that we have a Solution table with one version of the puzzle at its starting point, and a Candidates table that contains all the plausible values that might go into all empty cells in the puzzle. All the rows in both tables are identified as “Branch 1.” Going forward we’ll work back and forth between the two tables to solve the puzzle by process of elimination.

    Solved Cells

    The next chunk of the algorithm is to try to search the candidates for solved cells and push those into the solution. This code uses about four of the common Sudoku solving methods (those that I could sanely encode in T-SQL). Many sites describe them, such as http://www.sudoku129.com/puzzles/tips_intro.php or http://www.sudokuoftheday.com/pages/techniques-overview.php or your favorite search engine’s suggestions.

    CREATE PROCEDURE dbo.PushSolvedCells ( @branch int, @count int OUTPUT )
    AS
    
    BEGIN
    
        -- Cells having exactly one possible value must be that value
        WITH solvedCells AS (
                SELECT Value, p.Rw, p.Cl, p.Branch
                FROM dbo.Candidates p
                INNER JOIN (
                    SELECT Rw, Cl, Branch
                    FROM dbo.Candidates 
                    GROUP BY Rw, Cl, Branch
                    HAVING COUNT(*) = 1
                ) AS onePossibleValue 
                    ON p.Rw = onePossibleValue.Rw 
                    AND p.Cl = onePossibleValue.Cl 
                    AND p.Branch = onePossibleValue.Branch
                    AND p.Branch = @branch
            ) ,
                
            -- Numbers that are possible in only one cell of a row must occupy that cell
            unqInRow AS (
                SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
                FROM dbo.Candidates p
                INNER JOIN (
                    SELECT Value, Rw, Branch
                    FROM dbo.Candidates 
                    GROUP BY Value, Rw, Branch
                    HAVING COUNT(*) = 1
                ) AS uniques 
                    ON p.Rw = uniques.Rw 
                    AND p.Value = uniques.Value
                    AND p.Branch = uniques.Branch
                    AND p.Branch = @branch
            ),
    
            -- Numbers that are possible in only one cell of a column must occupy that cell
            unqInCol AS (
                SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
                FROM dbo.Candidates p
                INNER JOIN (
                    SELECT Value, Cl, Branch
                    FROM dbo.Candidates 
                    GROUP BY Value, Cl, Branch
                    HAVING COUNT(*) = 1
                ) AS uniques
                    ON p.Cl = uniques.Cl 
                    AND p.Value = uniques.Value
                    AND p.Branch = uniques.Branch
                    AND p.Branch = @branch
            ),
    
            -- Numbers that are possible in only one cell of a square must occupy that cell
            unqInSquare AS (
                SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
                FROM dbo.Candidates p
                INNER JOIN (
                    SELECT Value, Sq, Branch
                    FROM dbo.Candidates 
                    GROUP BY Value, Sq, Branch
                    HAVING COUNT(*) = 1
                ) AS uniques 
                    ON p.Sq = uniques.Sq 
                    AND p.Value = uniques.Value
                    AND p.Branch = uniques.Branch
                    AND p.Branch = @branch
            ) 
            
            INSERT dbo.Solution( Value, Rw, Cl, Branch )
                SELECT Value, Rw, Cl, Branch FROM solvedCells
                UNION 
                SELECT Value, Rw, Cl, Branch FROM unqInRow
                UNION 
                SELECT Value, Rw, Cl, Branch FROM unqInCol
                UNION 
                SELECT Value, Rw, Cl, Branch FROM unqInSquare;
            
        SET @count = @@ROWCOUNT;
        
    END
    

    The procedure processes one branch at a time given by @branch, and outputs the number of cells that were solved with the rules in the code. The count will become important when we need to track “stalled” branches of the solution.

    Elimination

    The next chunk of the algorithm says, “OK, now that those cells’ values are known, what other candidates are eliminated?” This is sort of the other half of the solving logic – take the remaining candidates and try to eliminate as many as possible, with other solving techniques:

    CREATE PROCEDURE dbo.EliminateCandidates as
    BEGIN
    
        DELETE dbo.Candidates 
        WHERE EXISTS ( SELECT 1 FROM dbo.Solution s
            WHERE s.Cl = Candidates.Cl 
                AND s.Rw = Candidates.Rw 
                AND s.Branch = Candidates.Branch
            )
        OR EXISTS ( SELECT 1 FROM dbo.Solution s1
            WHERE s1.Value = Candidates.Value 
                AND s1.Rw = Candidates.Rw  
                AND s1.Branch = Candidates.Branch
            )
        OR EXISTS ( SELECT 1 FROM dbo.Solution s2
            WHERE s2.Value = Candidates.Value 
                AND s2.Cl = Candidates.Cl 
                AND s2.Branch = Candidates.Branch
            )
        OR EXISTS ( SELECT 1 FROM dbo.Solution s3
            WHERE s3.Value = Candidates.Value 
                AND s3.Sq = Candidates.Sq 
                AND s3.Branch = Candidates.Branch
            ); 
    
        -- Narrow rows by square: for each row, find values that can only be in one square
        -- and delete other candidates for those values in the same square
    
        WITH irs AS ( 
            SELECT DISTINCT Value, Rw, Sq, Branch
            FROM dbo.Candidates p
            WHERE ( SELECT COUNT ( DISTINCT Sq ) 
                    FROM dbo.Candidates p2 
                    WHERE p2.Rw = p.Rw 
                        AND p2.Value = p.Value 
                        AND p2.Branch = p.Branch ) = 1
            )
            DELETE dbo.Candidates 
            FROM dbo.Candidates p
            JOIN irs ON p.Sq = irs.Sq
                AND p.Value = irs.Value
                AND p.Branch = irs.Branch
                AND p.Rw != irs.Rw;
                
        -- Narrow columns by square: for each column, find values that can only be in one square
        -- and delete other candidates for those values in the same square
    
        WITH irs AS (        
            SELECT DISTINCT Value, Cl, Sq, Branch
            FROM dbo.Candidates p
            WHERE ( SELECT COUNT ( DISTINCT Sq ) 
                FROM dbo.Candidates p2 
                WHERE p2.Cl = p.Cl AND p2.Value = p.Value AND p2.Branch = p.Branch ) = 1
            )
            DELETE dbo.Candidates 
            FROM dbo.Candidates p
            JOIN irs ON p.Sq = irs.Sq
                AND p.Value = irs.Value
                AND p.Branch = irs.Branch
                AND p.Cl != irs.Cl;
    END
    

    Alternating these two procedures – locating solved cells and pushing them into the solution, and then eliminating other candidates – will solve most of the “easy” and some “intermediate” level puzzles in books or Sudoku apps. The next challenge is what to do when these rules are exhausted and can’t narrow the field of candidate values. A mathematician might be able to add more solving methods to the handful I have here and make the algorithm solve ANY puzzle. I’m not quite that smart, so for some advanced puzzles I had to introduce branching, or live with the fact that some would not be solved.

    (Technically I did a lot of work implementing one more advanced method, which was pages and pages of code – but I found that it was never invoked in any sample puzzles I tried, perhaps because the logic was already implied by one of the other statements above.)

    Branch

    Branching presents a few challenges: first, how do we know when to branch? Second, some branches are by definition invalid, so how do we identify and cull those? Third, simply, what are the logistics of making branches?

    The answer I came up with to the first is just to count how many cells are solved in each pass through the solver. If we execute a pass and we don’t see any new values being inserted, then the branch(es) we have in play are obviously stalled. When they stall, make a new branch.

    Invalid branches are interesting. They fail in one of two ways: either there is a cell that has NO candidates, which ultimately is caught by the check above, or there is a contradiction that surfaces where, for example, a digit appears to belong in two places. The contradictions violate the rules of the puzzle, and because we encoded the rules of the puzzle in a series of unique constraints they become constraint violations in SQL Server. They actually cause the script to throw an error where the implied solved cells can’t be inserted into the solution because they violate a unique index. I decided just to roll with that idea: branches work until they throw an error, at which point they can be discarded from the process.

    So, when we run the PushSolvedCells procedure, and @count shows we didn’t solve any cells, then we branch. Branching works by identifying a cell in Candidates that has a minimal number of possible values (practically always 2). We then rank the possible values for that cell, insert the #1 possibility into the existing branch and use the #2 and higher in new copy(ies) of the solution and all the candidates:

    CREATE PROCEDURE dbo.Branch AS
    BEGIN
    
        IF OBJECT_ID( 'tempdb..#branchCells' ) IS NOT NULL DROP TABLE #branchCells;
    
        CREATE TABLE #branchCells ( Branch int, Value int, Rw int, Cl char(1) );
    
        WITH 
            BranchCell AS (
                -- Locate a cell with a small number of candidates 
                SELECT TOP ( 1 ) Rw, Cl, Branch 
                FROM dbo.Candidates
                GROUP BY Rw, Cl, Branch
                ORDER BY COUNT(*), Rw, Cl
            ) ,    
            LastBranch AS (
                SELECT MAX( branch ) AS id FROM dbo.Solution
            ) ,
            BranchVals as (
                -- Find the candidate values for that cell
                SELECT ( ROW_NUMBER() OVER ( order by c.value ) - 1 ) as branchOffset, 
                    c.Value, c.Rw, c.Cl, c.Branch 
                FROM dbo.Candidates c
                INNER JOIN BranchCell ON c.Rw = BranchCell.Rw
                    AND c.Cl = BranchCell.Cl AND c.Branch = BranchCell.Branch
            ) INSERT #branchCells ( Branch, Value, Rw, Cl )
                SELECT 
                    -- Number the candidates by branch 
                    CASE WHEN branchOffset = 0 THEN BranchVals.Branch 
                    ELSE branchOffset + LastBranch.id END AS Branch,
                    BranchVals.Value, 
                    BranchVals.Rw, 
                    BranchVals.Cl
                FROM BranchVals 
                CROSS APPLY LastBranch;
    
        -- For branch cells other than the first one, make a copy of all candidates
        WITH RankedBranches AS (
            SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
            FROM #branchCells
            ) INSERT INTO dbo.Candidates ( Branch, Value, Rw, Cl, Sq )
                SELECT r.Branch, 
                    c.Value, 
                    c.Rw, 
                    c.Cl, 
                    c.Sq 
                FROM RankedBranches r
                JOIN dbo.Candidates c ON c.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
                    AND r.rnk > 1;
                
        -- For branch cells other than the first one, make a copy of the solution
        WITH RankedBranches AS (
            SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
            FROM #branchCells
            ) INSERT INTO dbo.Solution ( Branch, Value, Rw, Cl )
                SELECT r.Branch, 
                    s.Value, 
                    s.Rw, 
                    s.Cl
                FROM RankedBranches r
                JOIN dbo.Solution s ON s.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
                    AND r.rnk > 1;
    
        -- Delete the specific candidates from the copies of all candidates
        -- corresponding to the branching values
        
        DELETE dbo.Candidates
        FROM dbo.Candidates c
        JOIN #branchCells b 
        ON c.Branch = b.Branch
            AND c.Cl = b.Cl
            AND c.Rw = b.Rw
            AND c.Value != b.Value;
    
    END
    

    If you are still with me, wonderful! This is a bit of a marathon post, but we are near the end.

    Solver

    The last piece we need is just a “driver” script to loop through these procedures in the right order, with some flow control. The idea here follows the algorithm at the top of this post: run a loop, alternate inserting solved cells and removing candidates, until a solution surfaces. If the loop stalls, then branch, and if a branch fails, then delete it. The script is also decorated with statements to load and display the puzzle, to track progress, and a timer to see how long the solution took to produce:

    USE Sudoku
    
    -- Setup for a puzzle
    
    TRUNCATE TABLE dbo.Solution;
    
    -- Insert the values for a new puzzle here OR:
    
    INSERT dbo.Solution ( Value, Rw, Cl ) 
    SELECT Value, Rw, Cl 
    FROM dbo.Sample5; -- < Change this sample table to try other puzzles
    
    -- Display the unsolved puzzle
    
    SELECT 
        A, B, C, D, E, F, G, H, I 
    FROM dbo.DisplaySolution 
    ORDER BY Rw;
    
    -- End Setup
    
    -- Solve the puzzle
    
    SET NOCOUNT ON;
    
    DECLARE @starttime datetime2, 
            @endtime datetime2
    
    SET @starttime = SYSDATETIME();
    
    PRINT 'Compute all candidate values'
    EXEC dbo.MakeCandidates;
    
    DECLARE @currBranch int = 0,
            @branchSolvedCells int = 0,
            @passSolvedCells int = 0,
            @puzzleSolved bit = 0;
            
    WHILE 1 = 1 
    BEGIN 
    
        -- Attempt to move solved cells' values from Candidates to Solution
        -- one branch at a time
        
        -- This variable tracks whether the solutions are making any progress
        -- and controls branching solutions that have "stalled"
        SET @passSolvedCells = 0;
        
        DECLARE branches CURSOR FAST_FORWARD
        FOR 
        SELECT DISTINCT Branch FROM dbo.Solution;
            
        OPEN branches;
        FETCH NEXT FROM branches INTO @currBranch;
        
        WHILE @@FETCH_STATUS = 0 
        BEGIN
            BEGIN TRY
            
                PRINT 'Insert solved cells branch ' + cast( @currBranch as varchar(5) );
                
                EXEC dbo.PushSolvedCells @branch = @currBranch, @count = @branchSolvedCells OUTPUT;
                
                IF ( ( SELECT COUNT(*) FROM dbo.Solution where Branch = @currBranch ) = 81 ) 
                BEGIN 
                    SET @puzzleSolved = 1; 
                    BREAK; -- Solved!
                END; 
                
            END TRY
            BEGIN CATCH
            
                IF ERROR_NUMBER() = 2601 BEGIN 
                    
                    -- Constraint violation means branch has a contradiction and cannot be the solution
                    -- Remove invalid branch
                    
                    PRINT ERROR_MESSAGE()
                    PRINT 'Constraint Violation, Purge Branch ' + cast( @currBranch as varchar(5) );            
                    
                    DELETE dbo.Solution WHERE Branch = @currBranch
                    DELETE dbo.Candidates WHERE Branch = @currBranch
                    
                END
                ELSE BEGIN 
                    -- If the code is correct we should never hit this block
                    RAISERROR( 'Unhandled error', 11, 1 );
                    BREAK;
                END
            END CATCH
            
            SET @passSolvedCells += @branchSolvedCells;
            FETCH NEXT FROM branches INTO @currBranch;
            
        END
        
        CLOSE branches;
        DEALLOCATE branches;    
        
        IF ( @puzzleSolved = 1 ) BREAK; -- Solved in the last pass!
            
        IF @passSolvedCells = 0 
        BEGIN 
            -- Our rules of elimination didn't isolate any new values
            -- branch one of the solutions
            PRINT 'Stuck! Branch!' ;
            EXEC dbo.Branch ;
        END ;
        
        -- Remove candidate values from all branches using process of elimination
        PRINT 'Eliminate Candidate Values'
        EXEC dbo.EliminateCandidates ;
        
    END ;
    
    SET @endtime = SYSDATETIME();
    
    -- Display the solution
    SELECT 
        A, B, C, D, E, F, G, H, I 
    FROM dbo.DisplaySolution 
    WHERE Branch = ( SELECT Branch 
                        FROM dbo.Solution 
                        GROUP BY Branch 
                        HAVING COUNT(*) = 81 )
    ORDER BY Rw ;
    
    SELECT DATEDIFF( millisecond, @starttime, @endtime ) AS Duration;
    

    Working sample code for the database and the solver script will be attached to this post if you want to try this out. It was authored on SQL Server 2008 R2. I’d welcome any and all observations, suggestions, improvements! Just leave notes below. Enjoy!

  • Speaker Prep Tip: Use the AV Studio Built into that Laptop

    Over at erinstellato.com there is a great post this week about tips for new presenters. Ms. Stellato suggests, insightfully, that we record ourselves, which is really a fantastic piece of advice. What’s extra-cool is that today you don’t need any special equipment or expensive software to do just that. This week I “filmed” two run-throughs of my talk for SQL Saturday tomorrow. For me, the timing is the hardest thing – figuring out how much content I can really present in the time allowed without going way over, and how much time each demo or module should take – and filming myself was a great way to zero in on that.

    What You Need

    Getting over feeling awkward is the hard part. The tech is super simple:

    • A laptop with a bezel camera (or a desktop and web cam)
    • Free screen recording software. I use Microsoft Expression Encoder 4, mainly for the price point ($0)
    • Simple webcam software. Here I use Lync’s video conferencing window without a call. It’s also possible to use something like WebcamViewer. (The fact that this function is missing from out-of-the-box Windows 7 is a major blunder IMO.)

    Simple Setup

    First, start an Expression Encoder session of the type “live broadcasting project.” In Encoder, add the screen as a live source, and if you have more than one monitor, choose the one where your presentation will run. Set the audio source to your laptop’s mic.

    Next, on the Encode tab, boost the resolution of the video output to something legible like 1024 x 768. We are not going for quality here, but it has to be good enough for you to evaluate yourself.

    Set the Output tab to go to File Archive and choose an output folder for the video files.

    [Screenshot: Expression Encoder output settings]

    Now set up the display for your presentation. If you are using PowerPoint, fire that up. Set up your demos.

    Start ZoomIt. You do have ZoomIt, right? If not, stop now and go get ZoomIt. I’ll wait.

    (It’s vital to practice setting up the presentation, surprisingly, because at the start of one you’ll be nervous, and fumbling with tech in an unfamiliar place, in front of people, and with a new and probably dicey projector to deal with. It’s easy to miss something.)

    Finally, start the web cam working and put it right on the screen with your presentation (bottom right corner works well). It helps if your camera viewer has a “keep on top” option.

    Presenter, Record Thyself

    That’s basically it. Start the Encoder, and pretend you are presenting to someone on Skype. Or your mom. Or the Greater Moose Factory Ontario virtual chapter of PASS. Whatever makes you feel less ridiculous. That awkward feeling lasts about ten minutes, and then you are just presenting away.

  • T-SQL Tuesday #34: HELP!

    [T-SQL Tuesday logo]

    I owe my career to the SQL Server community, specifically the Internet SQL Server community, so this month’s T-SQL Tuesday is especially poignant. I changed careers “cold” about eight years ago, and, while I had some educational background in computer science, I had relatively little real-world DBA experience. Someone gave me a shot in the form of an entry level job, for which I am grateful, but I also had to make the argument to him that I would figure out whatever I needed to do to be successful in that position. Without the SQL community, that would never have been possible.

    Here’s how:

    Early on I used the heck out of traditional forums, especially places like SQL-Server-Performance.com, back when it was run by Brad McGehee, and SQLServerCentral.com. I read a lot after hours: Joe Celko and CJ Date for theory, and Kalen Delaney’s phenomenal “Inside SQL Server 2000” first, then a stack of other titles. I read SQL Server Magazine. I read forums by the hour, just soaking up the issues other people were facing, and the solutions.

    I still always keep Kalen’s book in a special spot near my desk. It’s in pretty good shape:

    [Photo: my well-worn copy of Inside SQL Server 2000]

    When, at my first DBA job, we really needed professional help wrangling an ISV, the internet community helped me find Ron Talmage and Joe Chang, when the company had no idea where to look for high caliber SQL Server consulting. They were amazing.

    Once I got a little experience and knew at least one or two things, I started trying to give back. On the forums I ventured answers for some easy questions. (I still didn’t know much. Heck, I still don’t now.) Brad McGehee was kind enough to accept a couple of articles I wrote based on technical challenges from work – articles which lost their relevance years ago, but gave me some practice and some confidence writing. When StackOverflow arrived on the scene I switched to that pretty exclusively as a forum site, if only because I like the format and the UI so much.

    When SQL-Server-Performance.com changed hands in 2006, I wanted to keep writing, and realized that blogging was probably the thing to do rather than formal articles. My writing is uneven, but the practice helps, and I find that forcing myself to compose an argument or explain a solution really helps to solidify my own knowledge. The old saying, “if you want to learn, then teach” really holds true. I was able to secure a spot blogging here at sqlblog.com in the summer of 2009 with Adam Machanic, and I still try to contribute odds and ends if I think they will be useful to someone out there on the interwebs.

    Somewhere in there I reluctantly accepted the fact that I should try Twitter, and unfortunately or fortunately, depending on whether you ask my wife, I’m fairly addicted. #SQLHelp is a marvel. I have also begun attending and submitting sessions to SQL Saturday events, which is both a blast socially and incredibly helpful professionally.

    That’s my history – I hope, if you are reading this from a position like mine, that you’ll consider doing two things: take advantage of the community resources out there, then, when you are able, pay it forward.

  • New Project Starting. Got Gas?

    “Storage is just like gasoline,” said a fellow DBA at the office the other day.

    This DBA, Mike is his name, is one of the smartest people I know, so I pressed him, in my subtle and erudite way, to elaborate.

    “Um, whut?” I said.

    “Yeah. Now that everything is shared – VMs or consolidated SQL Servers and shared storage – if you want to do a big project, like, say, drive to Vegas, you better fill the car with gas. Drive back and forth to work every day? Gas. Same for storage.”

    This was a light-bulb-above-my-head moment.

    Now that everything is consolidated onto shared infrastructure, all the way down to complete servers, the way we think about funding IT projects has to change too. It used to be that if you wanted to do a project, you would enumerate what the systems would cost, then price and go buy them. It was like this: this new project will need a bulldozer and an excavator, and maybe a Super-Zooper-Flooper-Do, let’s buy them for the project, and then they will arrive on a truck and we will install them, and the project will move forward. Many people are still thinking this way, but it’s now officially backward. We don’t buy discrete items for projects anymore, we buy a slice of shared infrastructure. And planning for that infrastructure has to change, or you will be, as many organizations are, forever, endlessly, exasperatingly short of it.

    Gas Up Early

    Imagine you and your friends are cruising down the road on a beautiful day, and someone decides you need to, simply MUST drive to Southern California. Do you at that point look around at each other and say “OK, who has gas money?” Perhaps. But hopefully not if you run a large business.

    Worse, do you just start driving that direction, and when you get down to 1/8 of a tank, then ask everyone in the car? Again, maybe, but not too many people travel this way who are over 25. I think, anyway.

    So the obvious question is, and I see this in many companies, why do we pile projects onto shared infrastructure like SAN storage and VM clusters without planning what infrastructure will be required to take us where we want to go? Answer: the organizations haven’t finished shifting their thinking. They think, hey presto, now we don’t need to buy those unique pieces of equipment any more, we just get “free” VMs and databases and storage from that magic bottomless pool. But that’s only the first stage. They haven’t realized yet, at an organizational level, who fills that pool of resources up, and how quickly, and how much it costs.

    Watch the Gauge

    Part of the difficulty is there’s no single “gas gauge” to tell an organization how the shared infrastructure is doing – you need some clever, forward thinking administrators to do that, and they, in turn need some tools. Further, it’s pretty hard today to estimate what “slice” of shared infrastructure a project will need, and how or whether to literally charge back for that resource. That means you have one arm making plans for all the places the organization will drive, with no idea how much gas is in the tank, and perhaps another arm with its eye on the fuel level, but which doesn’t know what the travel plans are. If you just start driving, at some point someone’s going to be standing by the side of the road with a thumb out and a gas can.

    And here’s another gotcha – you can’t, from a practical point of view, keep on filling this tank one gallon at a time, while always near empty. It’s not safe or economical. Do you really want to buy disks or shared servers and try to install them monthly? Weekly?

    So start thinking about your servers and storage as a commodity, and do it now. Try to get your organization to make this simple shift – we don’t buy pieces of equipment for projects anymore. We buy a platform, then estimate how much more of that platform we need for all upcoming work, and to sustain growth, then implement it.

  • Presenting Beginning PowerShell at SQL Saturday 149 MN

    I am happy to be presenting a session on beginning PowerShell for DBAs at my new home town’s SQL Saturday! (I moved from Seattle to Saint Paul, MN a short time ago.) I will be sharpening this presentation up to make sure anyone who comes will not go away empty handed.

    BTW, WOW, the schedule is up and I must admit I did not expect nine tracks of awesome. This looks amazing.

    My session is geared toward helping those DBAs who have not seen PowerShell, or perhaps may find PowerShell syntax opaque or confusing, get a conceptual understanding of how it works, in order to get past that first learning bump and into the useful stuff. I think that if you can write a Select statement, and you know in very basic terms how a query plan works, that the PowerShell pipeline is not really so different. Come and see why!

  • Rules-Driven Maintenance

    I haven’t posted for a while, but I hope that today I’ve got something really interesting to share.

    Even though I love SQL Server, there are a few areas that have always sort of disappointed me:

    1. People around the world have constantly had to implement and re-implement essentially the same patterns for database maintenance and backups.
    2. Even though those patterns are well established, there isn’t much out of the box in SQL Server -- and I include Maintenance Plans and Central Management Server/Policy-Based Management -- that works well to implement them, especially across a large number of instances. There are some very good community-supplied solutions ( http://ola.hallengren.com/, http://sqlfool.com/2011/06/index-defrag-script-v4-1/ ) that help, which is great, but the popularity of those solutions also demonstrates that the gap in the product is real.
    3. I personally found I was spending a lot of time setting up, checking on and troubleshooting basic maintenance and backup jobs, and after a while it becomes very repetitive work. We have about 2000 databases and a team that has ranged from one to four people.

    Some simple examples:

    1. You want all databases on all production instances to back up nightly, in sequence, during a maintenance window. If a new database is added, it should also start being backed up. Old files should be removed in some safe, reasonable fashion.
    2. You want log backups for any databases that are in full recovery mode, on any instance, to happen at least every 15 minutes, and more often if the log files are filling up.
    3. You want any indexes that are fragmented, and only those that are fragmented, to be rebuilt or defragged nightly during a defined maintenance window.
    4. You want all databases on all instances to be verified with CheckDB nightly or weekly, in sequence, at a specified time.

    Basically anybody who manages small to mid-size databases has these rules about maintenance, and they are always very similar, but we have to go around and set up many, maybe hundreds, of jobs and scripts to implement those rules. Editing the jobs when the rules, or perhaps the deployed databases, change is really tedious.

    Solution: Rules-Driven Maintenance

    I came up with a system to help manage this with less manual effort. The system consists of tables containing preferences that describe how maintenance and backups should operate, and some stored procedures and functions that implement those preferences in an automated, instance-wide maintenance routine. I was thinking of calling this “Policy-Based Maintenance,” but I don’t want it to be confused with the SQL Server feature Policy-Based Management (PBM) because it’s not really the same animal. (I like PBM, but don’t get me started about how the SQL Server team forgot to finish that feature before moving on to other things...)

    Here is a brief synopsis of how the system is implemented, using full backups as an example:

    1. There’s a policy table called FullBackupDefaults that defines the basic preferences about when and where full backups should be taken.
    2. There’s a second table called FullBackupOverrides where it’s possible to make exceptions to the default settings for specific databases.
    3. A table-valued function compares those preferences to the state of the databases on the instance including their backup history, and generates a “work queue” – a list of databases to back up – for the current time.
    4. A SQL Agent job is spawned that processes that work queue, iterating through the listed databases and backing them up.
    5. When the spawned job completes successfully, it is deleted.

    If a new database is added to the instance, it’ll automatically become part of this process, using the default settings. If you have a database that needs special treatment, let’s say weekly full backups instead of nightly full backups, then you can make an entry in the FullBackupOverrides table that describes the different policy to apply to the exception database.
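
    A hedged sketch of what such an override row might look like, assuming the DatabaseName and day-of-week bit columns described later in the FAQ (the exact schema may differ in the version you download):

    -- Hypothetical example: give SalesArchive a Sunday-only full backup.
    -- 'SalesArchive' and the column list are illustrative; check the deployed
    -- dbo.FullBackupOverrides table for the real schema before using this.
    -- Columns you leave out should stay NULL and inherit the default settings
    -- (see the overrides FAQ further down).
    INSERT INTO dbo.FullBackupOverrides
        ( DatabaseName, Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday )
    VALUES
        ( N'SalesArchive', 1, 0, 0, 0, 0, 0, 0 ) ;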

    Using that general idea, I created similar maintenance code for both data (full and differential) and log backups, index maintenance, statistics maintenance and integrity checks. Each type of activity has its own pair of preference tables following the same design as the example above. With this solution, I can deploy maintenance across a large number of SQL Server instances, and set preferences for how it should act, rather than making and maintaining detailed SQL Agent jobs or maintenance plans, which would be hard to administer centrally. The maintenance will handle new databases fairly seamlessly, so there is no need to go back and constantly check the servers, other than to run some PBM rules against them from the Central Management Server to be sure things are healthy.

    What about the existing community solutions?

    You should definitely check these out:

    http://ola.hallengren.com/

    http://sqlfool.com/2011/06/index-defrag-script-v4-1/

    They are great, and each has features that are better than, or different from, this solution’s. The one feature that I think distinguishes mine is the implementation of preference tables to drive all the maintenance activity.

    Disclaimers

    The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even imagine using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: http://opensource.org/licenses/GPL-3.0. You may use and modify this code, but not sell it.

    This has been run on SQL Server versions from 2005 to 2008 R2. It might work on 2012, but I have not tested it.

    I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.

    There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.

    If you haven’t done index maintenance for a while and you flip this solution on, you might blow up your transaction log. Be sure that you have plenty of log space and plenty of log backup space, and watch the process the first few runs. This is especially true if the combination of Standard Edition of SQL Server and the preference for Online index maintenance applies in your environment, because the maintenance process will perform index maintenance using reorganize (defrag), which produces a lot of log.

    Lastly, this is a solution for the type of environment that has many small databases, not the type with a few big, performance intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.

    Walkthrough

    OK, with all the fine print out of the way, here’s how you can try this code. You will need these prerequisites:

    1. PowerShell 2 and SSMS / SMO installed on the workstation from which you will deploy the solution. The deployment is handled by a PowerShell 2 script that uses invoke-sqlcmd. Before running it, you’ll need to have the SQL Server Snap-ins for PowerShell installed and working. The solution itself, after deployment, does not require PowerShell on the servers where it is deployed.
    2. The full backup procedure in this solution calls a file delete procedure that I have implemented as a CLR stored procedure. You’ll need either mine, or one of your own, or you can comment out the calls to that procedure and handle deletes some other way. The one I use is available here.
    3. Windows-authenticated sysadmin access to your SQL Server environment. The deployment script uses Windows auth and your own identity. It could be edited to support SQL auth if necessary, but it does not by default.

    To install on your test server

    1. Download and unpack the code into a convenient folder.
    2. Read and edit Deploy.ps1, entering the name of your test server, and the location of the deployment SQL scripts, according to the instructions in the file. The deployment script can push the solution out to one or to multiple SQL Server instances.
    3. Execute your edited Deploy.ps1 to install the solution to the specified server.

    The deployment process will populate an admin database on the SQL instance with the policy tables, stored procedures and functions, and will add five SQL Agent jobs to the instance. You can use your own admin database or allow the script to create one for you.

    There is one SQL Agent job for each type of activity required for maintenance:

    Policy-Based Data Backups

    Policy-Based Log Backups

    Policy-Based Integrity Checks

    Policy-Based Index Maintenance

    Policy-Based Stats Maintenance

    Each job typically works like this:

    1. On the job schedule, each job will check to see if it’s time to do any work. (For example, is it now the start of the backup maintenance window for any database(s)?) The work, if any, is listed using a table-valued function that takes the current time as a parameter (there’s a small sketch of this just after the list).
    2. If there is work to be done according to the maintenance preferences, then the system will spawn a new SQL Agent job, specifically for that work. Example: Run index maintenance as of Feb 5, 2012 at 21:00.
    3. The newly spawned SQL Agent job will iterate through the work queue for that time. If the job is successful, it will be automatically deleted. If there is an error, it will remain on the server for troubleshooting, or for monitoring software to alert on.
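
    As an illustration of step 1, the work queue is just the result of a query, so you can run it yourself while troubleshooting. The function name here is hypothetical; substitute the name of the table-valued function actually deployed by the solution:

    -- Hypothetical function name; use the work-queue TVF from the deployed code.
    -- Passing the current time returns the databases due for a full backup right now.
    SELECT * FROM dbo.fn_FullBackupWorkQueue ( GETDATE() ) ;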

    Fix This One Broken Thing

    In the Full Backups stored procedure, there are calls to a proc called DeleteAgedBackupFiles. You, most likely, don’t have said procedure :-). You can either remove this call, substitute your own, or compile the CLR code and install it from Prereq #2 above. I didn’t do this to be mean – this is a cheap trick to make you, gentle reader, not install in production until you have read and understood the code.
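
    If you would rather not compile the CLR procedure right away, one possible stand-in – my assumption, not part of the solution – is the undocumented xp_delete_file procedure that maintenance plan cleanup tasks call. Test it carefully; it only deletes SQL backup files with the given extension, and its behavior is not officially documented:

    -- Sketch of a possible substitute for DeleteAgedBackupFiles.
    -- xp_delete_file is undocumented; the parameters are: file type (0 = backup),
    -- folder, extension, cutoff date, and whether to include subfolders.
    -- The path and the seven-day retention window are illustrative only.
    DECLARE @cutoff datetime ;
    SET @cutoff = DATEADD( DAY, -7, GETDATE() ) ;
    EXEC master.dbo.xp_delete_file 0, N'X:\SQLBackups\', N'bak', @cutoff, 1 ;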

    Setting Preferences and Policies

    The rules driving the maintenance jobs are stored in a few simple tables. There are views over those tables that give a summary of what preferences are in effect:

    SELECT * FROM dbo.FullBackupPolicies

    SELECT * FROM dbo.DiffBackupPolicies

    SELECT * FROM dbo.LogBackupPolicies

    SELECT * FROM dbo.IndMaintPolicies

    SELECT * FROM dbo.StatsMaintPolicies

    SELECT * FROM dbo.IntegCheckPolicies

    Behind each view is a pair of tables, one for default settings that apply across the whole instance, and a second for database-specific overrides or exceptions to those defaults. To adjust the behavior or the timing of the different kinds of maintenance, just adjust the preference values in those underlying tables.

    Some examples:

    Q. How do I set the time for my integrity checks window for all databases?

    A. In the table dbo.IntegCheckDefaults set the value for WindowStartTime to the desired time, as 24-hour time, format hh:mi.
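
    For example, to move the window to 01:30 (a sketch using the table and column named above; adjust to what is actually deployed in your admin database):

    -- Start the nightly integrity check window at 01:30 for all databases on the instance:
    UPDATE dbo.IntegCheckDefaults
    SET    WindowStartTime = '01:30' ;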

    Q. How would I limit integrity checks for one larger database to only Saturday nights?

    A. In the table dbo.IntegCheckOverrides, make a row with the database name, and set the value for all days of the week, except Saturday, to 0. Set Saturday to 1.

    Q. How do I make one larger database use differential backups nightly, with one weekly full backup, while the other databases get a nightly full?

    A. In the table dbo.FullBackupOverrides, make a row for the large database. Set all the values for days of the week to 0 except the day you want the full backup. In the table dbo.DiffBackupOverrides, make a row for the database and set all the days of the week to 1 except the day you want the full backup.

    Q. How can I make the system ignore backups for one large database that I explicitly don’t want to back up?

    A. In the table dbo.FullBackupOverrides, add a row for the database where Exclude = 1.
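
    A sketch of that row (the database name is made up; the columns left out stay NULL and inherit the defaults, as described in the FAQ below):

    -- Skip full backups entirely for one database:
    INSERT INTO dbo.FullBackupOverrides ( DatabaseName, Exclude )
    VALUES ( N'VendorManagedDB', 1 ) ;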

    This same pattern should work to set any and all preferences.

    Turning On Automated Maintenance

    The deployment script will install the SQL Agent jobs as “disabled,” as a safety net. When you are ready to begin using the automated maintenance, just enable the five SQL Agent jobs.
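
    If you prefer a script to clicking through SQL Agent, msdb’s sp_update_job can enable them, assuming the jobs carry the names listed earlier (verify against your instance):

    -- Enable the five policy jobs created by the deployment:
    EXEC msdb.dbo.sp_update_job @job_name = N'Policy-Based Data Backups',      @enabled = 1 ;
    EXEC msdb.dbo.sp_update_job @job_name = N'Policy-Based Log Backups',       @enabled = 1 ;
    EXEC msdb.dbo.sp_update_job @job_name = N'Policy-Based Integrity Checks',  @enabled = 1 ;
    EXEC msdb.dbo.sp_update_job @job_name = N'Policy-Based Index Maintenance', @enabled = 1 ;
    EXEC msdb.dbo.sp_update_job @job_name = N'Policy-Based Stats Maintenance', @enabled = 1 ;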

    With the jobs running, watch the SQL Error Log periodically, and the file system, to be sure you are getting all the maintenance activities when you expect to, and that the agent jobs are not failing.

    Feedback

    I would LOVE feedback on this solution and ideas about how to make it better. If you try it, please come back here and let me know how it went.

    (F.) A. Q.

    Here are some details that may be helpful:

    Q. What are the available preferences?

    WindowStartTime: All maintenance activities except log backups have a specified time; this is the start time for that window, daily.
    WindowDurationMins: Index maintenance can be limited to a set duration past which new rebuilds will not start. This value is the length of the window. (A rebuild in progress, however, will run to completion.)
    Sunday – Saturday: Days of the week on which to perform the maintenance activity (bit columns).
    MaxIntervalHours (or MaxIntervalMinutes): For backups, the maximum length of time between backups. For example, you can set a rule that says a full backup should be taken at least every 24 hours, or a log backup at least every 15 minutes.
    MaxLogPercentUsed: The limit of log percent full that will trigger a log backup. The frequency of log backups is governed both by MaxIntervalMinutes and by MaxLogPercentUsed. (There is a sketch of reading this counter just after this table.)
    Path: The path for file output. If NULL, the default path set at the instance level will be used.
    ThirdPartyBackup: Optional third-party backup. Only Litespeed is implemented. Can be NULL (implies Native), ‘Native’ or ‘Litespeed’.
    UseDifferentialBackup: Bit indicating that differentials should be used.
    FullBackupSequencesToRetain: For the backup cleanup process, indicates how many sets of Full, Differential and Log backups to keep on disk.
    DeleteOnlyArchivedFiles: For the backup cleanup process, indicates whether to delete files where the “ready to archive” attribute is set in the OS. (Can prevent deleting files that have not gone to tape.)
    DatabaseName: For override tables, the name of the database where the overrides apply.
    Exclude: Ignore/skip the database (bit).
    Online: For index maintenance, try to keep the indexes online (1) or do not try (0). For Enterprise Edition, Online = 1 will cause online index rebuild; for Standard Edition, Online = 1 will cause reorganize instead of rebuild. Watch the log file size and backup disk capacity if using defrag!
    DefragThreshold: When examining indexes for fragmentation, the threshold for percent fragmentation that will cause index reorganize.
    RebuildThreshold: When examining indexes for fragmentation, the threshold for percent fragmentation that may cause rebuild (subject to the Online preference and SQL Server edition as described above).
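
    For the MaxLogPercentUsed rule, the numbers come from the instance’s log-use counters. The solution reads these its own way, but you can eyeball the same values with a query like this (one possible way, not necessarily how the solution does it):

    -- Percent of log used per database, from the standard performance counters:
    SELECT  instance_name AS DatabaseName ,
            cntr_value    AS PercentLogUsed
    FROM    sys.dm_os_performance_counters
    WHERE   counter_name = 'Percent Log Used'
            AND object_name LIKE '%:Databases%'
            AND instance_name <> '_Total' ;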

    Q. Why does the log backup policy job never stop?

    A. The log backup policy job is implemented as a continuously running loop with a one-minute delay, rather than a job scheduled every minute, so it doesn’t flood the job history with one execution per minute. The loop checks every minute to keep up with log files that might be filling up at any given time, but it doesn’t do work every minute; a backup is only executed for a given database when MaxIntervalMinutes or MaxLogPercentUsed is exceeded.

    Q. I set my WindowStartTime to 03:15 or 11:45 and now I don’t see any maintenance/backups? Huh?

    A. The Policy jobs can run on any interval, at any time, but at least one execution has to “line up,” within a few minutes, with the WindowStartTime. That is, if the policy job runs every hour, on the hour, but your window is set to start at 15 minutes after the hour, the job will never find work to do. Set the job schedules to align with the corresponding WindowStartTime.

    Q. Why did the index maintenance job fill my Standard Edition log?

    A. See if you have the combination Standard Edition and Online = 1 for index maintenance. That will cause indexes to be maintained using Reorganize, which is very log-intensive. You can:

    a. Watch and wait, because the level of fragmentation, and so the amount of log generated, might be reduced with more regular maintenance depending on the database. That is, it might be a temporary issue.
    b. Turn the Online preference off, if your application is not 24x7 and you can have blocking during a maintenance window.
    c. Get the checkbook and perform an edition upgrade to Enterprise.

    Q. How the heck do these crazy overrides work?

    A. The design of the overrides is: for each database, for every value in the preferences, IF the override table has a non-null value, use it. If the override table has no value or the value is NULL then use the corresponding default value. This means that when you add a row to an overrides table, you only have to fill in the specific values that are different from the defaults, and leave the rest as NULL. The NULLs will “inherit” the default values from the defaults table at run time.
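
    Conceptually, the effective settings come out of something like the following – a simplified sketch, not the actual view definition in the solution, and it assumes the defaults table holds a single row:

    -- Simplified illustration of the override-inherits-from-default idea:
    SELECT  d.name AS DatabaseName ,
            COALESCE( o.WindowStartTime, def.WindowStartTime ) AS WindowStartTime ,
            COALESCE( o.Saturday, def.Saturday )               AS Saturday
    FROM    sys.databases AS d
    CROSS JOIN dbo.IntegCheckDefaults AS def
    LEFT JOIN dbo.IntegCheckOverrides AS o ON o.DatabaseName = d.name ;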

    Q. What happens if a job fails?

    A. Failure typically will result in an extra job (or many extra jobs) showing in SQL Agent Job Activity Monitor, with a name that indicates the date and time that the job was intended to run. The job history can be used to troubleshoot what happened. Successful maintenance jobs will delete at the end of their run, but failed jobs will remain for troubleshooting.

    Q. How can I see what index maintenance was done?

    A. Select from the TrackFragmentation table.

    Q. Does this work with:

    Log Shipping: Yes. Log backups should work by firing the log backup job in SQL Agent instead of running a conventional log backup. You may elect to disable the log shipping log backup job’s schedule (but don’t delete the job) to prevent collisions between the schedule starting the job and this solution starting the job.
    Mirroring: Yes.
    Replication: Probably. I don’t use replication much, so it has not been tested.
    Clustering: Yes.
    Named Instances: Yes. For multiple instances on the same machine, consider staggering the WindowStartTime values on each instance.
    2012: I hope to find out soon. Looks promising.
    2000: Um … no.
    VLDB: Not really. This solution targets many small, simple databases. It does operations like sp_updatestats that might not be appropriate for a big system.
    Partitioning: Yes, there is automatic detection and index maintenance for partitioning on Enterprise Edition.

    Q. I have an admin database already. Can I use it?

    A. Yes. Check for name collisions for objects in the database, and then just enter your preferred admin database name in the Deploy PowerShell script.

    Q. The SQL scripts don’t include “USE <database>”?

    A. True. The T-SQL scripts are designed to be used with a database that has the name of your choosing. (In my shop, all the admin databases have unique names per instance, by design.) The PowerShell deployment script handles the database name. If you run the scripts from SSMS, be sure to set the database manually.

    Q. The deployment script croaked in the middle. Can I rerun it?

    A. Yes, the deployment scripts are all designed with “If not exists…” and can be rerun if necessary.
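
    The shape of those scripts is the familiar idempotent-create pattern, something like this (illustrative only; the real scripts of course create the solution’s actual objects and columns):

    -- Illustrative shape of the rerunnable deployment scripts:
    IF NOT EXISTS ( SELECT 1 FROM sys.tables WHERE name = N'FullBackupDefaults' )
    BEGIN
        CREATE TABLE dbo.FullBackupDefaults
            ( WindowStartTime char(5) NOT NULL ) ;  -- columns abbreviated for the example
    END ;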

    Q. I have tested the solution and I want to deploy on multiple instances. How?

    A. The Deploy.ps1 file can use a list of SQL Server instances to deploy across multiple servers if desired. Keep in mind that the Policy agent jobs are disabled by default when deployed and will need to be switched on.

    Q. Why is WindowStartTime implemented as text! You Hack! :-)

    A. SQL Server 2005, which has no time data type.

    Note: I’ve published some updates to this code, so please grab the latest version here.

  • T-SQL Tuesday #31: Paradox of the Sawtooth Log


    Today’s T-SQL Tuesday, hosted by Aaron Nelson (@sqlvariant | sqlvariant.com) has the theme Logging. I was a little pressed for time today to pull this post together, so this will be short and sweet. For a long time, I wondered why and how a database in Full Recovery Mode, which you’d expect to have an ever-growing log -- as all changes are written to the log file -- could in fact have a log usage pattern that looks like this:

    [Graph: sawtooth pattern of log usage for the demo database]

    This graph shows the Percent Log Used (bold, red) and the Log File(s) Used Size KB (light, green) counters for a demo database. The database was in Full Recovery mode the whole time this graph was generated, and there were no log backups taken. It seems, naively, that the graph should just climb ever upward as changes are added. How could it possibly drop and then start to climb again? Mysterious.

    >> If I had time and wit, this is where I would do some clever Scooby Doo or Nancy Drew thing. Not tonight, I’m afraid. <<

    I learned the answer in an amazing session with SQL Skills, during their IE1 class. The answer is that the amount of space counted as “used” in the log includes the log records physically present in the log file and in use (for full recovery, mirroring, replication, etc.), plus the amount of space that would be required to add rollback records, which do not actually exist, for all in-flight transactions.

    When you make a change to a database, the change is recorded first, obviously, as a series of transaction log records that can be played forward. In the event of a crash, when recovery runs, those records are replayed forward to recreate the changes, as one stage of the recovery process. But if a transaction is incomplete, somehow the instructions need to be present to undo and reverse out the changes made in that transaction. The way that works is that at the time of rollback, the reverse operations from the original log records are added to the log, then those reversal records are also played forward, which “undoes” all the changes. The reversal records do not exist, though, unless a rollback is actually invoked.

    If the server ran out of space in the log file to append these rollback records, obviously there would be a major problem, and we might not be able to recover the database. In order to prevent that condition, SQL Server will reserve the space necessary in the log file(s) and have it appear “used” until a given transaction commits, at which point the crash recovery space for these hypothetical reversal records is no longer a requirement.
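
    You can watch the reservation happen directly: while a large transaction is open, sys.dm_tran_database_transactions shows both the log bytes actually written and the extra bytes reserved for a potential rollback. A quick look, if you run the demo below:

    -- Compare real log use to the space reserved for a possible rollback
    -- while one of the big UPDATE transactions in the demo is still open:
    SELECT  database_transaction_log_bytes_used ,
            database_transaction_log_bytes_reserved
    FROM    sys.dm_tran_database_transactions
    WHERE   database_id = DB_ID( 'LogDemo' ) ;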

    Here’s a demo script if you want to see this in action. Note that it takes a fair amount of disk space:

    USE master ;
    GO
    
    -- Make a demo database:
    
    CREATE DATABASE LogDemo ;
    GO
    
    ALTER DATABASE LogDemo SET RECOVERY FULL ;
    GO
    
    ALTER DATABASE LogDemo 
         MODIFY FILE ( NAME = N'LogDemo', SIZE = 2000MB , FILEGROWTH = 0)
    ALTER DATABASE LogDemo 
         MODIFY FILE ( NAME = N'LogDemo_log', SIZE = 2000MB , FILEGROWTH = 0)
    GO
    
    USE LogDemo
    GO
    
    CREATE TABLE DummyData ( Padding char(800) ) ;
    GO
    
    -- Init full recovery mode by taking a full backup:
    
    BACKUP DATABASE LogDemo TO DISK = 'LogDemo.BAK'
    GO
    
    -- At this point, start Performance Monitor, and add the counters
    -- Percent Log Used and Log File(s) Size Used (KB) for the demo DB
    -- Set the display scale of the latter to 0.00001 to match
    -- the screenshot
    
    -- Fill the test table with a generous amount of data:
    
    DECLARE @i int = 1 ;
    
    -- You can experiment with how much data by adjusting this
    -- limit:
    
    WHILE @i < 2 BEGIN
    
        BEGIN TRANSACTION ;
    
        WAITFOR DELAY '00:00:03' ;
    
        -- With each iteration of this loop, the log use will ramp up
        -- in the "intuitive" way we expect, growing after each insert:
        
        WITH  
            t1 AS ( SELECT REPLICATE( 'x', 800 ) AS Padding 
                    UNION ALL SELECT REPLICATE( 'x', 800 ) AS Padding ),
            t2 AS ( SELECT x.Padding FROM t1 x CROSS JOIN t1 y ),
            t3 AS ( SELECT x.Padding FROM t2 x CROSS JOIN t2 y ),
            t4 AS ( SELECT x.Padding FROM t3 x CROSS JOIN t3 y )
            INSERT INTO DummyData ( Padding )
            SELECT x.Padding from t4 x CROSS JOIN t4 y ;
            
        COMMIT ;
        
        SET @i = @i + 1 ;
        
    END ;
            
    GO
    
    CHECKPOINT ;
    
    WAITFOR DELAY '00:00:05' ;
    
    -- Here you can see the log usage for an index create:
    
    CREATE CLUSTERED INDEX clust ON DummyData( Padding ) ;
    
    CHECKPOINT ;
    
    WAITFOR DELAY '00:00:05' ;
    
    -- And this is where the "sawtooth" becomes visible:
    
    BEGIN TRANSACTION ;
    
        UPDATE DummyData 
        SET Padding = REPLICATE( 'z', 800 ) ;
    
    WAITFOR DELAY '00:00:02' ;
    
    -- Log use drops here:
    
    COMMIT ;
    
    WAITFOR DELAY '00:00:02' ;
    
    BEGIN TRANSACTION ;
    
        UPDATE DummyData 
        SET Padding = REPLICATE( 'z', 800 ) ;
    
    WAITFOR DELAY '00:00:02' ;
    
    -- Log use drops here:
    
    COMMIT ;
    
    WAITFOR DELAY '00:00:02' ;
    
    BEGIN TRANSACTION ;
    
        UPDATE DummyData 
        SET Padding = REPLICATE( 'z', 800 ) ;
    
    WAITFOR DELAY '00:00:02' ;
    
    -- Log use drops here:
    
    COMMIT ;
    
    WAITFOR DELAY '00:00:02' ;
    
    USE master ;
    GO
    
    DROP DATABASE LogDemo ;
    GO

    The best way to try this out is probably to highlight and execute a few lines at a time, and watch Performance Monitor to see if it does what you expect. Happy logging!

  • Execute T-SQL Across Listed Servers/Databases

    Here’s a handy trick – if you have a SQL script, stored as a text file, and need to run it against an arbitrary list of different databases on different SQL Server instances, try a little PowerShell. There are a lot of ways to accomplish this task, but personally I like this one for its flexibility.

    First, make sure the SQL script does NOT include the common USE <database>. Generally that statement is your friend, but not in cases where a single script should work against multiple databases having different names.

    Next, make a “collection of collections” in PowerShell, which will act as a list of instances and databases on those instances. Example:

    $instances = @(
        @( 'Server1', 'someDatabase' ),
        @( 'Server2', 'AnotherDatabase' ),
        @( 'Server32', 'FooDB' )
        # Repeat
    )

    Each item in $instances is a mini-collection, and each of those collections has two elements: an instance name and a database name, which will be available using indexes [0] and [1].

    If you prefer, it’s also possible to pull these values from a text file using Get-Content, and split each line on some delimiter character.

    Next, load the content of your SQL script file into a variable. There’s an important caveat here: we have to load it as a single string, not an array of strings. The default behavior of Get-Content, though, is to split a file on line terminators and make each line into a separate object in a collection of strings.

    There are a few ways to accomplish this, but I learned a simple one in a forum post from Alex K. Angelopoulos – we can direct Get-Content to split on a character that doesn’t actually exist in the file. This makes the whole content of the file “look like” one line to Get-Content. He suggests the Null character, which in PowerShell is `0 (backtick zero). Note that the Null character is not the same as, or related to, $null.

    Special Characters: http://technet.microsoft.com/en-us/library/dd347558.aspx 

    $deployScript = Get-Content 'C:\Path\To\Your\Script.sql' -Delimiter `0

    Finally, loop over the list of ( instances + databases ) to run the script everywhere. Be cautious and test!

    $instances | ForEach {
    
        # This should print the instance and database from your list, which is handy for troubleshooting:
        Write-Host $_[0] $_[1] 
    
        # This sort of thing can be used to validate that you are connecting to the right databases:
        Invoke-SqlCmd -ServerInstance $_[0] -Database $_[1] -Query 'Select @@servername, db_name()'
    
        # Finally, this would execute the script:
        Invoke-SqlCmd -ServerInstance $_[0] -Database $_[1] -Query $deployScript
    
    }

    There is no Undo, so test. Use your powers only for good!

    Edit 5/5/2012 – fixed missing commas in code sample.

  • Poor Man’s PowerShell TFS SSMS Integration

    This is lame. Still, here goes: I need, increasingly, to author both PowerShell and SQL Server scripts, bundle them into a solution and store that in TFS. Usually the PowerShell scripts are very closely related to SQL Server, and have a lot of SQL in them. I am hopeful that 2012 SSDT, or the tighter integration of SSMS and Visual Studio in 2012, might help put all of this in one place, but for now I am stuck in SSMS 2008 R2. So here are my blunt attempts to marry these activities.

    (This post is rather like the scene at the end of MythBusters, when the explosion didn’t really work, so they just put a lot of C4 under the <whatever is being used> and set it off.)


    Get SSMS Connected to TFS

    First, I downloaded and installed the TFS integration bits for SSMS, known sesquipedalian-ly by the polysyllabic appellation “Visual Studio Team System 2008 Team Foundation Server Microsoft® Source Code Control Interface Provider.” Or TFS MSSCCI Provider, for short. This adds integration between SSMS and Team Foundation Server, so that I can create a SQL script solution and store it in source control, with check in/check out, etc. It’s sort of OK, actually, and miles better than nothing.

    Next I made a new SQL Server script solution in SSMS. I then right-clicked on it in the Solution Explorer and used the context menu to add it to source control.

    Make a PowerShell Script by Force

    Now for the PowerShell “integration.” In SSMS, in a solution, the “Miscellaneous” folder can contain any sort of files you like. So I right-clicked on the project, chose Add New Item. In the Add New Item dialog, I picked a template at random and forced the file name to myPowerShellScript.ps1. This makes the file, in the Miscellaneous folder (admittedly with a few lines of unwanted SQL code in it):

    [Screenshot: the new myPowerShellScript.ps1 file in the solution’s Miscellaneous folder]

    Create an “External Tools” entry for PowerShell ISE

    SSMS has long had the ability to launch an external program from a menu command. I hadn’t used this much in the past (never really saw the point) but in this case it works nicely. I clicked Tools > External Tools… and created a new entry for the PowerShell ISE:

    [Screenshot: the External Tools dialog, with a new entry for the PowerShell ISE]

    In Arguments and Initial Directory, it’s possible to put $()-style variables that are sensitive to the context in SSMS, in this case to pass the file selected in the Solution Explorer as an argument to launch the PowerShell ISE – basically allowing me to open a .ps1 file, in the right app, from inside SSMS.

    At this point, it’s possible to create a source-controlled PowerShell script in the SSMS solution, to open and edit it with the PowerShell ISE, and to check it in to TFS. Huzzah!

    Option: TFS Windows Shell Extension

    That’s pretty grand, I think, but as an added option, it can be handy to check out a PowerShell script, make a change, and then check it back in, without using SSMS. That’s possible using a little Team System widget called the Windows Shell Extension, which is bundled in this TFS Power Tools download: http://www.microsoft.com/download/en/details.aspx?id=15836.

    Note: this installer, by default, installs a bunch of stuff you don’t want, and not the thing you do, for this case, so use the Custom option in the installer and select the Windows Shell Extension feature.

    With the Shell Extension, you can see the source control state of files right in Windows Explorer, with a little green icon overlay. There is a right-click context menu to check files in and out.

    Adding this widget means I can check a file out right in Windows, edit with the PowerShell ISE, and then check it back in – great for those PoSh scripts that don’t have a lot of SQL dependencies.

    It’s not pretty, but it works. And, provided you have a legal TFS license, by my count the cost comes to … $0!

  • Microsoft Delivers Full Suite of SQL Server PowerShell Cmdlets

    We’ve all been waiting several years for this, and finally it’s here! Coinciding (approximately) with the release of SQL Server 2012, a new Feature Pack has appeared on the Microsoft web site that adds a full suite of PowerShell cmdlets for DDL and other functions. This means that, at last, we can do things like fully-featured SQL deployment scripts without all the (severe) limitations of T-SQL, such as primitive use of variables, flow control, exception handling.

    Taking a cue, finally, from the community project SQLPSX, the SQL Server team seems to have designed the new library with ease-of-use in mind, thankfully concealing a lot of the complexity of using raw SMO objects through a language like C# or PowerShell.

    Here’s a teaser showing just a few of these new offerings:

    New-Table –Schema dbo –Name Accounts –Columns $myColumns

    Remove-Table –Schema dbo –Name Accounts

    New-Index –Table $myTable –Columns $myIndexedColumns

    Set-Index –Name dbo.Accounts.idxBar –Columns $myRevisedColumns

    New-Procedure –Name Foo –ScriptBody $sqlcmd

    The list goes on! This will make deployment of changes to existing databases so much easier. No more insanely complicated T-SQL scripts with brittle Dynamic SQL and hazardous error handling to manage complicated flow control like “Create this object if it doesn’t exist already.”

    And all you admins aren’t left out in the cold either – check out the fresh cmdlets for managing SQL Agent jobs, and the incorporation of Agent into PBM!:

    New-AgentJob –Name MyJob –Type PowerShell –ScriptBody c:\scripts\foo.ps1 –Schedule $MySchedule

    Set-AgentJobSchedule –Name “Daily at 3:00 am” –StartTime “03:00” –Enable

    Set-ServerProperty –MaxDop 4 –MaxMem 8000GB

    I am SO excited, and this is amazing, but I haven’t had a lot of time to explore all the options. I hope to really dig in after reading everyone else’s April Fool’s posts. Until then, you can download this feature pack yourself (probably) at:

    http://www.microsoft.com/download/en/details.aspx?id=88254 

    Edit April 2: Please note if you come back to this page later, this was an April Fool's joke. There are a handful of new cmdlets in SQL Server 2012, but nowhere near this level of support.

  • T-SQL Tuesday #028: Whaddya Mean, “Not Your Job?”


    This T-SQL Tuesday, hosted by Argenis Fernandez (Blog|Twitter) is devoted to the question, “Are you a Jack-of-all-Trades? Or a specialist?” This question really hits home for me, on a number of levels.

    (Aside: I have huge respect for Argenis – he’s smart, funny, no-nonsense, very accomplished. If you don’t follow him, do.)

    If you have read any of my previous ramblings on this blog, you may know I was originally educated as an architect – the bricks and mortar kind, not the information systems kind. While I didn’t stay in that profession, I’m afraid it had a permanent influence on the way that I am wired. Architecture, especially architectural education, is grounded in a particular way of thinking about the intersection of design and engineering, which sticks with me.

    A building is a problem with an unimaginable number of variables. There are so many that, for one building program (that’s the problem statement about what the building is supposed to be for, and how big, etc.) on one site, there is probably an infinite number of really good solutions. There are variations on structure, light, materials, function, climate, culture, history, on and on. A very good architect can make a building that works, keeps the rain out, and lasts, but at the same time is also inspiring, creative, and enriches the lives of the people who use it.

    Contrary to popular belief, architects don’t do a lot of math. They don’t do the calculations for structural engineering, for example. In fact, in most places they are prohibited from doing it because they aren’t qualified. The skill that I think an insightful architect has is not engineering, but synthesis. From all the intertwined systems that make a building, structural systems, enclosure systems, mechanical and electrical systems, patterns of use, cultural influences, colors, natural light, the landscape, the architect can, through design, form a building into something that reveals and expresses those systems in a way that is not only functional but beautiful at the same time.

    In order to do that, the architect does need to genuinely understand how those systems work, their internal logic, but perhaps in a different way than an engineer understands them. The architect has to “zoom out,” to understand how each system works at a macro level, and how it relates to all the other competing systems, interests and priorities that compose the larger problem of making the building. The architect’s understanding has to be accurate, without necessarily being microscopic in detail. The micro-level answers are available when you need them, by hiring someone or by doing research -- and the world is just too large and complicated to literally know it all.

    Put another way: the hack architect either willfully stuffs a building into some predetermined form, or is ignorant or uncaring about making something, and settles for cheap, expedient corner-cutting. The skilled architect can synthesize the functions of the building, the site and the building’s construction into something elegant and beautiful, by understanding something about how each of those things work.

    So, what has all this got to do with the topic at hand? I would like, in my small way, to work in IT the way the serious architect designs buildings. It’s interesting to think about whether or not working that way is a specialty – and I don’t really have an answer. But I have found that zooming out, taking an interest in related systems, helps me tremendously as a DBA. Sometimes it’s not the micro details of other systems that matter, but spending the time “getting” the logic of other systems, languages, interfaces.

    Example: other computer languages. I am no developer, but I have great respect for the really good ones. I have tried to learn at a macro level how to code in Scheme, Java, JavaScript, VB.net, C# and recently PowerShell. I have put a lot of effort into understanding OO concepts. The same way an architect doesn’t actually calculate structures, you would not want me to code up your complex application. But as I try to formulate IT solutions, it’s been really helpful to know something about how that all works, and how what I do relates to it. I hope it will make me less likely to do the IT equivalent of jamming a square peg in a round hole, and more likely to help find the elegant solution.

    Example: Windows, Storage, Failover Clusters. To me it just seems natural to want to know more about the systems that my SQL Servers are running on. I’m not a very experienced Windows guy, and when I go and see the good ones present, I am humbled and amazed. But I have certainly tried to get some macro-level understanding of these other systems as a way to get better at my own job, and I’ve learned how to build things like failover clusters ground-up.

    That brings me back to the title of this post. “Not my job” is actually an interesting and nuanced concept. On the one hand, there are a lot of things I really don’t know about, and many I know barely enough about to be dangerous. I came late to this profession and I’m still a novice. I think it’s important to recognize that fact, in the same way the architect needs the engineer, and not create havoc imagining that I know more than I do. In short, I need other people who really know things. That’s the positive version of “not my job.”

    Of course, there’s the other interpretation, where “not my job” is really code for “I can’t be bothered,” or, “that’s not interesting.” That attitude I don’t get. There is very little in this world that is not interesting in some way. Architects tend to be interested in everything.

    I will leave you with two of my favorite buildings in the world, and a thought: these are made of the same steel, stone, wood and concrete as other buildings – what is it that makes them so amazing?

    Have a look at Louis Kahn’s iconic Salk Institute and Kimbell Art Museum.

  • Unorthodox DBA Kits

    I get some strange looks from people when they see what machines I use in my DBA life. First it was because I was using a personal MacBook Pro. In the past few days, it’s because I’m not using it any longer. I still love the MBP, but so does my wife, and she, rightfully, didn’t love the idea that I was hogging it all the time. Now the MBP has returned to its quiet life at home, and I have a new Tablet with a touch UI for work. That was a huge change for me, because it’s the first PC I’ve ever bought. Yes, ever.

    Whacky Setup 1

    For two years I happily ran SQL things from a unibody MacBook Pro. There are a number of people in the SQL community who have used or switched over to the Mac with one of several Windows virtualization solutions, and the combination works great. The build quality and design of the Mac is excellent, and in my experience they are both reliable and a joy to use. Parallels, Vmware Fusion and even VirtualBox are all working, solid solutions to make Windows run on the Mac alongside OSX. Personally I’ve used the Vmware solution most, and tested with the VirtualBox solution. For work we have remote access using SSL VPN through the browser, so there’s really no need to worry about what operating system is on the client, as long as that works. From a security perspective, all work stays at work – literally inside the datacenter.

    After I had this machine for a while, my friend Aaron Bertrand, through his blog, convinced me to try Apple’s Magic Trackpad. While I was skeptical at first, that thing has completely won me over. The name is corny, but what a fantastic device.

    My Mac would drive a big external monitor, but just one, and only with a Mini DisplayPort adapter to convert to DVI.

    Kit 1


    13 Inch MacBook Pro, 8GB RAM, Core 2 Duo (this is a couple years old now)

    DVI Adapter

    Any Big Monitor (with a small adapter, the Mac can drive just about any external monitor – but only one). Mine is a work-supplied 24” HP LCD.

    Apple external Bluetooth keyboard and Trackpad

    Vmware Fusion for Macintosh

    Legit Windows 7 license from Microsoft Store, legit SQL Server installed for learning/test/demo purposes

    Samsung Android phone, with the Mac’s calendar and address book set to sync automatically through Google services

    Various cross-platform utilities including Dropbox, 1Password (together), Xmarks, Evernote, that hook Mac to Windows to Android

    The exchange with many of my co-workers and friends:

    Startled co-worker: “Wha? That works?”

    Me: “Yep.”

    Alas, when it came time to yield my much-loved Mac back to my family, I had to buy a new machine. In the intervening time, I have really become enamored of the touch interface on both my phone and on my kids’ Nook. But most tablets don’t have the horsepower or the OS to do everything I needed, and I really didn’t want another device to lug – having a tablet and a phone and a laptop has no appeal. 8GB of memory and either Mac OSX or Windows 7 was a requirement. It was a random tweet from Buck Woody that turned me. Solution? Lenovo x220 ThinkPad Tablet.

    Whacky Setup 2

    The “convertible tablet” design goes back a ways – when I was designing buildings in the early-mid 2000’s, the firm I worked for did a collaboration with Microsoft to help with the early implementation of XP’s tablet features on early convertible tablets. They had the same basic design as my new Lenovo. But back then it sucked, and there was no market.

    Fast forward to 2012 – now that touch interfaces have matured and tablets have become popular and useful, I think that this time around the idea may work. This machine is a really fast laptop in its own right, and the touch interface is working for me as a tablet substitute. It seems strange, perhaps already antiquated, but I love the combination of touch screen, stylus, the power of a full-on laptop, and a good keyboard when I want it. It’s fantastic.

    Kit 2


    x220 Convertible Tablet with i7, 8GB RAM, 160GB SSD

    Display Port to the same 24 inch LCD monitor

    Apple (yes!) bluetooth keyboard and trackpad (the Apple trackpad is far, far better than the Lenovo one)

    A ridiculous number of pointing options, including ten fingers, the stylus, the ThinkPad eraser thingy, the built in trackpad and the external one

    VirtualBox for experimenting with SQL Servers on a virtual network

    Same Android phone and features, but without the elegant Google sync (sad face)

    Same toolbox of syncing apps: Dropbox, 1Password (together), Xmarks, Evernote, etc.

    Same conversation with my co-workers.

    I just have a week and a half in with my new Frankenstein of a workstation, but it’s working for me. Next installment here – a comparative review of the two setups, and some tips about the flakier stuff, like how to make the Apple accessories and the ThinkPad talk, and what to expect if you are considering the switch to Mac / OSX as a SQL Server DBA.
