
Merrill Aldrich

  • IF NOT EXISTS ( SELECT ‘thisPost’ ) CREATE POST thisPost AS

    T-SQL deployment scripts are tedious to write and they get no love. As a result, they are very often the most horrible, error-prone step-children of scripts*. One basic principle that could help is to develop a habit of using the “IF NOT EXISTS” or “CREATE OR ALTER” patterns in these scripts. The idea is to check the existing database before creating new objects. This is far better, in my opinion, than the common “DROP and CREATE” pattern, which can lose data and/or permissions if misapplied. Whenever possible, this type of deployment code should use the SQL Server Catalog Views, and not the deprecated ones. INFORMATION_SCHEMA has been demonstrated to be flaky by others.

    Make It So

    I like to adopt an approach where deployment scripts

    • Are re-runnable without harm. I’m calling this the “F5 F5 F5 F5 F5” principle. DROP AND CREATE is your enemy.
    • Should complete without errors when successful. (Errors should be real; none of this “ok, expect errors from the script and ignore them.” That’s just dangerous.)
    • Take whatever is there on the server, within reason, and change it to what the script says should be there. This is a declarative approach a bit like Picard’s orders to Number 1. Make It So. If an object is missing, then create it. If it’s there already, then move on. If it needs modification, then change it.

    The hard part about IF NOT EXISTS is that you need all these weird snippets of code to check system views for objects in the database. These are repetitive, but hard to figure out from scratch, and who has time for that? Well, I’m making time. I have had this post idea in the back of my head for a long time. This page will be a clearing house of code snippets to check for the existence of objects, and I’ll come back and add or adjust it in the future. You should be able to just snag what you need for that ugly deployment script and make a few edits.

    Use SQLCMD to Stop on Error and for Script Variables

    /* Optional but Recommended: use SQLCMD to set database name and stop on errors */
    
    :on error exit
    :setvar dbname myDatabase
    
    

    Notes: SQLCMD is a great addition to deployment scripts: it lets you avoid hard-coding things like database names, and it can stop script execution when an error is encountered. You have to enable SQLCMD mode in SSMS to take advantage of this (Query > SQLCMD Mode).
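
    If the script needs to run unattended instead of from SSMS, the same directives work with the sqlcmd command-line utility. A minimal sketch (the server, file and database names are placeholders):

    sqlcmd -S yourServer -d master -b -i Deploy.sql -v dbname="myDatabase"

    Here -i points at the script file, -v supplies the SQLCMD scripting variable, and -b makes sqlcmd abort and return an error level when a batch fails.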

    Every Script Needs Database Context

    Somewhere you must direct the deployment script to use the correct database. If you are doing that from a calling function like a PowerShell script, it could happen there. If not, your script needs:

    USE [$(dbname)]

    This should happen one time, near the top, after SQLCMD variable assignments and comments.

    Notes: without this, chances are people will accidentally fill master with junk. Truly.
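
    If you want a belt-and-suspenders check on top of the USE statement, a short guard near the top of the script can refuse to deploy into master at all. A minimal sketch; with :on error exit in effect, the raised error stops the rest of the script:

    /* Guard: refuse to deploy into master if the USE statement was wrong or missing */
    IF DB_NAME() = N'master'
        RAISERROR( 'Deployment script is running in master. Check the USE statement.', 16, 1 ) ;
    GO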

    Create a Database if it Doesn’t Exist

    /* Create the database only if it's not already present */
    USE master
    
    IF NOT EXISTS ( SELECT name FROM sys.databases WHERE name = '$(dbname)' )
    BEGIN
        CREATE DATABASE [$(dbname)] ;
        ALTER DATABASE  [$(dbname)] MODIFY FILE
        ( NAME = N'$(dbname)', SIZE = 200MB, MAXSIZE = UNLIMITED, FILEGROWTH = 200MB ) ;
        ALTER DATABASE  [$(dbname)] MODIFY FILE
        ( NAME = N'$(dbname)_log', SIZE = 50MB, MAXSIZE = 2048GB, FILEGROWTH = 50MB ) ;
    END
    GO
    
    /* Set any important database-level options */
    ALTER DATABASE [$(dbname)] SET RECOVERY { FULL or SIMPLE } ;
    ALTER DATABASE [$(dbname)] SET AUTO_SHRINK OFF ;
    ALTER DATABASE [$(dbname)] SET AUTO_UPDATE_STATISTICS ON ;
    ALTER DATABASE [$(dbname)] SET READ_COMMITTED_SNAPSHOT { ON or OFF } ;
    ALTER DATABASE [$(dbname)] SET ALLOW_SNAPSHOT_ISOLATION { ON or OFF } ;
    /* ... etc ... */
    

    Notes:

    Edit here 7 Sept 2013 based on Aaron’s comments:

    This snippet works for small, simple databases. Edit the file size and growth settings to appropriate values, but please don’t use the truly awful default values from SQL Server. Avoid relying only on auto grow to size your databases; instead, when possible, allocate a plausible size and growth increment for the files. If you can’t predict the size, let the system run for a while, and monitor, until the files reach a steady state. If you control the model database (if you’re creating in-house databases, for example) then you can adjust the defaults there. If you are deploying to unknown servers, you can’t rely on reasonable defaults. That said, auto grow at a suitable increment is a reasonable fail-safe to prevent running out of room against a hard size limit.
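
    When sizing for an existing environment, it helps to look at what is already out there. A hedged example that reads the current size and growth settings from the catalog (sizes in sys.master_files are stored as counts of 8 KB pages):

    SELECT  d.name AS database_name,
            f.name AS logical_file_name,
            f.type_desc,
            f.size * 8 / 1024 AS size_mb,
            CASE WHEN f.is_percent_growth = 1
                 THEN CAST( f.growth AS varchar(20) ) + ' percent'
                 ELSE CAST( f.growth * 8 / 1024 AS varchar(20) ) + ' MB'
            END AS growth_setting
    FROM    sys.master_files f
    INNER JOIN sys.databases d ON d.database_id = f.database_id
    ORDER BY d.name, f.type_desc ;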

    This method has the advantage of using the default locations for files at the server level instead of hard-coding file paths – especially the cringe-worthy
    N'C:\Program Files\Microsoft SQL Server\…

    If your database requires more files and file groups, you might need to provide a file path, but ideally use SQLCMD variables so that it can be easily repointed.

    Create a Table if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS ( 
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s on t.schema_id = s.schema_id
        WHERE s.name = 'dbo' and t.name = 'myTable' 
    )
        CREATE TABLE dbo.myTable (
            col1 int not null,
            col2 nvarchar(50) not null
        );

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[myTable]') AND type in (N'U')
    )
        CREATE TABLE dbo.myTable (
            col1 int not null,
            col2 nvarchar(50) not null
        );
    

    Notes:

    Don’t forget the schema. Use two-part names everywhere unless there is a real and compelling requirement not to.

    Management Studio can help generate these IF NOT EXISTS clauses, but the quality varies and some of the code behind that scripting is very old. Still, it can give you a working starting point. In SSMS, go to Tools > Options > SQL Server Object Explorer > Scripting and look for the setting “Check for Object Existence” or “Include IF NOT EXISTS clause,” then script out some objects with the GUI and examine what you get back. Except for procedures, these are sort of OK.

    Create a View if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.views v
        INNER JOIN sys.schemas s on v.schema_id = s.schema_id
        WHERE s.name = 'dbo' and v.name = 'myView'
    )
        EXEC sp_executesql @statement = N'CREATE VIEW dbo.myView AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER VIEW dbo.myView AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.views 
        WHERE object_id = OBJECT_ID(N'[dbo].[myView]')
    )
        EXEC sp_executesql @statement = N'CREATE VIEW dbo.myView AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER VIEW dbo.myView AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Notes:

    Don’t forget the schema.

    This looks odd at first, but is wonderful to use. It mimics the “CREATE OR ALTER” feature in some other dialects of SQL.

    Create a Procedure if it Doesn’t Exist

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.procedures p
        INNER JOIN sys.schemas s on p.schema_id = s.schema_id
        WHERE s.name = 'dbo' and p.name = 'myProc'
    )
        EXEC sp_executesql @statement = N'CREATE PROCEDURE dbo.myProc AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER PROCEDURE dbo.myProc AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[myProc]') AND type in (N'P', N'PC')
    )
        EXEC sp_executesql @statement = N'CREATE PROCEDURE dbo.myProc AS SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER PROCEDURE dbo.myProc AS
        SELECT col1 
        FROM dbo.myTable ;
    GO

    Notes:

    Don’t forget the schema.

    Like views, this rather strange one imitates “CREATE OR ALTER.” That allows you to keep one source file, in source control, for example, and update it without changing from CREATE to ALTER or even worrying about it.

    Create a Function if it Doesn’t Exist

    IF NOT EXISTS (
        SELECT * FROM sys.objects 
        WHERE object_id = OBJECT_ID(N'[dbo].[getFunky]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT')
    )
        EXEC sp_executesql 
        @statement = N'CREATE FUNCTION dbo.getFunky () RETURNS TABLE AS RETURN SELECT ''Placeholder'' AS Placeholder' ;
    GO
    ALTER FUNCTION dbo.getFunky ( @someParameter int )
    RETURNS TABLE
    AS
        RETURN
            SELECT col1 
            FROM dbo.myTable
            WHERE col1 = @someParameter ;
    GO
    

    If a Primary Key is Missing, Add it

    IF NOT EXISTS (
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        INNER JOIN sys.indexes i on i.object_id = t.object_id
        WHERE i.is_primary_key = 1 
        AND s.name = 'dbo' AND t.name = 'myTable'
    ) 
        ALTER TABLE dbo.myTable
        ADD CONSTRAINT PK_myTable PRIMARY KEY /* { maybe CLUSTERED } */ (
            col1
        ) WITH ( ONLINE = ON /* or OFF */ )
        /* perhaps ON [myFileGroup] */ ;
    

    Notes:

    This snippet looks for any primary key at all. That’s because you often cannot trust the name of the primary key, if it was auto-generated by the server at some point in the past. It’s good practice to name constraints, but not everyone does. I actually have some code that says “if the primary key is present but not named {x} then sp_rename it to {x}.” But that’s a little DBA-nuts.
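
    For the curious, that rename logic looks roughly like the sketch below. It is illustrative only, and the target name PK_myTable is just an example:

    DECLARE @pkName sysname, @qualifiedName nvarchar(600) ;

    SELECT  @pkName = kc.name
    FROM    sys.key_constraints kc
    INNER JOIN sys.tables t  ON t.object_id = kc.parent_object_id
    INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
    WHERE   kc.type = 'PK' AND s.name = 'dbo' AND t.name = 'myTable' ;

    IF @pkName IS NOT NULL AND @pkName <> N'PK_myTable'
    BEGIN
        SET @qualifiedName = N'dbo.' + QUOTENAME( @pkName ) ;
        EXEC sp_rename @objname = @qualifiedName, @newname = N'PK_myTable', @objtype = N'OBJECT' ;
    END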

    If a NonClustered Index is Missing (by Name), Add it

    Preferred:

    IF NOT EXISTS (
        SELECT * FROM sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        INNER JOIN sys.indexes i on i.object_id = t.object_id
        WHERE s.name = 'dbo' AND t.name = 'myTable' and i.name = 'myNCI'
    ) 
        CREATE NONCLUSTERED INDEX myNCI 
        on dbo.myTable ( 
            col2 
        ) WITH ( ONLINE = ON /* or OFF */ ) ;

    Also works:

    IF NOT EXISTS (
        SELECT * FROM sys.indexes 
        WHERE object_id = OBJECT_ID(N'[dbo].[myTable]') AND name = N'myNCI'
    )
        CREATE NONCLUSTERED INDEX myNCI 
        on dbo.myTable ( 
            col2 
        ) WITH ( ONLINE = ON /* or OFF */ ) ;
    GO
    

    Notes: this assumes that you trust the name of the target index, and that, if an index by that name is present, its structure is correct. It doesn’t check the columns in the index.
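
    If you do want to verify the key as well as the name, the check gets wordier. A hedged sketch using sys.index_columns, assuming the single-column key on col2 from the example above:

    IF NOT EXISTS (
        SELECT * FROM sys.indexes i
        INNER JOIN sys.index_columns ic
            ON ic.object_id = i.object_id AND ic.index_id = i.index_id
        INNER JOIN sys.columns c
            ON c.object_id = ic.object_id AND c.column_id = ic.column_id
        WHERE i.object_id = OBJECT_ID(N'[dbo].[myTable]')
            AND i.name = N'myNCI'
            AND ic.key_ordinal = 1
            AND c.name = N'col2'
    )
        PRINT 'Index myNCI is missing, or its leading key column is not col2 - review before creating.' ;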

    Conclusion

    I do wish SSMS had better and more sophisticated support for scripting these clauses, but, lacking that, I resort to a small library of these and look them up. I’ll add to this page as I locate more of these – but contributions are certainly welcome!

    * Some people and organizations use various tools to help automate this, but my real-world experience tells me that old-fashioned scripts are still a huge proportion of database installation and patch procedures.

  • SQL Server File Layout Viewer 1.1 Release

    Edit 9 October 2013: A new version is out! Please download 1.2 from here

    Back in the Spring I released a 1.0 version of a utility that will show the structure of SQL Server files graphically. Today I have a new release that includes some fixes and updates based on user feedback, bug reports, and suggestions.

    In this release:

    1. The database scan is significantly faster because it now uses parallelism.
    2. Larger databases are supported, and files over 4GB are now read correctly – but don’t go too crazy. Really large data sets are not likely to work.
    3. SQL Server Auth is now available.
    4. Various bug fixes around pages that contained data at one time but are not currently allocated, and pages “captured” from other databases via instant file initialization.

    Fundamentals about the app are the same:

    1. The app will read a database and then present a color-coded visualization of all the data by object and by page type.
    2. Mouse over the visualization to get details about the pages in a panel on the right edge of the app window.
    3. Use this in test, not production. It’ll scan the whole database, which would probably impact performance.
    4. This is designed to study and learn about storage internals using modest data sets. Running it on a large warehouse, for example, is probably not going to work.

     

    [Screenshot: SQL Server File Layout Viewer 1.1]

     

    I want to thank those in the SQL community who helped with this project via Twitter, testing for me, bug reports and critiques. You know who you are, and y’all rock :-).

  • STOP! Wrong Server!

    Some time ago I wrote a post about old-time T-SQL flow control. Part of the idea was to illustrate how to make scripts abort on errors, which is surprisingly convoluted in T-SQL. Today I have a more specific example: how can I check that we are really about to execute in the right database on the right server, and halt if not. Perhaps you’ve connected to the wrong server, or the database you intend to modify doesn’t have the name your script expected. “USE database” is wonderful, but what if it didn’t work?

    As in the old article, there are two problems: 1. How do I stop inside the batch that is currently running (i.e. between “GO” batch terminators)? 2. How do I stop the client from merrily sending the rest of the batches in a whole script to the SQL Server?

    There are a number of ways to do this, but there is one method I have found to be clear:

    1. Run in SQLCMD mode, in order to take advantage of :ON ERROR EXIT. This stops the client from transmitting more SQL batches after a batch has thrown an error.
    2. Wrap a check for the server name and database name in a TRY/CATCH at the top of the script. For most practical purposes this can be the first batch, only, since we are planning to halt if the test fails. For versions 2005 to 2008 R2, this has to use RAISERROR, but for 2012 (only) we can finally take advantage of THROW.
    :ON ERROR EXIT
    
    BEGIN TRY
        IF ( SELECT @@SERVERNAME ) != 'TheRightServer' RAISERROR( 'WHOA! WRONG SERVER!', 16, 1 );
        USE [master]
        IF ( SELECT DB_NAME() ) != 'TheRightDB' RAISERROR( 'WHOA! WRONG DATABASE!', 16, 1 );
        PRINT 'Rest of First Batch Ran';
    END TRY
    BEGIN CATCH
        IF ERROR_MESSAGE() = 'WHOA! WRONG SERVER!' RAISERROR( 'WHOA! Caught: WRONG SERVER!', 16, 1 );
        ELSE IF ERROR_MESSAGE() = 'WHOA! WRONG DATABASE!' RAISERROR( 'WHOA! Caught: WRONG DATABASE!', 16, 1 );
        ELSE RAISERROR( 'An exception was thrown in the first batch', 16, 1 );
    END CATCH
    GO
    
    PRINT 'Second Batch Ran';
    GO
    

    Breaking this down – if I am in the wrong server or the wrong database, what I want to have happen is that the statement

    PRINT ‘Rest of First Batch Ran’ ;

    should not execute. The way to accomplish that is to raise an error if we are in the wrong server or database, and use the TRY/CATCH to transfer execution down to the CATCH block. A sysadmin could raise a more severe error that would stop batch execution, but that’s less friendly, and we can’t assume we are all sysadmins. To me this method needs to work with error severity 16, for a typical user.

    Further, it’s important that remaining batches in the whole script are not executed. (Remember, batches are separated by “GO” and are each like a mini program sent from the client to the server.) That means that

    PRINT ‘Second Batch Ran’;

    should not execute if the first batch threw an error.

    The only practical way to do that is with the

    :ON ERROR EXIT

    directive, but that only works in SQLCMD mode.

    For SQL Server 2012 (and, in the future, “and higher”), this example can be simplified using THROW:

    :ON ERROR EXIT
    
    BEGIN TRY
        IF ( SELECT @@SERVERNAME ) != 'TheRightServer' THROW 50000, 'WHOA! WRONG SERVER', 1 ;
        USE [master]
        IF ( SELECT DB_NAME() ) != 'TheRightDB' THROW 50000, 'WHOA! WRONG DATABASE!', 1 ;
        PRINT 'Rest of First Batch Ran';
    END TRY
    BEGIN CATCH
        THROW;
    END CATCH
    GO
    
    PRINT 'Second Batch Ran';
    GO
    

    Safe computing!

  • Quick and Dirty PowerShell SQL Server Load Test

    Today I wanted to share a very short PowerShell script that can be used to simulate running multiple clients against a SQL Server, for simple performance or hardware testing. One thing I love about PowerShell is that so many functions and features are pre-built and easily accessible that I have to write very little code to accomplish a lot. This script uses the background job feature to run asynchronous processes, and the SQL features to invoke scripts stored in files, and I didn’t have to engineer any of that, which is cool. It is out-of-the-box PowerShell 2 and SQL Server 2012 at the client, but would work against basically any version of SQL Server at the server side. Not bleeding edge at all.

    # Start multiple clients running the same SQL script simultaneously
    
    $numWorkers = 10
    
    $startTime = ( Get-Date ).AddMinutes(1)
    
    $jobscript = {
        param( $startTime )
             
        while ( ( Get-Date ) -lt $startTime ) { Start-Sleep -Milliseconds 100 }
        
        Invoke-Sqlcmd -InputFile 'path to your\sqlscript.sql' -ServerInstance yourSQLServer -Database yourDB -QueryTimeout 0  
    }
    
    (1..$numWorkers) | foreach {
        Start-Job $jobscript -ArgumentList $startTime -InitializationScript { Import-Module "sqlps" -DisableNameChecking }
    }

    The script does just one thing: given a SQL script stored in a file, it will start some number of simultaneous connections running the SQL against a server. Let’s break it down a bit to see how.

    First we establish the number of client connections to use, and identify a single start time for them to begin running, one minute from the current time. This extra minute will give all the threads time to get created/configured so that later we can cause them to start at approximately the same moment:

    $numWorkers = 10
    
    $startTime = ( Get-Date ).AddMinutes(1)

    Next, we need a “mini PowerShell script” inside this script to pass to an async job to run. This “mini script” is actually what the multiple worker threads execute. There are a few ways to make this, but a simple way to handle it is just to assign a code block { } to a $variable:

    $jobscript = { work for the async job goes here }

    The background job feature has some behavior that is not completely intuitive – for one thing, variables in your calling script won’t be accessible from a running job, because the job runs in a completely separate session/environment. For that reason, you have to explicitly pass in any values that you need in the code that the job runs. This is the reason for the first line inside the job script:

    param( $startTime )

    Using this parameter, we can pass the single start time value from the outer script to all the background jobs. Without it the jobs could not “see” the value.

    Next, the body of the job script is simply “wait for the start time, then execute this SQL script”:

        while ( ( Get-Date ) -lt $startTime ) { Start-Sleep -Milliseconds 100 }
        
        Invoke-Sqlcmd -InputFile 'path to your\sqlscript.sql' -ServerInstance yourSQLServer -Database yourDB -QueryTimeout 0  

    Note: Invoke-Sqlcmd is not loved by everyone, and it has some limitations, but it’s there and mostly works. If you are on pre-2012 SQL client tools, beware the bug for –QueryTimeout 0 and long-running queries.

    That is all the code we need to run within the background jobs.

    Lastly, we need to spin those background jobs up. This funny thing generates a list of numbers, 1 to 10:

    (1..$numWorkers)

    That list is piped to a foreach loop, which just drives the loop to run 10 times. For each iteration of the loop, we create a background job set to run the code in $jobscript:

    Start-Job $jobscript -ArgumentList $startTime -InitializationScript { Import-Module "sqlps" -DisableNameChecking }

    This statement again shows some of the special behavior of background jobs – first, you have to explicitly pass in any values using the –ArgumentList parameter, because the scope where the job runs cannot “see” your other code or variables. This is how to get the startTime value into all the jobs.

    Second, the background job won’t have your PowerShell profile. In my profile, I load the sqlps module, which is where Invoke-Sqlcmd lives. Since my profile doesn’t load for the job, the job would not have that module available, and would not know what “Invoke-Sqlcmd” is. Fortunately, the Start-Job cmdlet has an –InitializationScript option that allows you to pass a second code block that runs before the real work in the job. This is a good place to set up or configure the job, in this case loading the required module.

    The last item you’ll need for this is the actual SQL script that has your testing logic, like running resource-intensive queries in a loop. I will leave that as an exercise for the reader.

    Once the jobs are running you can obviously watch PerfMon or use other diagnostics to see how the subject server is handling the load. Back in PowerShell, you can use the cmdlets Get-Job, Receive-Job, Stop-Job and Remove-Job to manage or view the jobs created above.
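
    For example, once a test run is finished, something in the same vein as the script above will wait for the workers, collect their output, and clean up:

    # Wait for all background jobs, show their output (including any errors), then remove them
    Get-Job | Wait-Job | Out-Null
    Get-Job | Receive-Job
    Get-Job | Remove-Job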

    I used this to test a new server yesterday with a PCI flash storage card and drove it to 163,000 real, SQL Server, random read IOs/sec, which was pretty cool. Happy load testing!

  • File Layout Viewer vs. Drop Clustered Index

    I had a very educational exchange the other day on Twitter (also known as “getting schooled”) by my SQL compatriots Argenis Fernandez (@DBArgenis) and Hugo Kornelis (@Hugo_Kornelis). A guy called Chris Switalski (@crswit) asked a really interesting question on #sqlhelp, which I proceeded to answer incorrectly, which led to a correction by my friends, which in turn took me back into the SQL File Layout Viewer, which quickly uncovered a bug there.

    So, learn from your friends on Twitter!

    Anyway, here’s the question, and corrected demo using an updated SQL File Layout Viewer build.

    Question (paraphrasing):

    When you create a clustered index on a heap, the data from the heap is rewritten and ordered, and the upper levels of the index are added. If you drop that clustered index, what happens? Does the set of leaf-level pages stay in place, and the index pages just get unlinked and deallocated? Or is the data rewritten again? Does it stay in the same logical or physical order, even though the ordering requirement is removed?

    I had said that I imagine the data is rewritten in both cases, but that we could look and see to be sure. I was totally wrong – the leaf level of a clustered index is made of data pages, and a heap is made of data pages too, as Argenis pointed out. It turns out that, since they are not different structures down at that level, in order to drop a clustered index, the server will indeed just remove the upper levels of the index and leave all the leaf level data pages in place. Cool optimization! It turns out that my original release of the File Layout Viewer doesn’t always display this situation correctly, so I had to make some changes. This demo, though, I believe shows the behavior correctly.

    Answer:

    First, we make a test database and create a heap, and populate it with some dummy data:

    USE master
    IF EXISTS ( SELECT 1 FROM sys.databases WHERE name = 'HeapTest' ) DROP DATABASE HeapTest;
    CREATE DATABASE HeapTest;
    GO
    
    ALTER DATABASE HeapTest SET RECOVERY SIMPLE;
    GO
    ALTER DATABASE HeapTest MODIFY FILE ( NAME = N'HeapTest', SIZE = 102400KB );
    GO
    ALTER DATABASE HeapTest MODIFY FILE ( NAME = N'HeapTest_log', SIZE = 10240KB );
    GO
    
    USE HeapTest;
    
    CREATE TABLE SampleData ( 
        Data uniqueidentifier DEFAULT NEWID(), 
        Padding char(100) DEFAULT ( CAST( CHECKSUM( NEWID() ) as CHAR(100) ) ) 
    );
    GO
    
    INSERT SampleData DEFAULT VALUES;
    GO 100000

    If we run the File Layout Viewer, we see a nice, contiguous set of data pages (type 1) in teal:

    [Screenshot FLVDropClust01: the heap’s contiguous data pages in teal]

    Next, we’ll add a NON clustered index, just so we can track what happens to that structure too:

    CREATE NONCLUSTERED INDEX Nonclust ON SampleData ( Padding );
    

    That shows up in the next part of the data file, in bright blue:

    [Screenshot FLVDropClust02: the nonclustered index in the next region of the file, in bright blue]

    Now if we make a clustered index on the heap, it will have to be rewritten into another region of the file, so that it can be stored in the right logical order. We should expect it to “move” in the visualization down into the empty part of the file. The non clustered index will also be rewritten, because it will switch from using physical row locations to point to data in the heap to using the clustering key to point to data in the new clustered index. Let’s see what it looks like:

    CREATE CLUSTERED INDEX Clust ON SampleData ( Data );
    

    The new clustered index and new version of the NON clustered index are both created in another region of the data file:

    [Screenshot FLVDropClust03: the clustered index and the rebuilt nonclustered index written into a new region of the file]

    You can see, in lighter gray, the regions that were emptied where the two structures had been stored, and the two new structures now in the center of the graphic.

    Now for the part of the question I got wrong: what happens when we drop that clustered index?

    DROP INDEX SampleData.Clust;
    

    This is really neat: the leaf level of the clustered index (remember data pages are data pages) stays right where it is, and the upper level index pages are just deallocated/unlinked.

    The NON clustered index does have to be rebuilt yet again to change those clustered index keys out for row location pointers that point to the rows in the heap, so those pages do get rewritten into yet another region of the file:

    [Screenshot FLVDropClust04: the leaf-level data pages unchanged; the nonclustered index rewritten into another region]

    So, to summarize:

    When you add or remove a clustered index from a table, any NON clustered indexes always have to be rebuilt - and will be automatically - to exchange row pointers for clustering keys in the non clustered index itself. This generally takes care of itself, but it can be a performance issue, or cause longer running transactions or a large amount of logging.

    When you CREATE a clustered index on a heap, the data has to be rewritten to enforce the logical order of the index.

    When you DROP a clustered index, however, the server can keep all the leaf level pages in place and just strip away the index levels above.
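
    You don’t need the viewer to confirm the broad strokes, by the way. A rough check like this, run before and after the CREATE and DROP steps, shows leaf-level page counts and fragment counts for the demo table that line up with the pictures above:

    SELECT  index_id, index_type_desc, alloc_unit_type_desc,
            page_count, fragment_count, avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats
            ( DB_ID(N'HeapTest'), OBJECT_ID(N'dbo.SampleData'), NULL, NULL, 'DETAILED' )
    WHERE   index_level = 0 ;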

  • Telecommuting, Month 9

    Five Things I’ve Learned About Surviving the Telecommute

    Over the past several years my wife and I went through this strange series of events where we swapped work roles – first she worked remotely for a Minnesota company, from our home in Seattle, and then we moved to Saint Paul, Minnesota and I started working remotely for my company back in Seattle. Our story is a lot like what you read about remote work, but it’s different to live it.

    Backstory

    We lived on Vashon Island in Puget Sound, near Seattle, for about eight years. The small company my wife works for relocated from there to Minnesota, but she kept her job, working remotely as part of what became a virtual corporation. They were a very small company, but tech-savvy enough that, while not an IT business, remote work seemed like a natural arrangement. I think that shows just how far that idea has come. Like many U.S. families, we video conference with the Grandmas and Grandpas spread around the country, which she effortlessly applied to work, too. Her little company used “cloud” services without really thinking about that idea as anything new or special. It was just a natural fit. They valued her, she valued them, the tech was pretty easy. It was all win.

    Fast forward a few years and our family was looking to relocate, too. Her company was not really the destination – we needed mainly to be closer to our extended families. “Closer” today often means flight time and number of connections (a.k.a. airport) more than physical proximity. Our families are spread all over the eastern half of the U.S. so moving to one or the other city was something we considered, but wasn’t really the only solution. We have kids, but we are also city people and we both really wanted the culture and amenities of a real city and a good, safe urban environment for our boys. Not too many cities have that combination.

    “How the heck did you pick Saint Paul, MN?” is a question I get all the time. It’s hard to explain, but take the airport, proximity to East Coast, South and Northwest by one direct flight, the truly outstanding elementary school in our new neighborhood, the old Victorian gridiron neighborhoods of Summit Hill/Macalester/Groveland, a good friend from college, no fear of snow, and toss them all together in a geography salad, and that was our destination.

    By freak accident, or fate, it’s also not far from my wife’s workplace.

    Switcheroo

    We moved and we both kept our jobs, which was amazing. My wife is able to go into the office a few days a week now, though it’s a long drive, and I started telecommuting to my job in Seattle from Saint Paul.

    Both of us have experienced the usual pros and cons of telecommuting. By now some of these things are well documented, but I think it’s hard to see how important those issues are until you experience them firsthand, either as a remote employee or working with a remote employee, and over a long period. It’s great, and it’s difficult at the same time.

    I will say my employer has been really wonderful throughout this process – they are supportive, and they had to go through extra effort to make this happen for me, which they certainly didn’t have to do. We had some tentative few-days-a-week sort of telecommuting for a while, but I think I am the first true full time, long distance, different-time-zone sort of remote worker, with all the challenges that come along in that scenario. Because there wasn’t really an established environment used to remote work, with habits and technology and processes in place, we’ve had to work some things out as we go.

    In fact every aspect of it, for me, has proven a double-edged sword, and I wanted to spend some time talking about why.

    Coffee Shop? No.

    First I have to get one myth out of the way: working in a coffee shop is OK -- if you want to work in an uncomfortable, noisy place where you can’t use the phone, the network is dodgy, your screen’s too small, coffee costs $4, and there’s no privacy. It only sounds romantic. Honestly.

    I work from a dedicated space in my house in order to be focused and minimize distractions, and that has worked well. I’d say to even contemplate working from home you need a separate space, with a door you can close. The attic space where I currently work is great in terms of creating a separate work environment in the house. I have a real desk and an IP phone connected straight to the office. The space has some other issues, notably not being a layout where office furnishings, network and power work well, but we are moving to a permanent home a few blocks away, and in purchasing that house the “office space” was one of the main purchase criteria. We should be into the new place in late June.

    On the other hand, working alone in a home office has some real social challenges, as I’ll talk about more below. It’s quiet. Really, really quiet, which can be too much of a good thing.

    Losing the Commute

    I have always – so far, anyway – avoided a typical driving commute. Spending hours on the freeway in a car is so awful, for me, I have always orchestrated living and working so as to avoid it, by living close to my office or to transit. Eliminating the commute altogether is obviously a fantastic perk of telecommuting. I am eternally grateful for the hours I’ve gotten back.

    And yet.

    Here’s the strange thing, which others have talked about too: if you are the type of person for whom remote work will be successful, you probably are the type who has a passion for work, where work isn’t just time spent from eight to five. Passion for work is something that creates the discipline to work consistently without explicit direction, and to produce all on your own, regardless of schedule.

    Those same traits also make the removal of the commute problematic. Life and work can start to merge together, and it isn’t always pretty. Leaving the house, getting in the car/bus/train/ferry and being forced to sit there, even for twenty minutes, is an important psychological boundary between work and life, where you – even subconsciously – leave one and get mentally prepared for the other.

    This is something I really struggle with. I have a tendency to obsess about work anyway, and when it’s always right here, that can be a real problem. It’s particularly bad during the times when work is stressful. Working from home demands that your team evaluate you on your output as much as just your presence, but your perceived presence looks irregular, and that can cut both ways.

    Isolation

    There are huge benefits to being out of the daily fray in the office, mostly in terms of productivity on heads-down work. Interruptions are minimal. I used to use the “Pomodoro” technique in the office, working in 20-25 minute sprints, but it was mostly as a survival technique against the onslaught of meetings, and email, and walk-up requests. I find in my new situation that I don’t have to do that, because finding solid blocks of time to do real work isn’t so much of a problem.

    The other refreshing thing is that one important type of work – thought – is OK. I used to be in a cube, in an open office. One challenge was that some parts of my job require me just to think. Think about strategy. Think about how to design against some complex problem. Read books. When you think, you don’t necessarily look busy. It’s productive time, but you aren’t producing a thing someone can point at. So, time to think through problems without some pretense of tapping at the keyboard is really valuable. In the office I used to block time out and walk on a nearby bike path in order to have this time without needing to look “busy.” Now I can do this in the house, which is great.

    Communication with my team really is a challenge. I have three-times-a-week calls with my team lead, and daily short calls with my team just to check in. It sounds odd, but these are a real requirement just to help offset the lack of hallway conversation that normally takes place in the office. If you are thinking of working remotely, I would advise making the extra effort to schedule time like this, because even though it sounds trivial, it’s really important to keep everyone on the same page.

    The biggest challenge of all is missing out on the rest of the hallway or cube/desk decision-making that goes on. I always had my ear to the ground in the office, and viewed part of my DBA role as trying to steer everyone to the right decisions about data and SQL Server, even when it meant heading something off that I overheard in the next cube. For better or worse, that part of my job isn’t possible remotely, and that can be a challenge.

    But I’ll also lay this right out there: it’s lonely. At different times in life we have different social structures to rely on for friendships. There’s high school, college, church -- before kids my wife and I used to have dinner with friends or have the occasional party. If you’re 40+, with kids, work is a main source of social interaction and friends. There aren’t that many other social structures like it. Moving to a new city and essentially breaking that social interaction with my coworkers is tough, and there isn’t much to replace it.

    Social media (and I’m hooked on it) doesn’t fill that void, I’m afraid. It naturally took over more of my free time with this move, but as months go by I realize there’s something depressing about it, and I need to limit the time I spend online.

    Children

    Some things about remote work have helped our family life, but mostly it relates to basics like flexible hours and the time recovered from not spending an hour a day on the road. If you think that working remotely will help with child care expenses, I’d say stop – that’s unlikely. It’s not possible, for me anyway, to actually care for the kids in any reasonable way and work at the same time. There’s no way. I think I could do a poor job with work and a poor job watching them, and that serves nobody well. We have just as many childcare arrangements and expenses as if I drove to work.

    But I do love the little perks. Many mornings I get to eat breakfast with the kids and put the oldest one, who is in second grade, on the bus. My wife and I have lunch dates, usually once a week, on a day we both work from home and the kids are at school.

    On the other hand, there are times when I have to explicitly stop my wife or my kids and just say, “I really can’t do x because I am working. Truly.” And it is easier for them, just because I’m in the house, to stop me and ask, “Honey can you stop and …” It really helps that my wife has work-from-home experience too, but this can be a problem (in both directions; I am guilty too).

    For better or worse, when we really have an issue with the family, which sometimes comes down to just not having enough hours in the day, my wife or I can work in the evening to make up time or meet deadlines, and that is definitely a major benefit. It can, though, aggravate the whole problem of not leaving work at work and being free of it at home.

    Tech, Tools and Email

    This is the part all us gadget freaks gravitate to, I think – what tools make work-from-home possible, or make it better. Here the news is also mixed.

    First, I. Love. Email. To a fault.

    Imagine a medium where multiple people can collaborate on a solution and everyone is informed, where the outcome and the trail of decisions is documented in writing automatically, and the participants can all work on it asynchronously, without being interrupted. In a perfect world, it seems like email would almost do that. (I dislike the phone for all the inverse reasons.)

    And yet.

    Working remotely makes it so easy to fall into an email-only mode of communication, and while I love that, it flatly doesn’t work well for some types of activity. For one, I find it almost impossible to persuade some individuals to do something through email. To act. I am very sorry to report that that still often requires a phone call or, ack!, a full-on meeting, with a manager, because email simply goes ignored. Among those that like email, email works, but with others, sometimes not. Fortunately my team lead understands this well, and I know I can ask him to set up that meeting or that call when we have to go that route. Email can also foster miscommunication in some circumstances, especially relating to tone and personal interaction. This email thing is clearly a habit I have to be careful about.

    What about all those other tools? Tele-presence and all the funnily-named collaboration do-dads like Skype, Trello, Dropbox, Toggl and Producteev and Flow and Remember the Milk and Google Hangouts and Asana?

    There is, unfortunately, a sobering reality here. The number and variety of online collaboration tools really has exploded recently with the growth in remote workforce and virtual organizations. But it’s been my experience that a company probably has to be young (in attitude, not necessarily age), nimble, small, has to be engaged in ideas about the modern workforce, willing to try new things, and has to deal with few secrets to contemplate using most of these services. Otherwise the obstacles are many: change, security, habit, ignorance, organizational inertia, isolation, policies, firewalls both real and imagined.

    If you are in a company that is bigger, a company with security concerns, a company that cannot change course quickly, then it seems likely you might get the right to work from home but not get access to any of these cool new tools.

    So, I have to say, in my case I have Lync (barely) for desk or app-sharing, and an IT group that is just now getting the hang of that, but no video, no tele-presence, no novel SaaS online project management whiz-bang anything, funny names or no. And I suspect that the work-from-home crowd may fall into two groups – those with the new tools in small companies or start-ups, and those in established companies like me trying to make it work with just RDP and the phone.

    I don’t think this is related to my company’s commitment to my success working remotely, or to their support of it – it’s simply too hard to overcome the organizational inertia, which will be true in many organizations.

    But with some good processes and habits, I think the minimal-tools predicament can still work. Process and habit, I believe, are probably 85% of the battle, and while coolio tools might aid that, I don’t imagine they are a substitute for it.

    I do get jealous of my wife on this front, though, because a small company on a shoestring budget can do more with cloud-based applications like this than a bigger, well-capitalized one. Ironic.

    Conclusions?

    I think of this as an ongoing experiment. I love my job as a DBA – though, to be sure, every job comes with some challenges – and in many ways it’s suited to remote work. I am finding it harder than I imagined, mostly because it’s such a solitary existence, and because the lines between work and home are so blurred as to be almost erased. I have to keep changing techniques and trying different things to make it work well.

    It’s become clear that I should not wait for the next “teh aw3some” tool to come around, and instead focus on basics like process, communication, and habit to make the fundamental tools of RDP, phone and airplane work to their fullest.

     

    P.S. I am grateful to SQL community friends Tom LaRock, Mike Walsh, and Aaron Bertrand for part of this remote work adventure.

    Edit 23 April: A friend of mine reminded me how important it is to go back into the office and talk with people face to face. I thought I should add, since I didn't make this clear, that I do go back for three days or so every other month. That's vital to this whole arrangement. I try to pack those visits with meetings just so I can talk with people and get reacquainted. 

     

  • Speaking in Chicago Saturday

    I’m excited to be offering a beginning PowerShell session at SQL Saturday #211 in Chicago on Saturday, April 13. This time we’re making it a family weekend, bringing our two boys. I haven’t been to Chicago for many years, and it’s, of course, an Architecture dream world, so it should be fun to look at some buildings again!

    [Photo: Emil Bach House, © Jeremy Atherton, 2006]

    The session is intended to help you get started with some PowerShell syntax basics. I try to demystify the Pipeline a bit, help make sense out of all the { curlies } and [ brackets ] and ( parentheses ), and translate some of what you already know about T-SQL into PowerShell. If PowerShell seems cryptic or mysterious, come on down and maybe we can sort it out together.

    Should be a great time.

  • SSMS 2012 Restore GUI Gotcha

    Today I want to bring to your attention an issue in the SQL Server Management Studio 2012 restore GUI. In many ways the new restore dialog is nicer than the old one, with new features and added convenience – but, as is always the Achilles heel of GUI tools like this, if you don’t know what’s really going on it can bite you. I’m not sure what to call this issue, maybe just a UI design flaw. Technically it works as designed, and there’s nothing really wrong with it, so it’s not a bug. But I can imagine it really causing someone pain who is careless or doesn’t know what’s happening behind the scenes.

    Restoring a copy of a database may take the original down.

    Among the new features is some management of “Tail-Log Backups,” which is a wonderful idea. Taking a backup of the tail of the log will take down the database and put the whole end of the log into a backup file, preserving basically all modifications to the DB from the log. This is ideal, for example, at the last step of migrating a database using log shipping, because all the activity is preserved and the source database is locked against further modifications. The log chain is preserved in moving to the new copy of the database, or in a DR scenario, to your other server.
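
    For reference, the tail-log step the GUI composes is essentially a log backup taken WITH NORECOVERY, which is what leaves the source database in a restoring state. This is only a sketch; the path and names are placeholders:

    BACKUP LOG [SourceDB]
    TO DISK = N'X:\Backups\SourceDB_taillog.trn'
    WITH NO_TRUNCATE, NORECOVERY ;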

    The problem is, I know a lot of people don’t know this. Often it’s people who use the GUI a lot, such as “accidental DBAs.” I think the GUI is really important, despite those who rarely use it or frown on it.

    Here’s the issue: Suppose I have a production database that has some user-created problem (like a user accidentally updated some rows.) It may be reasonable to restore a copy of the database to a point before the issue and investigate whether it’s possible to merge the data back in. The original database isn’t damaged from a technical point of view – there’s no corruption, for example.

    1. Right-click the original source database and choose Tasks > Restore > Database … from the context menu.
    2. The resulting dialog will go out and discover the backup files for that database from MSDB, which is very helpful.
    3. Change the name of the target database, in order to restore it as a copy.

    [Screenshot RestoreGUIIssueCap1: the Restore Database dialog, with the target database renamed to make a copy]

    There’s a note at the top of the dialog that, while accurate, may not help some in this scenario. It says, “A tail-log backup of the source database will be taken.” Sounds innocuous if you don’t know what that implies. In past versions, restoring a copy of a database would never affect the original – or not by default, anyway – so I cringe as I imagine people merrily clicking past this warning.

    The script you get with these settings is indeed composed with a tail-log backup of the source database:

    [Screenshot RestoreGUIIssueCap2: the generated script, beginning with a tail-log backup of the source database]

    That will, as indicated, take a tail-log backup – and thereby take the original database down. Nice if you wanted that, but a mean surprise if you didn’t.

    If you act on the warning and click the Options tab, and then uncheck the offending setting, you do get the expected behavior (the original database is unaffected because the tail-log backup is not included at the top of the script):

    [Screenshot RestoreGUIIssueCap3: the Options tab with the tail-log backup setting unchecked, and the resulting script without it]
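
    For this restore-a-copy scenario, the script you actually want looks more like the following: no tail-log backup of the source, new file names for the copy, and the original database untouched. The paths and names here are placeholders:

    RESTORE DATABASE [myDB_Copy]
    FROM DISK = N'X:\Backups\myDB_Full.bak'
    WITH
        MOVE N'myDB'     TO N'X:\Data\myDB_Copy.mdf',
        MOVE N'myDB_log' TO N'X:\Logs\myDB_Copy_log.ldf',
        STATS = 10, RECOVERY ;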

    So, be careful out there!

  • T-SQL Tuesday #040: Files, Filegroups and Visualizing Interleaved Objects

    Early in my career as a DBA, I have to admit I didn’t quite “get” what all the fuss was about with the multiple file and multiple filegroup capability in SQL Server. Over the years, though, as I learned more about partitioning, backup strategies for large databases and, most importantly, storage internals I’ve started to catch on. For today’s T-SQL Tuesday, I thought I would share an early lesson, using the newly released SQL File Layout Viewer utility available here.

    tsql2sday

    Adam Machanic started T-SQL Tuesday several years ago and it’s been a great way to motivate bloggers and readers to share ideas around a topic. This month’s event is hosted by Jen McCown / MidnightDBA. An awesome idea, as ever.

    What I hope to illustrate today is a simple but vital concept about files and file groups: files, on their own, use a proportional fill algorithm in SQL Server and will receive new data from any object that needs more space. If an object needs additional storage, an allocation will be made in an empty area of whichever file has the most space available. Under the multi-file feature alone, there is no attempt whatsoever to separate objects in the database logically into different files. The result is that objects can appear to “round robin” through the files as more space is allocated – it’s not really a round-robin algorithm, but if the files are all about equally full, it looks similar.

    Filegroups, on the other hand, allow logically grouping objects in a way that specifically directs them into separate files, and so controls the physical layout of those objects on disk. Using filegroups it’s possible to direct an object into a file (or set of files) and physically separate it from the rest of the database.

    Because of this distinction, it’s been my experience that filegroups are far and away the more useful of the two features, and if I have a database that merits splitting up, I go straight for filegroups, with one physical file per group. (There are some edge case exceptions like TempDB allocation contention that are outside the scope of this post.)
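
    As a concrete example of that habit, moving an existing table onto its own filegroup might look something like the sketch below. The names, sizes and path are illustrative only, and it assumes the table already has a clustered index named CIX_bigTable; rebuilding that index with DROP_EXISTING onto the new filegroup moves the table’s data along with it:

    ALTER DATABASE [myDatabase] ADD FILEGROUP [ArchiveFG] ;

    ALTER DATABASE [myDatabase] ADD FILE
    ( NAME = N'myDatabase_Archive', FILENAME = N'X:\SQLData\myDatabase_Archive.ndf',
      SIZE = 200MB, FILEGROWTH = 200MB )
    TO FILEGROUP [ArchiveFG] ;

    /* Rebuild the clustered index onto the new filegroup; the table's data moves with it */
    CREATE UNIQUE CLUSTERED INDEX CIX_bigTable
    ON dbo.bigTable ( id )
    WITH ( DROP_EXISTING = ON )
    ON [ArchiveFG] ;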

    Interleaving

    There are a few reasons we’d want to physically separate objects, but, because it so clearly illustrates the difference, I want to show interleaving today. I made a demo database using the following script:

    USE [master]
    GO
    
    :setvar datapath "C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2008R2\MSSQL\DATA"
    :setvar logpath "C:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2008R2\MSSQL\DATA"
    
    CREATE DATABASE [VizDemo3] ON  PRIMARY 
    ( 
        NAME = N'VizDemo3', FILENAME = N'$(datapath)\VizDemo3.mdf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    ), 
    FILEGROUP [MultiFileFG] 
    ( 
        NAME = N'VizDemoMFGF1', FILENAME = N'$(datapath)\VizDemoMFGF1.ndf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    ), 
    ( 
        NAME = N'VizDemoMFGF2', FILENAME = N'$(datapath)\VizDemoMFGF2.ndf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    ), 
    ( 
        NAME = N'VizDemoMFGF3', FILENAME = N'$(datapath)\VizDemoMFGF3.ndf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    ), 
    FILEGROUP [SingleFileFG1] 
    ( 
        NAME = N'VizDemoSFG1', FILENAME = N'$(datapath)\VizDemoSFG1.ndf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    ), 
    FILEGROUP [SingleFileFG2] 
    ( 
        NAME = N'VizDemoSFG2', FILENAME = N'$(datapath)\VizDemoSFG2.ndf' , 
        SIZE = 25600KB , MAXSIZE = UNLIMITED, FILEGROWTH = 51200KB 
    )
    LOG ON 
    ( 
        NAME = N'VizDemo3_log', FILENAME = N'$(logpath)\VizDemo3_log.ldf' , 
        SIZE = 10240KB , MAXSIZE = 2048GB , FILEGROWTH = 10240KB 
    )
    GO
    
    ALTER DATABASE [VizDemo3] SET RECOVERY SIMPLE 
    GO
    
    USE VizDemo3
    GO
    
    CREATE TABLE dbo.SampleCustomers_on_Primary  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON [PRIMARY] ;
    GO
    
    CREATE TABLE dbo.SampleOrders_on_Primary  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON [PRIMARY] ;
    GO
    
    INSERT dbo.SampleCustomers_on_Primary DEFAULT VALUES;
    INSERT dbo.SampleOrders_on_Primary DEFAULT VALUES;
    GO 20000
    
    
    CREATE TABLE dbo.SampleCustomers_on_Files  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON MultiFileFG ;
    GO
    
    CREATE TABLE dbo.SampleOrders_on_Files  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON MultiFileFG ;
    GO
    
    INSERT dbo.SampleCustomers_on_Files DEFAULT VALUES;
    INSERT dbo.SampleOrders_on_Files DEFAULT VALUES;
    GO 20000
    
    
    CREATE TABLE dbo.SampleCustomers_on_FileGroups  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON SingleFileFG1 ;
    GO
    
    CREATE TABLE dbo.SampleOrders_on_FileGroups  ( 
        id int identity( 1, 1 ) NOT NULL PRIMARY KEY CLUSTERED, 
        buncha char(500) default 'A', 
        big char(500) default 'B', 
        vals char(500)  default 'C'
    ) ON SingleFileFG2 ;
    GO
    
    INSERT dbo.SampleCustomers_on_FileGroups DEFAULT VALUES;
    INSERT dbo.SampleOrders_on_FileGroups DEFAULT VALUES;
    GO 20000

    This is a really simple example showing several possible file layouts, and what those do with the physical storage of the data. Starting from the top, note that the database create statement makes three different file setups:

    1. Our old friend Primary – one MDF file to rule them all!
    2. A file group with three physical files: MultiFileFG. These files will be populated with SQL Server’s proportional fill.
    3. Two file groups with one physical file apiece: SingleFileFG1 and SingleFileFG2. This allows the DBA to direct objects into specific files on disk.

    Next I make some pairs of sample tables, and locate a pair in Primary, a pair in MultiFileFG, and a pair (one table each) in SingleFileFG1 and 2. The tables are each populated with 20,000 rows of sample data, arbitrarily called Customers and Orders.

    It’s important to note that the tables are filled with data in an alternating fashion – one table gets a row, then the other table gets a row – and that means that the allocations for space for those objects will also alternate. One table grows, then the other, then the first again.

    If the tables are in the same filegroup, even with separate physical files, the data will be interleaved. There will be a block of pages for one table allocated, then from the other table. Under default settings in SQL Server (not using the –E parameter, for example) the blocks of pages will each be a 64k extent of 8 data pages.
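
    If you’d rather confirm the interleaving numerically than visually, you can list the allocations directly. A sketch using the undocumented (but widely used) DBCC IND command, which reports the file and page IDs allocated to an object; compare the page IDs returned for the two tables and you should see them alternate in blocks of eight:

    DBCC IND ( 'VizDemo3', 'dbo.SampleCustomers_on_Primary', 1 ) ;
    DBCC IND ( 'VizDemo3', 'dbo.SampleOrders_on_Primary', 1 ) ;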

    So, let’s have a look at what happens inside the files, and see if it supports this logic.

    One File

    [Screenshot InterleaveScreenCap1: interleaved extents from the two tables in the single PRIMARY file]

    My goodness it does! But you probably suspected that, as I’m the guy writing this post.

    First let’s look at the blue area – the stack-of-bricks looking light blue and purple region is the Primary filegroup / MDF file. What we are seeing is the alternating, interleaved extents from the two tables, one in blue and one in lavender/purple. This isn’t terrible, but there are two things worth noting:

    1. There’s a decent amount of fragmentation. Not harmful, necessarily, but these two tables definitely are not contiguous. If we have an application that wants to do a lot of scans, and we have spinning media, we might have some performance degradation. Read-ahead probably doesn’t work. This might not be ideal for a data warehouse.
    2. If we re-index one of those objects the other one will remain full of “holes.” We could re-index both, but it might take a few passes before this interleaving works itself out.

    I want to emphasize here that this isn’t necessarily a terrible situation, but it’s illuminating as far as how the data is physically written out.

    Three Files in a Group

    The next reasonable question to ask is whether making multiple files, alone, might solve this issue. So the next section of the visualizer, scrolling down a bit, shows the one filegroup/multiple file scenario from the demo. This is filegroup MultiFileFG, shown in alternating pink and purple:

    [Screenshot InterleaveScreenCap2: the two tables still interleaved, now across the three files of MultiFileFG]

    Notice, we got three files, but the two tables we map into them are still interleaved, now across all three files. (Each file in the visualizer has a signature of orange-gray-orange system pages right at the beginning.)

    This pattern is because of the proportional fill applied to the physical files inside one filegroup. An extent is allocated for an object from one file; then the next file has the most available space, so it receives the next extent, and so on. This means our tables are still cut up into 8-page extents spread across the physical media.

    Two Groups with One File Each

    The last section of the demo writes each table into a separate filegroup, where each filegroup is backed by a single physical file. The results here are represented by the solid pink area in the graphic:

    [Image: InterleaveScreenCap3]

    Here, the alternating inserts in the script don’t result in alternating allocations inside the files, because each table is isolated to its own filegroup. As a result, the data in the files is practically contiguous.

    Conclusion

    Does this mean you should go out and make 1,000 filegroups, one for every table or index? Please don’t! Planning files and filegroups is a balance between manageability, the real (or imagined) performance impact of fragmentation, complexity, and many other factors. SQL Server actually does a fine job, on smaller databases, with one or a handful of files. But if you are planning a larger warehouse project, if your hardware struggles with the volume of data you need to manage, or if you are looking at partitioning, then understanding how these storage internals behave, along with many other design considerations, will help.

  • Presenting to East Iowa I-380 U.G. March 12

    Tomorrow I’m making the beautiful drive down from Minneapolis into Iowa to present to the I-380 / East Iowa SQL Server Users Group. I’ll be talking about managing lots and lots of unruly databases, and presenting a walkthrough of this maintenance solution. If you’re in the area, please come check it out!

  • March Update to Rules-Driven Maintenance

    This month I have a minor update to the Rules-Driven Maintenance code I originally posted back in August 2012. This update has just two enhancements, but they are nice ones, I think:

    1. Much improved handling for instances that use database snapshots.
    2. Ability to do intra-day differential backups purely by setting maintenance times and limits in the policy table.

    The code posted here is cumulative and entirely replaces the original code, but please refer back to the original blog post for the description, instructions, and details. This update can be installed right over an existing deployment, or installed all by itself as a complete solution, but – as always – please test and be cautious.

    Enjoy!

    Note: I've posted updates to this solution. Please get the latest version here.

  • Trigger Authoring Decision Tree

    [Image: TriggersDecisionTree]

  • Public Release, SQL Server File Layout Viewer

    Version 1.0 is Now Available!

    Edit 9 October 2013: A new version is out! Please download 1.2 from here. 

    I’ve been working off and on, as my real job permits, on this visualization tool for SQL Server data files. This is an educational and exploratory tool that lets you more readily see how the individual data pages in MDF/NDF files are organized, where your tables and indexes live, and what effect operations like index rebuild or index reorganize have on the physical layout of the data pages.

    [Image: FileLayoutViewerR1]

    The viewer will scan a whole database, using only SQL and DBCC commands, and will render a color-coded representation of all the data pages as colored bands. Each partition of each index or heap in the database is assigned a color, so that you can see where all the bits and pieces of an object are located in the files. Above the colored bands are grayscale or white pixels that show the SQL Server page type (most are white, which are data pages; unused/empty regions of the file show as gray). In the image above, for example, all the bright green areas are one index, all the purple areas are one index, and so on.

    There is also mouse-over functionality: if you move the cursor over the graph, details about each page populate the text fields at right, including the object and index the page belongs to, the page type, whether the page represents a fragment, where the previous and next pages for the same object are located, and so on.

    Why?

    Why create something like this? I am a visual person, and I have a theory that many issues we have in computing come down to not being able to see what’s going on. This is especially true as we learn about unfamiliar technology – we have to develop a mental model of structures like B-trees or linked lists or files in order to understand what’s happening. I hope this tool, combined with other knowledge, will help people form an accurate understanding of how data file internals work in SQL Server, faster than working purely in the abstract with tools like DBCC PAGE or DBCC IND.

    Instructions

    1. Download the tool and unzip it. The package includes both an executable and the source code. If you don’t want the source, the .exe file is a standalone program and will run all on its own, so you are welcome to discard the source folder.
    2. Validate that you have the required prerequisites from the Prerequisites section below.
    3. Locate a non-production/test database to analyze. The database can be local or on a remote server. I suggest something of a reasonable size, because scanning a really huge data set can take quite a long time.
    4. Run SQLFileLayoutViewer.exe and select a database to scan. If the database is on a remote server, type the SQL Server name/instance name into the dialog.
    5. Click Analyze.
    6. Examine the resulting graph, and mouse over it with the cursor to view detailed information about each page.

    Disclaimer

    This is a freeware tool provided for your fun, education and entertainment. However, there is no warranty of any kind and you use it at your sole risk. The tool is free but offered under the GNU General Public License 3. If successful, and people are interested, I’ll move this work to some sort of open source project.

    Prerequisites

    The app requires .NET Framework 4.0 and the SQL Server management tools. I’ve tested it on Windows 7, Windows Server 2008 R2 and Windows 8. It can be run against a database on a local or remote SQL instance. I believe it will work on any database in SQL Server 2005 or later, but have not tested every possible scenario.

    Risks?

    I believe this tool to be relatively risk free, but I would avoid running it against live production data. The tool’s data collection is simple: it will issue a few system table selects to get things like object names, and then it will execute a DBCC PAGE statement against every page in the database. All other processing after that is done locally in the application itself. It does not modify the database.
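
    For the curious, this is the flavor of command it runs for each page. DBCC PAGE is undocumented but widely described, and WITH TABLERESULTS returns the output as a result set instead of writing it to the message stream. The database name and page numbers below are placeholders, and the exact print option the tool uses may differ:

    /* Inspect a single page: database name, file id 1, page id 100, print option 3 */
    DBCC PAGE ( 'MyTestDatabase', 1, 100, 3 ) WITH TABLERESULTS ;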

    Bugs?

    I would love to hear about bugs you come across, or additional features you think would be valuable. Please contact me through this site. Note that I am a DBA first, and an amateur .NET developer a distant second, so please be gentle.

    Enjoy!

  • Presenting for PASSMN User Group Feb. 19

    I’m pleased to be presenting the talk “Shoveling Frogs into a Wheelbarrow” for my new hometown user group PASSMN on February 19, 2013. Come on down if you’re in the Twin Cities.

    This talk is close to my heart, as it shows how I grew from a younger, greener DBA responsible for one or two systems into a terrified overwhelmed more mature DBA facing 2000+ databases alone. I hope this will be a highly interactive session, as I am sure there will be a mix of people in the audience with more or less experience than I have, and more or less complex environments to manage.

    <Ribbit.>

  • OT Le Casque Zik de Parrot Totally Biased Review (Zik Headphones)

    I’m not a complete gadget freak, but sometimes a widget draws me in and pushes that gadget-lust button. Such was the case last year when Philippe Starck and the French bluetooth device maker Parrot announced a collaboration to make what are probably the coolest designer headphones anywhere:

    http://www.journaldugeek.com/forum/topic/9825-news-un-prix-et-une-date-pour-le-casque-zik-de-parrot/

    I never thought I would own a pair, at $400, but my darling, awesome, rock star wife Denise bought me some for Christmas. I thought I should do some kind of objective review now that I have had them for a month, but I am so in love with these that that’s clearly impossible. So, instead, here’s my completely biased review/tour of Le Zik. I’m posting it here just because I hope my SQL peeps are also travellers and also love tech like this.

    This is the kit I travel with:

    [Photo: P1090832]

    I have a Lenovo Thinkpad x220 tablet with Windows 8, top. It has a touch screen that you can twist and fold to use the device as a tablet. Size-wise it’s smaller than a laptop and bigger than a tablet. On a plane it is a thing of beauty – fully functional computer for work, tablet for reading, movies or music without carrying an extra device. Across the middle is what you get in the box with Parrot Zik. At the bottom is my no-frills, cheap Android phone.

    Zik looks like this:

    [Photos: P1090835, P1090836, P1090837]

    The headphones feel very solidly constructed but not heavy. The headband is connected to the ear pieces with gorgeous sculpted aluminum “arms,” with orange accents at the point where they disappear into the top band. The aluminum pieces are articulated so that the unit folds flat to store or carry, and for automatic fit when they are on. I am a bit on the small side, but they fit very comfortably for several hours at a time. The only adjustment is that those arms extend out farther from the headband for bigger heads.

    The controls, ports and the mics for noise cancellation are in a coordinating aluminum inset on the bottom of the ear pieces:

    [Photo: P1090838]

    Headphones are generally mundane, occasionally interesting from a design point of view, but Starck really manages to elevate them – these are really nice, well put together, comfortable – the whole thing works well, and is easy on the eyes while remaining simple and understated. All industrial design should be so good!

    What’s the Big Deal?

    Here are the features packed into these little guys:

    • Active noise cancellation
    • Bluetooth for wireless audio, software controls, and headset mic function
    • Wired function as a backup, or for older audio devices
    • World-class audio quality
    • Touch controls (!) right on the surface of the right earpiece that handle volume and start/stop of media

    Essentially, these have the software and hardware to integrate with any bluetooth device you use for audio. Watch a movie wirelessly from my laptop? Done. Listen to music wirelessly from my phone? Done. I came from using basic wired ear buds (like I said, perhaps this isn’t exactly an objective review) so this feature set blew me away. There might be other headsets with the same software integration.

    When I got these, and figured out how to install the battery, which is a clever puzzle, I took them for an audio spin first. I installed the app on my phone, charged them over USB, and then did a tour of many types of music. I have strange taste in music, probably, so please be gentle:

    • Clapton, Joe Bonamassa and Jeff Beck gave me some sense of the guitar/rock/blues side
    • Some Swedish House Mafia for the “Oonse Oonse Oonse”-ability and bass extension
    • Jackson Browne’s two Solo Acoustic albums for intimate folk/vocals
    • Brandi Carlile Live at Benaroya Hall
    • Macklemore, which is about as close to Hip Hop as I get :-)
    • Yes, if you are into old-school virtuoso rock
    • Various Wynton Marsalis Jazz tunes

    Spotify, Pandora, iTunes and my phone’s music player all worked seamlessly. The app installed and worked without any drama – and for my laptop, no software was required at all. It paired with the headphones and they just worked. Everything sounded amazing.

    The touch controls work well and are intuitive – you swipe a finger up, down, or across the black surface of the right earphone to change volume, skip ahead, skip back or pause.

    The only strange thing was getting used to turning them off when I put them down, so the battery wouldn’t drain while I wasn’t wearing them. I have since developed that habit. The on/off button is in a very convenient place and has white (on) and red (charging) backlight LEDs, plus a tactile bump so you can find it without looking. Great details.

    What about that Problem Using the Cable?

    I had read, before we bought these, that some people complained about poor audio quality when using them in wired mode instead of over bluetooth, and at $400 that was a worry. Here’s what I think may have happened with those folks – though I am speculating: when I first plugged in the audio cable at the headphones end, it was really stiff. I thought I had seated the jack all the way in, but when I listened, sure enough, there was a thin, tinny sound and practically nothing from one earphone. However, I took them off and gave that jack a good shove, and, click, there was one more notch. After that it sounded perfect.

    Here’s one thing about the wired-or-not setup:

    The headphones have audio enhancement software built in, like a digital equalizer and a “concert” mode that enhances the stereo. These features are controlled from the phone app, but reside in the headphones. However, those features, from what I can tell, work on the bluetooth stream and not over the wire. So, while the wired sound is excellent, you can hear a difference. You have to apply those effects, if you want them, using the source system or software (your music player or stereo). Example: if you use the EQ in the software, I believe it only affects the signal over bluetooth. If you want an EQ over the wire, then the audio source software or device has to have one, and you’d use that.

    To me this is no problem at all.

    Just to be sure, I did this test:

    On Brandi Carlile’s Live at Benaroya Hall, the track Turpentine is one I know really well, because I was there at Benaroya Hall when it was recorded. Amazing show. If you listen really carefully you can hear me singing in the audience on the recording (ok, actually not). Anyway, I tried switching back and forth between bluetooth and wired connections during that track to see what the differences were, and whether the wired connection was worse in any way, and I’m here to say the wire sounds just fine. If that’s what’s holding you back from a Zik purchase, go for it. Non-issue.

    Made for Planes

    I was itching to try these on an airplane, and this week a work trip granted me my wish. While they are good at home, they are amazing in an airplane. The noise cancellation cut all the plane noise down to a faint hiss, while other people’s voices in the cabin still came through. For my flights I rented two movies to watch through iTunes going and coming back. I tried for something with musical interest, but that I hadn’t seen:

    Going: Pitch Perfect

    Coming back: Searching for Sugar Man

    Pitch Perfect was a little light, but entertaining, and it has a fun sound track. I really liked the lead character Beca (Anna Kendrick), who is super cute.

    Searching for Sugar Man was just amazing – great film.

    The Zik was (were?) incredible on the trip. Comfortable, great sound quality. I charged them before each of the flights, and a four-hour flight used about 40% of their battery.

    Down sides?

    I am really happy with these. They could use a hard case, and the stiff connections when the unit was new are a quibble. Overall, though, Le Zik is just full to overflowing with L’Awesome.
