<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'sql'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=sql&amp;orTags=0</link><description>Search results matching tag 'sql'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Filegroups and Non-Clustered Indexes</title><link>http://sqlblog.com/blogs/rob_farley/archive/2013/03/11/filegroups-and-non-clustered-indexes.aspx</link><pubDate>Tue, 12 Mar 2013 00:20:11 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48187</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Let’s start with some basics and then jump in a bit deeper, for this post to go with the 40th &lt;a href="http://www.midnightdba.com/Jen/2013/03/invitation-to-t-sql-tuesday-040-file-and-filegroup-wisdom/" target="_blank"&gt;T-SQL Tuesday&lt;/a&gt;, hosted this month by &lt;a href="http://www.midnightdba.com/Jen/" target="_blank"&gt;Jen McCown&lt;/a&gt;. &lt;a href="http://www.midnightdba.com/Jen/2013/03/invitation-to-t-sql-tuesday-040-file-and-filegroup-wisdom/" target="_blank"&gt;&lt;img style="background-image:none;border-right-width:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="TSQL2sDay150x150" border="0" alt="TSQL2sDay150x150" align="right" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_3F5FB646.jpg" width="170" height="170" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;SQL Server holds data, and that data is stored physically in files.&lt;/p&gt;  &lt;p&gt;Of course, in the database world we think of the data as living in tables&lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2012/12/11/joins-in-single-table-queries.aspx" target="_blank"&gt;*&lt;/a&gt;, so naturally there must be some sort of mapping between the data in a table and the data in a file. This is where filegroups come in.&lt;/p&gt;  &lt;p&gt;When SQL objects are created within a database, there is an option to be able to tell the system where to actually store it. But this isn’t a file, it’s a filegroup. (If it were a file and the disk volume that the file was on filled up, we’d have big problems. Instead, we put it in a filegroup, and can add extra files (on different volumes if so desired) to that filegroup.) Objects are stored within filegroups. Filegroups are groups of files, although many database systems do end up with only a single file per filegroup.&lt;/p&gt;  &lt;p&gt;Filegroups end up providing a large amount of flexibility for the storage of data. Rarely accessed data can be put in filegroups that have files on cheaper (but probably slower) disk, while data that is heavily written can be put on disk that’s more appropriate for that, and so on. I’m sure you get the picture, and this is nothing overly thought-worthy.&lt;/p&gt;  &lt;p&gt;You may even have already considered the idea around partitioning data across filegroups, moving data of a certain age (but potentially even from the same table) onto a different filegroup, so that queries that use different parts of tables can benefit from having some data on faster disk. Lots of potential. :)&lt;/p&gt;  &lt;p&gt;Where it becomes interesting though, is when you start to consider the concept of non-clustered indexes.&lt;/p&gt;  &lt;p&gt;You see, while a clustered index (or heap) provides the structure to store the data across all the columns of a table (although I’m simplifying it a little in regard to LOB data), we use copies of some of that data (in non-clustered indexes) to optimise access to the data.&lt;/p&gt;  &lt;p&gt;So where are these copies stored? Many database people don’t even think about this kind of thing, in the same way that they often don’t think about including indexes in their database design. I’m going to figure you’re not in that boat though, because I probably know you, and you’re probably not one of those people.&lt;/p&gt;  &lt;p&gt;Most people like to have a default filegroup that is NOT the primary filegroup. It means that when they create new objects, those new objects get put in the default filegroup, not the primary one. But it’s not actually that simple.&lt;/p&gt;  &lt;p&gt;Let’s start by creating a database.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE DATABASE fg_testing;        &lt;br /&gt;GO         &lt;br /&gt;USE fg_testing;         &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.filegroups;         &lt;br /&gt;--Only one filegroup at the moment         &lt;br /&gt;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_39E11FD5.png" width="586" height="265" /&gt;&lt;/p&gt;  &lt;p&gt;Notice the column &lt;font face="Consolas"&gt;data_space_id&lt;/font&gt;. This is the column which identifies each filegroup. We’ll use it later.&lt;/p&gt;  &lt;p&gt;Let’s create a new filegroup and set it to be the default.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;ALTER DATABASE fg_testing ADD FILEGROUP FG2;        &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.filegroups;         &lt;br /&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2FCC7EAA.png" width="567" height="206" /&gt;&lt;/p&gt;  &lt;p&gt;Cool – data_space_id 2 is created.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;ALTER DATABASE fg_testing MODIFY FILEGROUP FG2 DEFAULT;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_7ADF796C.png" width="564" height="105" /&gt;     &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;I think this is a shame – but I guess it’s fair enough. We have to have a file in the filegroup before we can make it the default. That’s easy enough though, and probably what we’d be wanting to do before too long anyway.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;ALTER DATABASE fg_testing ADD FILE ( NAME = N'file2', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\fg_testing_file2.ndf') TO FILEGROUP FG2;        &lt;br /&gt;GO         &lt;br /&gt;ALTER DATABASE fg_testing MODIFY FILEGROUP FG2 DEFAULT;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_6A8401B3.png" width="770" height="162" /&gt;&lt;/p&gt;  &lt;p&gt;(I’m doing this on my laptop, which only has one physical drive – on a server it’d go somewhere more sensible of course)&lt;/p&gt;  &lt;p&gt;Now let’s add another filegroup. This is going to be for some data, but I don’t want it to be my default.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;ALTER DATABASE fg_testing ADD FILEGROUP FG3;        &lt;br /&gt;GO         &lt;br /&gt;ALTER DATABASE fg_testing ADD FILE ( NAME = N'file3', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\fg_testing_file3.ndf') TO FILEGROUP FG3;&lt;/font&gt;       &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_07A9A9BE.png" width="731" height="186" /&gt;&lt;/p&gt;  &lt;p&gt;Now, I want to create an object on FG3, which I’m going to pretend is my slower disk.&lt;/p&gt;  &lt;p&gt;Once created, I’m looking in sys.objects to confirm which filegroup the table is in, but there’s no information there. Remember that a table is only metadata, and the things that matter are the indexes/heaps that are on it. Checking in sys.indexes shows me that indeed, it’s in data_space_id 3.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE TABLE dbo.OrderDates (OrderDate date PRIMARY KEY, NumOrders int NOT NULL DEFAULT 0) ON FG3;        &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.objects         &lt;br /&gt;WHERE object_id = OBJECT_ID('dbo.OrderDates');         &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.indexes         &lt;br /&gt;WHERE object_id = OBJECT_ID('dbo.OrderDates');         &lt;br /&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_798ABAC0.png" width="1063" height="449" /&gt;&lt;/p&gt;  &lt;p&gt;For completeness’ sake, I’m going to put some data in there, using a query &lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2013/03/11/a-top-query.aspx" target="_blank"&gt;that I blogged about yesterday&lt;/a&gt;.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;INSERT dbo.OrderDates (OrderDate, NumOrders)        &lt;br /&gt;SELECT TOP (DATEDIFF(DAY,'20010101','20130312'))         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; DATEADD(day,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'20010101') AS OrderDate,         &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br /&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br /&gt;ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1))         &lt;br /&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_461ED197.png" width="796" height="235" /&gt;&lt;/p&gt;  &lt;p&gt;But the whole point of this was to see what happens with the non-clustered index, which I want to be on recent data only (ie, filtered), and I want it to be in the DEFAULT filegroup.&lt;/p&gt;  &lt;p&gt;As I want it in the default group, I won’t specify a filegroup for the index.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX ixRecentData ON dbo.OrderDates (OrderDate)        &lt;br /&gt;INCLUDE (NumOrders)         &lt;br /&gt;WHERE OrderDate &amp;gt;= '20130101';         &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.indexes         &lt;br /&gt;WHERE object_id = OBJECT_ID('dbo.OrderDates');&lt;/font&gt;       &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_698B502F.png" width="968" height="333" /&gt;&lt;/p&gt;  &lt;p&gt;But look! It’s put the non-clustered index in the same filegroup as the clustered index. This isn’t what I wanted. In fact, it’s almost never what I’d want, because even if the disk performance is identical, it can be nice to have transactions which update both the clustered and non-clustered indexes using different disks.&lt;/p&gt;  &lt;p&gt;Instead, we have to specify it explicitly to tell it to use the filegroup we want.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;DROP INDEX ixRecentData ON dbo.OrderDates        &lt;br /&gt;GO         &lt;br /&gt;CREATE INDEX ixRecentData ON dbo.OrderDates (OrderDate)         &lt;br /&gt;INCLUDE (NumOrders)         &lt;br /&gt;WHERE OrderDate &amp;gt;= '20130101'         &lt;br /&gt;ON FG2;         &lt;br /&gt;GO         &lt;br /&gt;SELECT *         &lt;br /&gt;FROM sys.indexes         &lt;br /&gt;WHERE object_id = OBJECT_ID('dbo.OrderDates');&lt;/font&gt;       &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5E321625.png" width="902" height="386" /&gt;&lt;/p&gt;  &lt;p&gt;It’s a shame to have to be so explicit with these things, but when you create non-clustered indexes, you really should think about which filegroup they’re going onto, because they may well not go onto the default one as you might expect.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>A TOP Query</title><link>http://sqlblog.com/blogs/rob_farley/archive/2013/03/11/a-top-query.aspx</link><pubDate>Mon, 11 Mar 2013 07:02:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48170</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;For the blog post that I’ll publish tomorrow, I wrote a query that I thought needed a blog post all of its own. This is that post.&lt;/p&gt;  &lt;p&gt;The query was this one. Its results aren’t that interesting, it’s just a list of dates with a random number between 0 and 99. Just some sample data that I thought I’d use.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT TOP (DATEDIFF(DAY,'20010101','20130312'))        &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DATEADD(day,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'20010101') AS OrderDate,         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2        &lt;br&gt;ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2D71F19B.png" width="769" height="247"&gt;&lt;/p&gt;  &lt;p&gt;So, let me give some background...&lt;/p&gt;  &lt;p&gt;When teaching about T-SQL, I often point out that a nums table is tremendously useful. One of its uses is to make a dates table, which can be really handy in a bunch of other ways, like in data warehouses. The idea is simple, assuming you have a table of numbers which starts from either 0 or 1 (I don’t really care, although I start mine from 1). Here I figure that you want to get dates from between 20010101, up to (but not including) today.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT DATEADD(day, num-1, '20010101')        &lt;br&gt;FROM dbo.nums         &lt;br&gt;WHERE num &amp;lt;= DATEDIFF(day, '20010101', SYSDATETIME());&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I’m not going to show you the results, I figure that it’s easy for you to picture a list of dates.&lt;/p&gt;  &lt;p&gt;Oh, alright then.&lt;/p&gt;  &lt;p&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_28CBC114.png" width="529" height="251"&gt;&lt;/p&gt;  &lt;p&gt;Anyway, moving on.&lt;/p&gt;  &lt;p&gt;In this case, I didn’t have a nums table handy, and for that, I tend to use ROW_NUMBER() and the table master.dbo.spt_values. This table contains a bunch of rows of handy reference data. I wasn’t interested in the contents though, I just wanted to have enough rows, and it’s quick to reference that rather than writing a bunch of self-joining CTEs. There’s over 2000 rows (2515 to be exact) in master.dbo.spt_values, and if I want to have up to 4 million, I just do a cross join to itself.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num        &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2;         &lt;br&gt;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_65A042E6.png" width="597" height="240"&gt;&lt;/p&gt;  &lt;p&gt;This query gives me a lot of rows, of course... but if I use it in a sub-query (or CTE) and filter it, then the simplification work that the Query Optimizer does will mean that it doesn’t try to work out all 4 million rows for me, it’ll stop when it’s seen enough.&lt;/p&gt;  &lt;p&gt;As an example, I can use this in conjunction with my earlier query, which had a WHERE clause.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;WITH nums AS (        &lt;br&gt;SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br&gt;)         &lt;br&gt;SELECT DATEADD(day, num-1, '20010101')         &lt;br&gt;FROM nums         &lt;br&gt;WHERE num &amp;lt;= DATEDIFF(day, '20010101', SYSDATETIME());&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This produces my rows in less than a second, giving the same results as when we had a physical nums table.&lt;/p&gt;  &lt;p&gt;To include my random values, I’m using the commonly found randomizing method of ABS(CHECKSUM(NEWID())) % 100. RAND() is no good, it just produces the same value for every row. NEWID() is much better, but it’s not a number. CHECKSUM solves that, but can be negative. ABS will wrap that up nicely and give a random number in a large range. Mod 100 solves that.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;WITH nums AS (        &lt;br&gt;SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br&gt;)         &lt;br&gt;SELECT DATEADD(day, num-1, '20010101') AS OrderDate,         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM nums         &lt;br&gt;WHERE num &amp;lt;= DATEDIFF(day, '20010101', SYSDATETIME());&lt;/font&gt;       &lt;br&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2D9E4F03.png" width="556" height="281"&gt;&lt;/p&gt;  &lt;p&gt;This works nicely, and is a common pattern for dealing with this kind of thing.&lt;/p&gt;  &lt;p&gt;But it’s not the query at the top of this post. That was done without a CTE, and used TOP instead, ordering by the ROW_NUMBER().&lt;/p&gt;  &lt;p&gt;Let’s have a look at it.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT TOP (DATEDIFF(DAY,'20010101','20130312'))        &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DATEADD(day,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'20010101') AS OrderDate,         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2        &lt;br&gt;ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;We can see that the FROM clause is the same as in our nums CTE. And the same randomising bit for NumOrders is in there. &lt;/p&gt;  &lt;p&gt;But instead of using ROW_NUMBER() OVER (ORDER BY (SELECT 1)) to define a num column, we’re using it within another function? Can we do that? Yes. ROW_NUMBER() can only be used in the SELECT clause and in the ORDER BY clause, and there are restrictions about using it within aggregate functions as you might expect, but here we’re just using it as a value which changes with every row, and there’s really no problem at all.&lt;/p&gt;  &lt;p&gt;We don’t have a filter though. In our CTE version, we used a filter to make sure we weren’t using every row from our CROSS JOIN. Here, we’re using TOP instead. But not TOP with some number – TOP with a function in it! This has been possible for a while, and it even supports sub-queries that produce numbers, in the form (including the double-bracket): &lt;font face="Consolas"&gt;SELECT TOP ((SELECT SomeVal ... )) Col1, Col2...&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;TOP appears in this query because I needed to limit the number of rows coming back. I couldn’t use the WHERE clause, because I didn’t have anything to filter on. So I used TOP, and had to use a function in there.&lt;/p&gt;  &lt;p&gt;So let’s compare the plans.&lt;/p&gt;  &lt;p&gt;To be fair, I’ll use the fixed date in both queries (for now).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_27B3859D.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_09A551DC.png" width="832" height="301"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Oh, how disappointing! My TOP query is 79% of the batch, and the CTE version is 21%. Clearly my new one is 4 times as bad, and I should give up on it.&lt;/p&gt;  &lt;p&gt;Actually, if you run these two queries on your own machine, you’ll see the first one isn’t 4 times as bad at all. It’s actually FASTER than the second. Something’s going on, and we should find out what.&lt;/p&gt;  &lt;p&gt;The plans look remarkably similar. In fact, the second one is identical, but has an extra Filter operator. I don’t have a Filter in the TOP one, but I didn’t really expect it to make that much difference.&lt;/p&gt;  &lt;p&gt;Otherwise, the plans look pretty similar. They both use a Row Count Spool, have a Sequence Project to work out the ROW_NUMBER, and they both use a Top operator – even the second one which doesn’t use TOP.&lt;/p&gt;  &lt;p&gt;But you see, the Query Optimizer would have seen that I was filtering on a column that mapped to ROW_NUMBER(), and that I was doing a “less than” operation there. That’s like doing a TOP, and the Query Optimizer sees benefit in this. It doesn’t explain what’s going on though with the “4 times worse” thing though.&lt;/p&gt;  &lt;p&gt;Let’s examine some of the numbers. Interestingly, we see that the Nested Loop operator expects to do almost no work in the second plan, and 27% in the first. There’s a warning on there – that’s just because I’m using a CROSS JOIN, and I’m okay with that.&lt;/p&gt;  &lt;p&gt;The word ‘expect’ was very important in that last paragraph. The percentages there are based on the work that is &lt;em&gt;expected &lt;/em&gt;to be done. Let’s look at the properties of the two Nested Loop operators.&lt;/p&gt;  &lt;p&gt;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_33C4D9F7.png" width="457" height="588"&gt;&amp;nbsp;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_11209B7C.png" width="429" height="439"&gt;&lt;/p&gt;  &lt;p&gt;Glancing down here, we see a lot of it is the same, but the &lt;em&gt;Estimated Number of Rows&lt;/em&gt; in the first one is 4453 (which is correct), while the second one is only 100 (not quite so correct). The arrows on the upper side of the Nested Loops show the effect of this.&lt;/p&gt;  &lt;p&gt;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_2EB2767B.png" width="440" height="223"&gt;&amp;nbsp;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_7EA824F9.png" width="375" height="178"&gt;&lt;/p&gt;  &lt;p&gt;The second one figures it’ll have seen all the rows it needs to see before it gets a second row from the first table, whereas the second one things it might need 1.77058 rows (being 4453/2515). No wonder the second query thinks it’s quicker.&lt;/p&gt;  &lt;p&gt;Let’s see where this 100 estimate comes from though. Perhaps it’s that filter?&lt;/p&gt;  &lt;p&gt;Based on the estimates of the rows going into the filter and coming out of it, it expects that the filter will see 100 rows and return just 30. This is bad too, but it’s not as bad as the 100 v 4453 impact.&lt;/p&gt;  &lt;p&gt;This thing actually comes from the way that the Query Optimizer has turned the WHERE clause into a TOP clause. It figures there’s some doubt there, and guesses that 100 rows is probably not a bad place to start. When we give an explicit value (even using the DATEDIFF function), it can figure out what’s there and use this value. The second query goes part way there and works out that the result of the DATEDIFF is 4453, but simply doesn’t apply it fully to the Top operator, leaving us with that guess.&lt;/p&gt;  &lt;p&gt;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_0A5D6C2C.png" width="405" height="553"&gt;&amp;nbsp;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_72E2B7FA.png" width="378" height="511"&gt;&lt;/p&gt;  &lt;p&gt;It’d be nice if it could tell that 4453 is never NULL, and simplify this out a bit more, but it simply doesn’t do this.&lt;/p&gt;  &lt;p&gt;To round the post off, let’s consider what happens if we’re using SYSDATETIME() instead of the constant.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT TOP (DATEDIFF(DAY,'20010101',SYSDATETIME()))        &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DATEADD(day,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'20010101') AS OrderDate,         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2        &lt;br&gt;ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Consolas"&gt;WITH nums AS        &lt;br&gt;(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br&gt;)         &lt;br&gt;SELECT DATEADD(day,num-1,'20010101') AS OrderDate, ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM nums         &lt;br&gt;WHERE num &amp;lt;= DATEDIFF(DAY,'20010101',SYSDATETIME());&lt;/font&gt;       &lt;br&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/image_381E0D1E.png"&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_thumb_135CCFDA.png" width="834" height="316"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Oh! Now the first one is simpler still, leaving out the Row Count Spool operator, and thinking it’s going to be cheaper than the second one. Having not trusted that figure before, does this mean the first one is actually worse? Well, we have an idea about where to look – the estimates on some of the arrows, particularly near the Top operator.&lt;/p&gt;  &lt;p&gt;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_54C45E6C.png" width="376" height="216"&gt;&amp;nbsp;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_20AFBF19.png" width="407" height="220"&gt;&lt;/p&gt;  &lt;p&gt;Oh no! Our first query thinks there’s now only one row going to be coming through. How awful! (Oh, and the number 4452 is fine, because I’m actually running this on March 11th, not March 12th, it’s just that March 12th is T-SQL Tuesday this month, which is what I was writing the query for).&lt;/p&gt;  &lt;p&gt;If you run this on your machine, hopefully you saw something different. Really.&lt;/p&gt;  &lt;p&gt;You see, this problem has kinda been fixed, and if you enable the &lt;em&gt;documented&lt;/em&gt; traceflag 4199, it should be better – for the first query at least.&lt;/p&gt;  &lt;p&gt;By turning on trace flag 4199, and telling it to ignore what’s in the cache, it will evaluate SYSDATETIME() for that particular execution, and therefore come up with the right value for the Top operator. It doesn’t fix the WHERE clause version, but it does solve the TOP clause version.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT TOP (DATEDIFF(DAY,'20010101',SYSDATETIME()))        &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DATEADD(day,ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1,'20010101') AS OrderDate,         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br&gt;ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1))        &lt;br&gt;OPTION (QUERYTRACEON 4199, RECOMPILE);&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Consolas"&gt;WITH nums AS        &lt;br&gt;(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num         &lt;br&gt;FROM master.dbo.spt_values t1 CROSS JOIN master.dbo.spt_values t2         &lt;br&gt;)         &lt;br&gt;SELECT DATEADD(day,num-1,'20010101') AS OrderDate, ABS(CHECKSUM(NEWID())) % 100 as NumOrders         &lt;br&gt;FROM nums         &lt;br&gt;WHERE num &amp;lt;= DATEDIFF(DAY,'20010101',SYSDATETIME())         &lt;br&gt;OPTION (QUERYTRACEON 4199, RECOMPILE);&lt;/font&gt;       &lt;br&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="border-width:0px;margin:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_42B81ADF.png" width="338" height="211"&gt;&lt;/p&gt;  &lt;p&gt;The reason why I say this shouldn’t’ve happened on your system is because you’re probably using trace flag 4199 as a start-up parameter.&lt;/p&gt;  &lt;p&gt;So there you have it... a query which might seem strange at first glance, but is actually a really nice alternative. Don’t be afraid to use expressions in your TOP clause – it’s a very powerful mechanism, and TOP is a great operator to see in your plans (because they will ask for fewer rows from your Scans and Seeks – remember how execution plans suck?). As is often the case, we’re tripped up by estimates being out, but if you can see what’s actually going on, you should be able to make good decisions about how to form your queries.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Behind the scenes of PowerShell and SQL</title><link>http://sqlblog.com/blogs/rob_farley/archive/2013/02/11/behind-the-scenes-of-powershell-and-sql.aspx</link><pubDate>Tue, 12 Feb 2013 00:11:46 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47649</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Every year, PowerShell increases its stranglehold on the Windows Server system and the applications that run upon it – with good reason too. Its consistent mechanisms for interaction between its scripting interface and the underlying systems make it easy for people to feel comfortable, and there is a discoverability that has been lacking in many other scripting environments.&lt;/p&gt;  &lt;p&gt;Of course, SQL Server hasn’t been overlooked at all, and it’s coming up to five years since the &lt;a href="https://msmvps.com/blogs/robfarley/archive/2008/03/04/sql-server-2008-powershell-snapin.aspx" target="_blank"&gt;SnapIns were made available&lt;/a&gt; (even longer since people started to &lt;a href="http://msmvps.com/blogs/robfarley/archive/2007/01/05/powershell-and-sql.aspx" target="_blank"&gt;dabble with SQL using PowerShell&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;But what’s going on behind the scenes? Does PowerShell present a threat to those amongst us who will always prefer T-SQL? Does PowerShell give us new options that are not available any other way? Well, let’s have a bit of a look, especially since this month’s T-SQL Tuesday (hosted by &lt;a href="http://blog.waynesheffield.com/wayne/archive/2013/02/invitation-for-t-sql-tuesday-39-can-you-shell-what-the-posh-is-cooking/" target="_blank"&gt;Wayne Sheffield&lt;/a&gt; who tweets as &lt;a href="http://twitter.com/DBAWayne" target="_blank"&gt;@DBAWayne&lt;/a&gt;) is on the topic of PowerShell.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blog.waynesheffield.com/wayne/archive/2013/02/invitation-for-t-sql-tuesday-39-can-you-shell-what-the-posh-is-cooking/" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="TSQL2sDay150x150" border="0" alt="TSQL2sDay150x150" align="right" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_41BF631A.jpg" width="170" height="170" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;So we know PowerShell is useful. However we spin it up, we can quickly jump into writing commands, whether it be interacting with WMI, hooking into some .Net assembly we’ve loaded up, or simply browsing the file system. I’ve developed a tendency to use it to start whichever SQL instances I’m needing for the day – by default I have all of them turned off, since I don’t know which one I’ll be wanting most.&lt;/p&gt;  &lt;p&gt;If we’re going to be interacting with SQL, then it’s easiest to either load up the SQLPS environment directly (there’s a PowerShell shortcut within Management Studio), or else (as I do), start a PowerShell window with the Snapin loaded. I prefer this later option, as the SQLPS environment is a slightly cut-back version of PowerShell. But either way – the stuff I’ll continue on with is essentially the same whichever environment you use.&lt;/p&gt;  &lt;p&gt;If you’ve talked about SQL with me long enough, you’ll have come across the fact that I often use SQL Profiler when I’m curious about where to find information. My former colleague &lt;a href="http://www.jimmcleod.net" target="_blank"&gt;Jim McLeod&lt;/a&gt; (&lt;a href="http://twitter.com/jim_mcleod" target="_blank"&gt;@Jim_McLeod&lt;/a&gt;) blogged &lt;a href="http://www.jimmcleod.net/blog/index.php/2012/08/14/t-sql-tuesday-33-trick-shots/" target="_blank"&gt;about this a few months ago&lt;/a&gt;, with an example that I remember looking through with him four or five years ago. It’s a great technique that works on all kinds of things, even across different versions of SQL Server. It also adds as a terrific reminder that Management Studio is not a special application, it simply knows how to ask for the pieces of information that it shows.&lt;/p&gt;  &lt;p&gt;But PowerShell (or SMO, for that matter), that’s in the .Net world. Surely that would be able to bypass the clunky T-SQL stuff that Management Studio does... I mean, Management Studio has to be flexible enough to work across remote servers, talking through firewalls that only allow T-SQL interfaces. Surely PowerShell has access to a deeper magic.&lt;/p&gt;  &lt;p&gt;Well, no. PowerShell still lets you talk to remote servers, and ends up using the same methods.&lt;/p&gt;  &lt;p&gt;Let’s prove it.&lt;/p&gt;  &lt;p&gt;Spin up Profiler, and start a trace against your favourite instance. I like to watch for SQL:BatchCompleted, SP:StmtCompleted and RPC:Completed events when doing this kind of thing. I’m using an instance that isn’t doing anything else, but you could apply a ColumnFilter to filter the events to things with an ApplicationName starting with SQLPS if you prefer.&lt;/p&gt;  &lt;p&gt;With that running, I jump into PowerShell and do something like:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;PS SQLSERVER:\sql\localhost\sql2008r2&amp;gt; dir Databases | ft name&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This lists the names of the databases on my SQL2008R2 instances. You don’t need to see the results, you can imagine them for yourself.&lt;/p&gt;  &lt;p&gt;If PowerShell were using some secret interface, it’s unlikely we’d see something in Profiler. But it’s not, and we see a bunch of stuff.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_56ED68C2.png" width="672" height="468" /&gt;&lt;/p&gt;  &lt;p&gt;We see a bunch of Remote Procedure Calls, each with a Stored Procedure Statement Completed event showing the same information. And look – we see queries against master.sys.databases, asking for the name of each of the databases, passing in the name as a parameter. Brilliant! Notice just a bit earlier though, there’s a SQL:BatchCompleted call. This means that a query has been passed in directly. It’s this:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT       &lt;br /&gt;CAST(        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; serverproperty(N'Servername')        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; AS sysname) AS [Server_Name],        &lt;br /&gt;dtb.name AS [Name]        &lt;br /&gt;FROM        &lt;br /&gt;master.sys.databases AS dtb        &lt;br /&gt;WHERE        &lt;br /&gt;(CAST(case when dtb.name in ('master','model','msdb','tempdb') then 1 else dtb.is_distributor end AS bit)=0)        &lt;br /&gt;ORDER BY        &lt;br /&gt;[Name] ASC&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3EB9D89A.png" width="646" height="432" /&gt;&lt;/p&gt;    &lt;p&gt;So it grabs the list of database names first, and then makes extra calls to be able to fetch the list of names again, one by one.&lt;/p&gt;  &lt;p&gt;The reason why it’s grabbing the list of names one by one isn’t because it’s stupid and is asking to be ridiculed. It’s because we've asked to see that property, and I guess the PowerShell people figured that no matter what property you ask for, it’ll go and fetch it to show you.&lt;/p&gt;  &lt;p&gt;When I asked for the CompatibilityLevel property instead, I got some different rows thrown in. Interestingly though, it still asked for the name each time.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3862DC3F.png" width="635" height="437" /&gt;&lt;/p&gt;  &lt;p&gt;Also interestingly, when I asked for the CompatibilityLevel a subsequent time, the calls for “SELECT dtb.compatibility_level…” weren’t in there. They’d been cached by the PowerShell environment – important to note if you ever come across PowerShell giving you old values.&lt;/p&gt;  &lt;p&gt;So what about asking something more interesting? Let’s try asking about the IndexSpaceUsage in AdventureWorks.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;PS SQLSERVER:\sql\localhost\sql2008r2&amp;gt; gi Databases\AdventureWorks | ft IndexSpaceUsage&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The result tells me it’s 62576. Yeah, but today I’m not interested in that, just what happened in the trace.&lt;/p&gt;  &lt;p&gt;Four entries. An SP:StmtCompleted with matching RPC:Completed, and two SQL:BatchCompleted.&lt;/p&gt;  &lt;p&gt;The SP:StmtCompleted and RPC:Completed were this statement, passing in the parameter value ‘AdventureWorks’. Clearly incredibly informative.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT       &lt;br /&gt;CAST(0 AS float) AS [IndexSpaceUsage],        &lt;br /&gt;dtb.name AS [DatabaseName]        &lt;br /&gt;FROM        &lt;br /&gt;master.sys.databases AS dtb        &lt;br /&gt;WHERE        &lt;br /&gt;(dtb.name=@_msparam_0)&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This is getting the value zero. Wow. Brilliant stuff.&lt;/p&gt;  &lt;p&gt;The last entry – the second of the two SQL:BatchCompleted events is:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;select convert(float,low/1024.) from master.dbo.spt_values where number = 1 and type = 'E'&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;If you run this in Management Studio, you’ll discover it gives the value 8. Ok.&lt;/p&gt;  &lt;p&gt;The other entry is more interesting.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;use [AdventureWorks]       &lt;br /&gt;SELECT        &lt;br /&gt;SUM(CASE WHEN a.type &amp;lt;&amp;gt; 1 THEN a.used_pages WHEN p.index_id &amp;lt; 2 THEN a.data_pages ELSE 0 END) AS [DataSpaceUsage],        &lt;br /&gt;SUM(a.used_pages) AS [IndexSpaceTotal]        &lt;br /&gt;FROM        &lt;br /&gt;sys.allocation_units AS a INNER JOIN sys.partitions AS p ON (a.type = 2 AND p.partition_id = a.container_id) OR (a.type IN (1,3) AND p.hobt_id = a.container_id)&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This is more like it! We run this in Management Studio, and we see two values. DataSpaceUsage is 13682, IndexSpaceTotal is 21504. Neither are our value 62576. But we do have clues in the column names, and in that value 8 that came back too. We can easily deduce that it’s actually (IndexSpaceTotal-DataSpaceUsage)*8, and we have ourselves a nice little method for working out the IndexSpaceUsage ourselves now if we need it.&lt;/p&gt;  &lt;p&gt;Or we can just ask PowerShell next time as well.&lt;/p&gt;  &lt;p&gt;Incidentally – if you’re considering doing the MCM Lab exam some time, then you might find that a familiarity with PowerShell comes in really handy. I’m not saying there are PowerShell questions on the exam at all – I’m just suggesting that you may find that PowerShell becomes a really useful way of getting at some of the information that you’re looking for. If you’re stumbling around the list of DMVs trying to remember which one it is that stores some particular thing, remember that you might be able to get the data out more easily if you use PowerShell instead.&lt;/p&gt;  &lt;p&gt;So can we discover secret things about SQL from PowerShell? Are there things we can do in PowerShell that are impossible through other mechanisms? Hooks that let us break the rules even?&lt;/p&gt;  &lt;p&gt;Recently, Kendal van Dyke asked a question about this kind of thing on Twitter. He was wondering if you could have a default constraint on a column in a view. The reason for his wondering was that he saw a property on a view column in PowerShell that made him wonder. The answer is no though, and there’s a simple reason.&lt;/p&gt;  &lt;p&gt;PowerShell is a programmatic interface. It involves classes and property and methods. It does things row by row, which is why much of what you see in that trace feels amazingly pedantic – asking about things which shouldn’t have to be that complicated. The implication of this though, is that PowerShell reuses the concept of a column, regardless of whether this is a column in a table, a view, or anywhere else it decides to need a column. The fact that columns in tables have some extra properties isn’t enough to make this class re-use pointless. If we try to set a Default constraint for a column in a view though, we get an error, just like if we tried to do it any other way.&lt;/p&gt;  &lt;p&gt;The PowerShell I used was:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;$db = Get-Item SQLSERVER:\sql\localhost\sql2008r2\Databases\AdventureWorks       &lt;br /&gt;&lt;/font&gt;&lt;font face="Consolas"&gt;$def = New-Object -TypeName Microsoft.SqlServer.Management.SMO.Default –ArgumentList $db, &amp;quot;ViewDefault&amp;quot;       &lt;br /&gt;&lt;/font&gt;&lt;font face="Consolas"&gt;$def.TextHeader = &amp;quot;CREATE DEFAULT ViewDefault AS&amp;quot;       &lt;br /&gt;$def.TextBody = &amp;quot;'ABC'&amp;quot;        &lt;br /&gt;&lt;/font&gt;&lt;font face="Consolas"&gt;$def.Create()       &lt;br /&gt;&lt;/font&gt;&lt;font face="Consolas"&gt;$def.BindToColumn(&amp;quot;vStateProvinceCountryRegion&amp;quot;,&amp;quot;StateProvinceCode&amp;quot;,&amp;quot;Person&amp;quot;)&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The code that ended up getting called was to the stored procedure &lt;em&gt;sp_bindefault&lt;/em&gt; (despite it being deprecated). Naturally, trying to execute this against a view column gives an error regardless of what wrappers you have put around it – PowerShell or not.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_125D061C.png" width="813" height="132" /&gt;&lt;/p&gt;  &lt;p&gt;So PowerShell is very useful, and it provides a handy way of getting to a lot of things that could be otherwise hard. But looking below the surface, it isn’t able to circumvent the basic restrictions of SQL Server, because it still ends up doing its work using T-SQL.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>MCM – I passed!</title><link>http://sqlblog.com/blogs/rob_farley/archive/2013/01/04/mcm-i-passed.aspx</link><pubDate>Fri, 04 Jan 2013 23:40:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47017</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;I was wrong – I passed the &lt;a target="_blank" href="http://www.microsoft.com/learning/en/us/exam.aspx?ID=88-971"&gt;MCM lab&lt;/a&gt; last month.&lt;/p&gt;  &lt;p&gt;I know I thought I’d failed – I’m still surprised that I passed. Maybe I managed to satisfy enough of enough questions. Certainly none of the questions were beyond me, &lt;a target="_blank" href="http://sqlblog.com/blogs/rob_farley/archive/2012/12/23/the-mcm-lab-exam-two-days-later.aspx"&gt;as I wrote just after I’d sat it&lt;/a&gt;. But I do know that I left more questions in an incomplete state than I would’ve liked.&lt;/p&gt;  &lt;p&gt;The things I hadn’t finished were mostly those things which I’d simply been caught by from a time-perspective. I didn’t feel like I was out of my depth at all, just that some things hadn’t worked the way that I’d expected. In real life I would’ve been able to solve them, and given a little more time, I would’ve been able to get through them too.&lt;/p&gt;  &lt;p&gt;It’s why I wrote that I was confident I’d pass it on a subsequent attempt, although I’m quite happy to not have to do that. I’m still happy I told everyone I was taking the exams, and would do the same again. The study aspect is still something I’m not sure about. As much as I’d like to advise people to NOT study, I get that most people don’t manage to pass first time (even with some study). Heck – it used to require three weeks of intensive training in Seattle, and I was never going to do that (at least, not unless I was on the other side of the desk).&lt;/p&gt;  &lt;p&gt;So now I’m a Microsoft Certified Master in SQL Server, to go with the Microsoft Certified Trainer cert and Microsoft Most Valuable Professional award that I’ve had since 2006. I don’t know how many people around the world have both MCM and MCT in SQL Server, but there can’t be many. I need to deliver more training.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="MCM" border="0" alt="MCM" width="184" height="140" src="http://sqlblog.com/blogs/rob_farley/MCM_783D07F1.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="MVP_Horizontal_FullColor" border="0" alt="MVP_Horizontal_FullColor" width="317" height="140" src="http://sqlblog.com/blogs/rob_farley/MVP_Horizontal_FullColor_475A5086.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="mct_big" border="0" alt="mct_big" width="260" height="140" src="http://sqlblog.com/blogs/rob_farley/mct_big_275ECDBC.jpg"&gt;&lt;/p&gt;  &lt;p&gt;&lt;a target="_blank" href="http://twitter.com/rob_farley"&gt;@rob_farley&lt;/a&gt;    &lt;br&gt;&lt;a target="_blank" href="http://lobsterpot.com.au/"&gt;LobsterPot Solutions&lt;/a&gt;&lt;/p&gt;</description></item><item><title>The SQL Query Optimizer – when Logical Order can get it wrong</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong.aspx</link><pubDate>Sun, 30 Dec 2012 05:32:25 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46882</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;It’s very easy to get in the habit of imagining the way that a query should work based on the Logical Order of query processing – the idea that the FROM clause gets evaluated first, followed by the WHERE clause, GROUP BY, and so on – finally ending with whatever is in the SELECT clause. We even get in the habit of creating indexes that focus on the WHERE clause, and this is mostly right. &lt;/p&gt;  &lt;p&gt;But it’s only &lt;strong&gt;mostly&lt;/strong&gt; right, and it will often depend on statistics.&lt;/p&gt;  &lt;p&gt;There are other situations where statistics have to play a major part in choosing the right plan, of course. In fact, almost every query you ever run will use statistics to work out the best plan. What I’m going to show you in this post is an example of how the statistics end up being incredibly vital in choosing the right plan. It also helps demonstrate an important feature of the way that Scans work, and how to read execution plans.&lt;/p&gt;  &lt;p&gt;I’m going to use AdventureWorks2012 for this example. I’m going to ask for the cheapest product according to the first letter of the product name. This kind of query:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'H%';&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Don’t run it yet. I want to ask you how you’d solve it on paper.&lt;/p&gt;  &lt;p&gt;Would you prefer I give you a list of the products sorted by name, or would you prefer I give you that list sorted by price?&lt;/p&gt;  &lt;p&gt;If you want the ‘sorted by name’ option, then you’ll have to look through all the products that start with H, and work out which is the cheapest (notice that my predicate is not an equality predicate – if I knew what the name had to be exactly, then I could have an index which ordered by name and then price, and very quickly find the cheapest with that name). This approach could be good if you don’t have many products starting with that particular letter. But if you have lots, then finding them all and then looking for the cheapest of them could feel like too much work. Funnily enough, this is the way that most people would imagine this query being run – applying the WHERE clause first, and applying the aggregate function after that.&lt;/p&gt;  &lt;p&gt;On the other hand, if you have lots of products with that particular letter, you might be better off with your list sorted by price, looking through for the first product that starts with the right letter. &lt;/p&gt;  &lt;p&gt;Let me explain this algorithm a little more.&lt;/p&gt;  &lt;p&gt;If you’re at a restaurant and are strapped for cash, you might want to see what the cheapest thing is. You’d pick the “sorted by price” menu, and go to the first item. But then if you saw it had peanut in, and you have an allergy, then you’d skip it and go to the next one. You wouldn’t expect to have to look far to find one that doesn’t have peanut, and because you’ve got the “sorted by price” menu, you have the cheapest one that satisfies your condition after looking through just a few records.&lt;/p&gt;  &lt;p&gt;It’s clearly not the same algorithm as finding all the things that satisfy the condition first, but it’s just as valid. If you’re only going to have to look through a handful of products before you find one that starts with the right letter, then great! But what if there are none? You’d end up having to look through the whole list before you realised.&lt;/p&gt;  &lt;p&gt;The Query Optimizer faces the same dilemma, but luckily it might have statistics, so it should be able to know which will suit better.&lt;/p&gt;  &lt;p&gt;Let’s create the two indexes – one sorted by Name, one sorted by Price. Both will include the other column, so that the query will only need one of them.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;CREATE INDEX ixNamePrice ON Production.Product (Name) INCLUDE (ListPrice);       &lt;br /&gt;CREATE INDEX ixPriceName ON Production.Product (ListPrice) INCLUDE (Name);&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now let’s consider two queries. Both queries give the same result – $0.00. But that’s not important, I’m only interested in how they run.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'I%';&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Consolas"&gt;SELECT MIN(ListPrice)       &lt;br /&gt;FROM Production.Product        &lt;br /&gt;WHERE Name LIKE 'H%';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The two queries are almost identical, but they run quite differently.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_78098898.png" width="575" height="273" /&gt;&lt;/p&gt;  &lt;p&gt;Ok, they’re fairly similar – they both use a Stream Aggregate operator, for example. And they have similar cost. But significantly, one is performing a Seek, while the other is doing a Scan. Different indexes, but nevertheless a Scan and a Seek.&lt;/p&gt;  &lt;p&gt;People will tell you that Scans are bad and Seeks are good, but it’s not necessarily the case. Here, we see that the Scan plan is no more expensive than the Seek plan – it’s just different. We should consider why.&lt;/p&gt;  &lt;p&gt;Those two indexes are the two different stories that I described earlier. There are very few products that start with the letter ‘I’, and quite a number than start with ‘H’, and so the Query Optimizer has chosen differently.&lt;/p&gt;    &lt;p&gt;There are exactly 10 products that start with I. From a total of 504. That’s less than 2% of the products.&lt;/p&gt;  &lt;p&gt;There are 91 products that start with H. That’s 18%. You might not have expected it to be that high, but that’s okay – if SQL has been maintaining statistics for you on this, it hopefully won’t be as surprised as you.&lt;/p&gt;  &lt;p&gt;18% – nearly 1 in 5. So by the time you’ve looked at, oh, a dozen records, you will have almost certainly found one that starts with an H. (Actually, the chance of NOT finding one in the first 12 would be power(.82, 12), which is 0.09. That’s just 9%.) If I do a bit of digging into the internals, I can discover that the pages in my index typically have over a hundred records on them each. The chance of not finding a product that starts with an H on that first page – you’d need lottery-scale luck (1 in 444 million).&lt;/p&gt;  &lt;p&gt;On the other hand, the cost of finding the cheapest value from 91 records is a lot more expensive than finding the cheapest from just 10. And getting all 10 records should be a small number of reads too.&lt;/p&gt;  &lt;p&gt;But a Scan! Really? It has to look through the whole table, right?&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;No. That’s not how it works.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;You see, execution plans go from left to right. If you start reading these plans from the right, you’ll start thinking that the whole index has been scanned, when it’s simply not the case. That Top operator asks for a single row from the index, and that’s all it provides. Once that row has been found, the Scan operation stops.&lt;/p&gt;  &lt;p&gt;For this information, I don’t even need to pull up the Properties window for the Scan (but I would recommend you get in the habit of doing that). No – this is all available in the Tool Tip. Look at the number of “Actual number of rows” – it’s just one.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_743BBDFB.png" width="345" height="559" /&gt;&lt;/p&gt;  &lt;p&gt;A predicate is applied – it looks through the index for rows that start with H – but it’s doing this in Order (see Ordered = True), and it’s stopping after the first row is found. Remember I mentioned that there are actually 91 rows that satisfy the predicate? The Scan doesn’t care – it only needs one and it stops right then.&lt;/p&gt;  &lt;p&gt;You might figure this is because we are using MIN. What if we needed the MAX though? Well, that’s just the same, except that the Direction of the Scan is BACKWARD (you’ll need F4 for that one).&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_63E04642.png" width="291" height="96" /&gt;&lt;/p&gt;  &lt;p&gt;MIN goes forward, because it’s most interested in the ‘smallest’ ones, MAX will go backward because it wants the ‘largest’. (And as you’d probably expect, if you’d created your index to be descending, then it would be reversed.)&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;But again – being able to tell which is the better algorithm depends entirely on your statistics being known&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;I see so many systems have bad statistics for one reason or another, and typically because the data most frequently queried is the newest data, and that makes up such a small percentage of the table. The statistics will think that there is almost no data for ‘today’, as they probably haven’t been updated since at least some number of hours ago.&lt;/p&gt;  &lt;p&gt;When you look at how a query is running, always have a think about you’d solve it on paper, and remember that you might actually have a better (or worse) picture of the statistics than what the Query Optimizer has.&lt;/p&gt;  &lt;p&gt;And remember that a Scan is not necessarily bad. I might do another post on that soon as well.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>SQL Replication demystified</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/25/sql-replication-demystified.aspx</link><pubDate>Wed, 26 Dec 2012 01:34:13 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46799</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;I spent some time explaining SQL Server Replication to someone recently. They said they hadn’t ever really understood the concepts, and that I’d managed to help. It’s inspired me to write a post that I wouldn’t normally do – a “101” post. I’m not trying to do a fully comprehensive piece on replication, just enough to be able to help you get the concepts.&lt;/p&gt;  &lt;p&gt;The way I like to think about replication is by comparing it to magazines. The analogy only goes so far, but let’s see how we go.&lt;/p&gt;  &lt;p&gt;The things being replicated are &lt;strong&gt;articles&lt;/strong&gt;. A &lt;strong&gt;publication&lt;/strong&gt; (the responsibility of a publisher) is a collection of these articles. At the other end of the process are people with &lt;strong&gt;subscriptions&lt;/strong&gt;. It’s just like when my son got a magazine subscription last Christmas. Every month, the latest set of articles got delivered to our house. &lt;a href="http://www.newsstand.co.uk/225-Football-Magazines/855-Subscribe-to-FOUR-FOUR-TWO-Magazine-Subscription.aspx" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="FOUR-FOUR-TWO_JAN-13" border="0" alt="FOUR-FOUR-TWO_JAN-13" align="right" src="http://sqlblog.com/blogs/rob_farley/FOUR-FOUR-TWO_JAN-13_094851EE.jpg" width="192" height="260" /&gt;&lt;/a&gt;&lt;em&gt;(The image here isn’t my own – but feel free to click on it and subscribe to FourFourTwo – excellent magazine, particularly when they’re doing an article about the Arsenal.) &lt;/em&gt;Most of the work is done by &lt;strong&gt;agents&lt;/strong&gt;, such as the newsagent that gets it to my house.&lt;/p&gt;  &lt;p&gt;In SQL Server, these same concepts hold. The objects which are being replicated are articles (typically tables, but also stored procedures, functions, view definitions, and even indexed views). You might not replicate your whole database – just the tables and other objects of interest. These articles make up a publication. Replication is just about getting that stuff to the Subscribers.&lt;/p&gt;  &lt;p&gt;Of course, the magazine analogy breaks down quite quickly. Each time my son got a new edition, the articles were brand new – material he’d never seen before. In SQL Replication, the Subscribers probably have data from earlier. But this brings us to look at a key concept in SQL Replication – how the stupid thing starts.&lt;/p&gt;  &lt;p&gt;Regardless of what kind of replication you’re talking about, the concept is all about keeping Subscribers in sync with the Publisher. You could have the whole table move across every time, but more than likely, you’re going to just have the changes go through. At some point, though, the thing has to get to a starting point.&lt;/p&gt;  &lt;p&gt;This starting point is (typically) done using a &lt;strong&gt;snapshot&lt;/strong&gt;. It’s not a “Database Snapshot” like what you see in the Object Explorer of SQL Server Management Studio – this is just a starting point for replication. It’s a dump of all the data and metadata that make up the articles, and it’s stored on the file system. Not in a database, on the file system. A Subscriber will need this data to be initialised, ready for a stream of changes to be applied.&lt;/p&gt;  &lt;p&gt;It’s worth noting that there is a flavour of replication which just uses snapshots, known as &lt;strong&gt;Snapshot Replication&lt;/strong&gt;. Every time the subscriber gets a refresh of data, it’s the whole publication that has to flow down. This might be fine for small pieces of data, it might not for others.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;(There are other ways to get started too, such as by restoring a backup, but you should still be familiar with the concept of snapshots for replication.)&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;To get in sync, a subscriber would need the data in the snapshot for initialisation, and then every change that has happened since. To reduce the effort that would be required if something went drastically wrong and a new subscription became needed, snapshots can be recreated at regular intervals. This is done by the &lt;strong&gt;Snapshot Agent&lt;/strong&gt;, and like all agents, can be found as a SQL Server Agent job.&lt;/p&gt;  &lt;p&gt;The middle-man between the Publisher and the Subscribers is the &lt;strong&gt;Distributor&lt;/strong&gt;. The Distributor is essentially a bunch of configuration items (and as we’ll see later, changes to articles), stored in the &lt;strong&gt;distribution&lt;/strong&gt; database – a system database that is often overlooked. &lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:left;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" align="left" src="http://sqlblog.com/blogs/rob_farley/image_45B0A0CB.png" width="222" height="173" /&gt;If you query &lt;font face="Consolas"&gt;sys.databases&lt;/font&gt; on a SQL instance that has been configured as a Distributor you’ll see a row for the distribution database. It won’t have a database_id less than 5, but it will have a value of 1 in the &lt;font face="Consolas"&gt;is_distributor&lt;/font&gt; column. The instance used as the Distributor is the one whose SQL Server Agent runs most of the replication agents, including the Snapshot Agent.&lt;/p&gt;  &lt;p&gt;If you’re not doing Snapshot Replication, you’re going to want to get those changes through. &lt;strong&gt;Transactional Replication&lt;/strong&gt;, as the name suggests, involves getting transactions that affect the published articles out to the subscribers. If the replication has been set up to push the data through, this should be quite low latency.&lt;/p&gt;  &lt;p&gt;So that SQL Server isn’t having to check every transaction right in the middle of it, there’s a separate agent that looks though the log for transactions that are needed for the replication, copying them across to the distribution database, where they hang around as long as they’re needed. This agent is the &lt;strong&gt;Log Reader Agent&lt;/strong&gt;, and also runs on the Distributor. You can imagine that there is a potential performance hit if this is running on a different machine to the Publisher, and this is one of the influencing factors that means that you’ll typically have the Distributor running on the Publisher (although there are various reasons why you might not).&lt;/p&gt;  &lt;p&gt;Now we have a process which is making sure that initialisation is possible by getting snapshots ready, and another process which is looking for changes to the articles. The agent that gets this data out to Subscribers is the &lt;strong&gt;Distribution Agent&lt;/strong&gt;. Despite its name, it can run at the Subscriber, if the Subscriber is set to pull data across (good for occasionally connected systems). This is like with my magazine – I might prefer to go to the newsagent and pick it up, if I’m not likely to be home when the postman comes around. In effect, my role as Subscriber includes doing some distribution if I want to pull the data through myself.&lt;/p&gt;  &lt;p&gt;These three agents, Snapshot Agent, Log Reader Agent and Distribution Agent, make up the main agents used for Transactional Replication, which is probably the most common type of replication around. Snapshot Replication doesn’t use the Log Reader Agent, but still needs the other two.&lt;/p&gt;  &lt;p&gt;Now let’s consider the other types of replication.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" align="right" src="http://sqlblog.com/blogs/rob_farley/image_38D64AAD.png" width="296" height="216" /&gt;&lt;strong&gt;Merge Replication&lt;/strong&gt; involves having subscribers that can also change the data. It’s similar to Transactional Replication with Updateable Subscribers, which has been deprecated. These changes are sent back to the &lt;strong&gt;Merge Agent&lt;/strong&gt;, which works out what changes have to be applied. This is actually more complicated than you might expect, because it’s very possible to have changes made in multiple places and for a conflict to arise. You can set defaults about who wins, and can override manually through the Replication Monitor (which is generally a useful tool for seeing if Subscribers are sufficiently in sync, testing the latency, and so on). Updateable Subscribers end up using the &lt;strong&gt;Queue Reader Agent&lt;/strong&gt; instead of the Merge Agent. They’re slightly different in the way they run, but I consider them to be quite similar in function, as they both involve getting the data back into the publisher when changes have been made elsewhere.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Peer-to-Peer Replication&lt;/strong&gt; is the final kind. This is really a special type of Transactional Replication, in which you have multiple publishers, all pushing data out at each other. It’s the option that is considered closest to a High Availability system, and is good across geographically wide environments, particularly if connections are typically routed to the closest server. Consider the example of servers in the UK, the US and Australia. Australian users can be connected to the local server, knowing the changes are going to be pushed out to the UK and US boxes. They’re set up in a &lt;strong&gt;topology&lt;/strong&gt;, with each server considered a &lt;strong&gt;node&lt;/strong&gt;. Each server keeps track of which updates it’s had, which means they should be able to keep in sync, regardless of when they have downtime. If Australian changes are sent to the UK but not the US, then US can be updated by the UK server if that’s easier. &lt;/p&gt;  &lt;p&gt;Replication can feel complex. There are a lot of concepts that are quite alien to most database administrators. However, the benefits of replication can be significant, and are worth taking advantage of in many situations. They’re an excellent way of keeping data in sync across a number of servers, without many of the server availability hassles associated with log-shipping or mirroring. It can definitely help you achieve scale-out environments, particularly if you consider Peer-to-Peer, which can help you offload your connections to other servers, knowing the key data can be kept up-to-date easily.&lt;/p&gt;  &lt;p&gt;I haven’t tried to be completely comprehensive in this quick overview of replication, but if you’re new to the concepts, or you’re studying for one of the MCITP exams and need to be able to get enough of an understanding to get you by, then I hope this has helped demystify it somewhat.&lt;/p&gt;  &lt;p&gt;There’s more in SQL Books Online, of course – &lt;a target="_blank"&gt;a whole section on Replication&lt;/a&gt;. If what I’ve written makes sense, go exploring, and try to get it running on your own servers too. Maybe my next post will cover some of that.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>The MCM lab exam – two days later</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/22/the-mcm-lab-exam-two-days-later.aspx</link><pubDate>Sat, 22 Dec 2012 20:59:38 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46746</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2012/12/18/mcm-lab-exam-this-week.aspx" target="_blank"&gt;Readers of my blog&lt;/a&gt;, or &lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;followers on Twitter&lt;/a&gt; will know I took the &lt;a href="http://www.microsoft.com/learning/en/us/exam.aspx?id=88-971" target="_blank"&gt;MCM Lab exam&lt;/a&gt; a couple of days ago. I let people know I was doing the exam, rather than doing the ‘normal’ thing of doing it in secret and hoping no-one found out until a successful result had been published.&lt;/p&gt;  &lt;p&gt;Oh, and this post has been approved by the MCM program’s &lt;a href="http://blogs.msdn.com/b/bobtaylor/" target="_blank"&gt;boB Taylor&lt;/a&gt; (&lt;a href="http://twitter.com/sqlboBT" target="_blank"&gt;@sqlboBT&lt;/a&gt;) as not-breaking NDA. Nothing herein should be seen to imply that a particular topic is or isn’t in the exam.&lt;/p&gt;  &lt;p&gt;So how did I go? Well... I made a bunch of mistakes, I wasted a lot of time, I even left &lt;a href="http://blogs.technet.com/b/themasterblog/archive/2012/11/05/at-sql-pass-and-clarification-around-mcm-lab-exam.aspx" target="_blank"&gt;some questions incomplete&lt;/a&gt;, and so I assume I’ve failed. It’s a horrible feeling, I can tell you. I went in not knowing what to expect. I knew that I’d worked with SQL Server for a lot of years, and felt like I had a good grounding in all the various aspects of SQL Server that I might have to know – but on the day, in the stress of having a laggy remote desktop session, a Lync connection which had dropped out just beforehand, being told that ‘everyone’ fails first time but that I was probably going to be one of the exceptions..., well, I didn’t feel like it was my day.&lt;/p&gt;  &lt;p&gt;If I’d had that day at a client site, it would’ve been fine. With extra time, I have no doubt that I would’ve been able to get through things. I could’ve raised questions when I didn’t feel exactly sure about what was expected. I could’ve pointed out how much I’d done to help address a situation and found out if they wanted me to continue hunting for other potential issues. I could’ve asked for more information. I could’ve written a document describing what I had in mind to solve the particular problem to confirm that the client was happy with it. You don’t get any of that in exam situations.&lt;/p&gt;  &lt;p&gt;I found myself doing things the ways that work in the real world, but not having somewhere to turn when it turned out that the information was clearly not quite there, turning a simple situation into a trickier one. In real life, I’d’ve turned to the face peering over my shoulder and said “I thought you said it was that thing.” Instead, I wasted time looking for what it actually was.&lt;/p&gt;  &lt;p&gt;I found myself making changes to databases that I wouldn’t make in the real world without first running those changes past someone, or at least making sufficient disclaimers along the lines of “I would recommend making this change, but there’s always a chance that something else will see the impact of this…”&lt;/p&gt;  &lt;p&gt;Now, I think I could go into the exam, and even faced with a different set of questions, have a better picture about the style of thing that would be asked, and be better able to identify the danger-items – those things which could lure me away from staying on track. It’s not the same with consulting, because you can always ask as soon as you consider there’s a potential problem. Just like the mechanic who says “So, there’s a rattle… do you want me to spend some time looking into this? If so, is half an hour okay?”, I can ask my clients if there’s something which doesn’t smell right. In the exam, you don’t get that luxury.&lt;/p&gt;  &lt;p&gt;If you’re going to take the exam, I would recommend the following approach:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Start by going through the questions, making notes about what they’re asking you to do and how much time it’ll take. Move the list of questions to the side, because switching from one window to another will simply take too long.&lt;/li&gt;    &lt;li&gt;Once you’ve identified ones that you know will be ‘quick’, do them. But if one stops being quick, leave it and come back. You’re not in danger of leaving it unfinished completely – there’s 5.5 hours that you’ll be remoted in, and you’ll be coming back in about 30 minutes. But you don’t want to find that you spend an hour on something which you anticipated would be a ten minute job.&lt;/li&gt;    &lt;li&gt;Whenever you leave a question, put it back in the list at an appropriate spot. If you think it’s still quicker than the question on XYZ, then fine, put it above that. But be ruthless about how long you spend on each task. If something doesn’t work the way you expect, do some troubleshooting, but don’t treat it as if your client depends on it. Your exam result will suffer more because you wasted time, than if you left it incomplete. You could end up finding that the XYZ question actually turned out to be simpler than you thought.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I hope I passed, but I really don’t think I have done.&lt;/p&gt;  &lt;p&gt;I’m confident I blitzed quite a lot of them. There were plenty on the list that I moved through quickly, and others that took me longer than I expected, but I still finished. It’s just that there were a certain number that I assume I didn’t finish satisfactorily – simply because if it were my database that I had got a consultant in to fix, I wouldn’t’ve considered it complete. I mean, hopefully I hit the key things on some of those, but I can’t tell. &lt;/p&gt;  &lt;p&gt;I know I definitely got some just plain wrong. The things that weren’t working, and I didn’t have time to get back to. I was getting hungry by the end of it, and was feeling stressed about the amount of time I’d wasted on other questions.&lt;/p&gt;  &lt;p&gt;Assuming I need to retake it, I have to wait until 90 days after this attempt. That’s March 20th.&lt;/p&gt;  &lt;p&gt;So what will I do between now and then? Well, I might check through the various guides about the things that stop things from working the way I expect them to. I’m not saying what didn’t work, but imagine there’s some technology that you’re familiar enough with. To use &lt;a href="http://thomaslarock.com/2012/12/things-i-didnt-do-for-the-mcm-lab-exam/" target="_blank"&gt;Tom’s example&lt;/a&gt;, peer-to-peer replication. I know how to set that up – I’ve done it before. It’s actually very straight forward. But if you do what you always do and you get some error… well, that might be harder to troubleshoot. In the real world there are plenty of ways you can troubleshoot online, but in an exam, it’s just Books Online and what’s in your head.&lt;/p&gt;  &lt;p&gt;In that regard, Tom’s guide on &lt;a href="http://thomaslarock.com/2012/12/things-i-didnt-do-for-the-mcm-lab-exam/" target="_blank"&gt;what he didn’t do&lt;/a&gt; is useful. His “Don’t try to get every answer” is very important. But I would also point out that you should start to try them, in case they turn out to be easier than you expected. Just because it’s on P2P (continuing Tom’s example) doesn’t mean you won’t get it. Tom described shouting at the screen saying “Seriously? Is that all you’ve got?” – this is why you should try the ones that you figured would be harder. Don’t start with them, but a few hours in, don’t just focus on the time-wasters.&lt;/p&gt;  &lt;p&gt;Tom described ‘not studying alone’. I discussed various things with people leading up to the exam, more so than studying per se. There’s another person who I know is taking the exam soon, and we’ve talked a bit about the various technologies. Like “How are you at TDE? What’s your clustering like?” – that kind of thing. He has environments set up where he's practising a bunch of tasks. I didn’t. I did skim through some of the &lt;a href="http://technet.microsoft.com/en-us/sqlserver/ff977043.aspx" target="_blank"&gt;videos&lt;/a&gt;, but not many, and it was really only skimming. I found myself being slightly interested in the video where &lt;a href="http://technet.microsoft.com/sqlserver/gg545017.aspx" target="_blank"&gt;PaulR demonstrates how to set up replication with a mirrored subscriber&lt;/a&gt;. I watched it a couple of times because something wasn’t sitting right – I figured it out though… in the exam you don’t want to be using scripts for everything (the GUI would be much better – not that I practised at all), but also, the error that he gets at the end… it’s because his script tries to remove the subscriber from the wrong server. In the GUI, he wouldn’t’ve had that problem (reminds me – I should mention this particular mistake to him, although I’m sure he knows about it and just doesn’t have an easy way to change the video). I didn’t end up watching most of the videos – even the ones on Resource Governor. I looked at the titles, and figured it would be fine to skip them. I listened to one on Clustering Troubleshooting in the car as I drove home from town, but just kept thinking “Right – I get all that… but is that enough for the exam?” I can’t say whether it was or not, but I can tell you that I didn’t do any further study on it. I also watched the Waits demo, and the slides on Log File Internals (but mainly out of curiosity about a different problem I’d had a while back). &lt;/p&gt;  &lt;p&gt;I didn’t read the books. I’ve read a couple of chapters of &lt;a target="_blank"&gt;Kalen’s SQL Internals&lt;/a&gt; before, but didn’t finish it. I haven’t even read the book &lt;a href="http://www.amazon.com/Professional-Server-2012-Internals-Troubleshooting/dp/1118177657" target="_blank"&gt;I wrote recently&lt;/a&gt; (well, obviously the chapters I wrote – I read them lots of times).&lt;/p&gt;  &lt;p&gt;Of &lt;a href="http://thomaslarock.com/2012/12/how-i-passed-the-sql-server-2008-mcm-exams/" target="_blank"&gt;Tom’s advice on what he did to pass&lt;/a&gt; (doing the reading, watching the videos, writing things down and teaching others) I really didn’t do any of them. But I do spend time teaching others in general. I’m an MCT – I teach from time to time (not as often as I used to), and that sharing aspect is important to me. I explain to people about how to tune their queries. I explain why indexing strategies will work for general performance gain. I explain why security is important, and the keys to avoiding SQL Injection. Even today I spent time explaining the roles of Replication Agents to someone (who later told me that they’d heard that stuff before, but it just made more sense when I explained it). Teaching comes naturally to me, and I’ve always done it. But I didn’t do any of that with this MCM exam in mind.&lt;/p&gt;  &lt;p&gt;My advice on how to pass this exam – use SQL Server for fifteen years. It’s what I did. Studying might work for you too, and if that’s what you’ve done, then you may well get to leave the lab exam feeling a whole lot more confident about your result than me.&lt;/p&gt;  &lt;p&gt;I actually did leave feeling very confident about my result. I’m confident I failed. But I also know that I could pass it tomorrow with no extra study – by avoiding the time sinks that are in there – and I wish I didn’t have to wait until March to retake it. &lt;/p&gt;  &lt;p&gt;I’ll post again when my result becomes public, to let you know how I actually did. &lt;/p&gt;  &lt;p&gt;Of course, I hope I’m wrong.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;</description></item><item><title>MCM Lab exam this week</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/18/mcm-lab-exam-this-week.aspx</link><pubDate>Tue, 18 Dec 2012 07:55:56 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46677</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;In two days I’ll’ve finished the MCM Lab exam, 88-971. If you do an internet search for &lt;em&gt;88-971&lt;/em&gt;, it’ll tell you the answer is –883. Obviously.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_27E0F8E5.png" width="514" height="238" /&gt;&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_7C5C8C50.png" width="524" height="456" /&gt;&lt;/p&gt;  &lt;p&gt;It’ll also give you a link to the &lt;a href="http://www.microsoft.com/learning/en/us/Exam.aspx?ID=88-971&amp;amp;Locale=en-us" target="_blank"&gt;actual exam page&lt;/a&gt;, which is useful too, once you’ve finished being distracted by the calculator instead of going to the thing you’re actually looking for. (Do people actually search the internet for the results of mathematical questions? Really?)&lt;/p&gt;  &lt;p&gt;The list of Skills Measured for this exam is quite short, but can essentially be broken down into one word “Anything”.&lt;/p&gt;  &lt;p&gt;The Preparation Materials section is even better. Classroom Training – none available. Microsoft E-Learning – none available. Microsoft Press Books – none available. Practice Tests – none available. But there are links to &lt;a href="http://technet.microsoft.com/en-us/sqlserver/ff977043.aspx" target="_blank"&gt;Readiness Videos&lt;/a&gt; and a page which has no resources listed, but tells you a list of people who have already qualified. Three in Australia who have MCM SQL Server 2008 so far. The list doesn’t include some of the latest batch, such as &lt;a href="http://stratesql.com" target="_blank"&gt;Jason Strate&lt;/a&gt; or &lt;a href="http://thomaslarock.com/" target="_blank"&gt;Tom LaRock&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;I’ve used SQL Server for almost 15 years. During that time I’ve been awarded SQL Server MVP seven times, but the MVP award doesn’t actually mean all that much when considering this particular certification. I know lots of MVPs who have tried this particular exam and failed – including Jason and Tom. Right now, I have no idea whether I’ll pass or not. People tell me I’ll pass no problem, but I honestly have no idea. There’s something about that “Anything” aspect that worries me.&lt;/p&gt;  &lt;p&gt;I keep looking at the list of things in the Readiness Videos, and think to myself “I’m comfortable with Resource Governor (or whatever) – that should be fine.” Except that then I feel like I maybe don’t know all the different things that can go wrong with Resource Governor (or whatever), and I wonder what kind of situations I’ll be faced with. And then I find myself looking through the stuff that’s explained in the videos, and wondering what kinds of things I should know that I don’t, and then I get amazingly bored and frustrated (after all, I tell people that these exams aren’t supposed to be studied for – you’ve been studying for the last 15 years, right?), and I figure “What’s the worst that can happen? A fail?”&lt;/p&gt;  &lt;p&gt;I’m told that the exam provides a list of scenarios (maybe 14 of them?) and you have 5.5 hours to complete them. When I say “complete”, I mean complete – &lt;a href="http://blogs.technet.com/b/themasterblog/archive/2012/11/05/at-sql-pass-and-clarification-around-mcm-lab-exam.aspx" target="_blank"&gt;you don’t get to leave them unfinished&lt;/a&gt;, that’ll get you ‘nil points’ for that scenario. Apparently no-one gets to complete all of them.&lt;/p&gt;  &lt;p&gt;Now, I’m a consultant. I get called on to fix the problems that people have on their SQL boxes. Sometimes this involves fixing corruption. Sometimes it’s figuring out some performance problem. Sometimes it’s as straight forward as getting past a full transaction log; sometimes it’s as tricky as recovering a database that has lost its metadata, without backups. Most situations aren’t a problem, but I also have the confidence of being able to do internet searches to verify my maths (in case I forget it’s –883). In the exam, I’ll have maybe twenty minutes per scenario (but if I need longer, I’ll have to take longer – no point in stopping half way if it takes more than twenty minutes, unless I don’t see an end coming up), so I’ll have time constraints too. And of course, I won’t have any of my usual tools. I can’t take scripts in, I can’t take staff members. Hopefully I can use the coffee machine that will be in the room.&lt;/p&gt;  &lt;p&gt;I figure it’s going to feel like one of those days when I’ve gone into a client site, and found that the problems are way worse than I expected, and that the site is down, with people standing over me needing me to get things right first time...&lt;/p&gt;  &lt;p&gt;...so it should be fine, I’ve done that before. :)&lt;/p&gt;  &lt;p&gt;If I do fail, it won’t make me any less of a consultant. It won’t make me any less able to help all of my clients (including you if you &lt;a href="http://lobsterpot.com.au/contact" target="_blank"&gt;get in touch&lt;/a&gt; – hehe), it’ll just mean that the particular problem might’ve taken me more than the twenty minutes that the exam gave me.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/rob_farley" target="_blank"&gt;@rob_farley&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;PS: Apparently the done thing is to NOT advertise that you’re sitting the exam at a particular time, only that you’re expecting to take it at some point in the future. I think it’s akin to the idea of not telling people you’re pregnant for the first few months – it’s just in case the worst happens. Personally, I’m happy to tell you all that I’m going to take this exam the day after tomorrow (which is the 19th in the US, the 20th here). If I end up failing, you can all commiserate and tell me that I’m not actually as unqualified as I feel.&lt;/p&gt;</description></item><item><title>Joins in single-table queries</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/12/10/joins-in-single-table-queries.aspx</link><pubDate>Tue, 11 Dec 2012 00:05:17 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46601</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Tables are only metadata. They don’t store data.&lt;/p&gt;  &lt;p&gt;I’ve written &lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2010/09/14/table-no-such-thing.aspx" target="_blank"&gt;something about this&lt;/a&gt; before, but I want to take a viewpoint of this idea around the topic of joins, especially since it’s the topic for T-SQL Tuesday this month. Hosted this time by &lt;a href="http://sqlity.net/en/1175/t-sql-tuesday-37-invite-to-join-me-in-a-month-of-joins/" target="_blank"&gt;Sebastian Meine&lt;/a&gt; (&lt;a href="http://twitter.com/sqlity" target="_blank"&gt;@sqlity&lt;/a&gt;), who has a whole series on joins this month. Good for him – it’s a great topic. &lt;a href="http://sqlity.net/en/1175/t-sql-tuesday-37-invite-to-join-me-in-a-month-of-joins/" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="TSQL2sDay150x150" border="0" alt="TSQL2sDay150x150" align="right" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_1CEA56EB.jpg" width="170" height="170" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In that last post I discussed the fact that we write queries against tables, but that the engine turns it into a plan against indexes. My point wasn’t simply that a table is actually just a Clustered Index (or heap, which I consider just a special type of index), but that data access always happens against indexes – never tables – and we should be thinking about the indexes (specifically the non-clustered ones) when we write our queries.&lt;/p&gt;  &lt;p&gt;I described the scenario of looking up phone numbers, and how it never really occurs to us that there is a master list of phone numbers, because we think in terms of the useful non-clustered indexes that the phone companies provide us, but anyway – that’s not the point of this post.&lt;/p&gt;  &lt;p&gt;So a table is metadata. It stores information about the names of columns and their data types. Nullability, default values, constraints, triggers – these are all things that define the table, but the data isn’t stored in the table. The data that a table describes is stored in a heap or clustered index, but it goes further than this.&lt;/p&gt;  &lt;p&gt;All the useful data is going to live in non-clustered indexes. Remember this. It’s important. Stop thinking about tables, and start thinking about indexes.&lt;/p&gt;  &lt;p&gt;So let’s think about tables as indexes. This applies even in a world created by &lt;a href="http://www.lmgtfy.com/?q=someone+who's+bad+at+database+design" target="_blank"&gt;someone else&lt;/a&gt;, who doesn’t have the best indexes in mind for you.&lt;/p&gt;  &lt;p&gt;I’m sure you don’t need me to explain Covering Index bit – the fact that if you don’t have sufficient columns “included” in your index, your query plan will either have to do a Lookup, or else it’ll give up using your index and use one that does have everything it needs (even if that means scanning it). If you haven’t seen that before, drop me a line and I’ll run through it with you. Or go and read &lt;a href="http://msmvps.com/blogs/robfarley/archive/2008/10/12/doing-the-maths-to-understand-sql-optimiser-choices.aspx" target="_blank"&gt;a post I did a long while ago&lt;/a&gt; about the maths involved in that decision.&lt;/p&gt;  &lt;p&gt;So – what I’m going to tell you is that a Lookup is a join.&lt;/p&gt;  &lt;p&gt;When I run &lt;font face="Consolas"&gt;SELECT CustomerID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 285; &lt;/font&gt;against the AdventureWorks2012 get the following plan:&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_31AC299E.png" width="470" height="226" /&gt;&lt;/p&gt;  &lt;p&gt;I’m sure you can see the join. Don’t look in the query, it’s not there. But you should be able to see the join in the plan. It’s an Inner Join, implemented by a Nested Loop. It’s pulling data in from the Index Seek, and joining that to the results of a Key Lookup.&lt;/p&gt;  &lt;p&gt;It clearly is – the QO wouldn’t call it that if it wasn’t really one. It behaves exactly like any other Nested Loop (Inner Join) operator, pulling rows from one side and putting a request in from the other. You wouldn’t have a problem accepting it as a join if the query were slightly different, such as &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT sod.OrderQty       &lt;br /&gt;FROM Sales.SalesOrderHeader AS soh        &lt;br /&gt;JOIN Sales.SalesOrderDetail as sod        &lt;br /&gt;on sod.SalesOrderID = soh.SalesOrderID        &lt;br /&gt;WHERE soh.SalesPersonID = 285;&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_5CA417A3.png" width="478" height="223" /&gt;&lt;/p&gt;  &lt;p&gt;Amazingly similar, of course. This one is an explicit join, the first example was just as much a join, even thought you didn’t actually ask for one.&lt;/p&gt;  &lt;p&gt;You need to consider this when you’re thinking about your queries.&lt;/p&gt;  &lt;p&gt;But it gets more interesting.&lt;/p&gt;  &lt;p&gt;Consider this query: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT SalesOrderID       &lt;br /&gt;FROM Sales.SalesOrderHeader        &lt;br /&gt;WHERE SalesPersonID = 276        &lt;br /&gt;AND CustomerID = 29522;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;It doesn’t look like there’s a join here either, but look at the plan.&lt;/p&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_3993A633.png" width="513" height="240" /&gt;    &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;That’s not some Lookup in action – that’s a proper Merge Join. The Query Optimizer has worked out that it can get the data it needs by looking in two separate indexes and then doing a Merge Join on the data that it gets. Both indexes used are ordered by the column that’s indexed (one on SalesPersonID, one on CustomerID), and then by the CIX key SalesOrderID. Just like when you seek in the phone book to Farley, the Farleys you have are ordered by FirstName, these seek operations return the data ordered by the next field. This order is SalesOrderID, even though you didn’t explicitly put that column in the index definition. The result is two datasets that are ordered by SalesOrderID, making them very mergeable. &lt;/p&gt;  &lt;p&gt;Another example is the simple query&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Consolas"&gt;SELECT CustomerID       &lt;br /&gt;FROM Sales.SalesOrderHeader        &lt;br /&gt;WHERE SalesPersonID = 276;&lt;/font&gt;      &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/rob_farley/image_0191B250.png" width="514" height="245" /&gt;&lt;/p&gt;  &lt;p&gt;This one prefers a Hash Match to a standard lookup even! This isn’t just ordinary index intersection, this is something else again! Just like before, we could imagine it better with two whole tables, but we shouldn’t try to distinguish between joining two tables and joining two indexes.&lt;/p&gt;  &lt;p&gt;The Query Optimizer can see (using basic maths) that it’s worth doing these particular operations using these two less-than-ideal indexes (because of course, the best indexese would be on both columns – a composite such as (SalesPersonID, CustomerID – and it would have the SalesOrderID column as part of it as the CIX key still).&lt;/p&gt;  &lt;p&gt;You need to think like this too.&lt;/p&gt;  &lt;p&gt;Not in terms of excusing single-column indexes like the ones in AdventureWorks2012, but in terms of having a picture about how you’d like your queries to run. If you start to think about what data you need, where it’s coming from, and how it’s going to be used, then you will almost certainly write better queries. &lt;/p&gt;  &lt;p&gt;…and yes, this would include when you’re dealing with regular joins across multiples, not just against joins within single table queries.&lt;/p&gt;</description></item><item><title>An MCM exam, Rob? Really?</title><link>http://sqlblog.com/blogs/rob_farley/archive/2012/11/18/an-mcm-exam-rob-really.aspx</link><pubDate>Mon, 19 Nov 2012 00:28:40 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46278</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;I took the &lt;a href="http://www.microsoft.com/learning/en/us/exam.aspx?id=88-970" target="_blank"&gt;SQL 2008 MCM Knowledge exam&lt;/a&gt; while in Seattle for the PASS Summit ten days ago.&lt;/p&gt;  &lt;p&gt;I wasn’t planning to do it, but I got persuaded to try. I was meaning to write this post to explain myself before the result came out, but it seems I didn’t get typing quickly enough.&lt;/p&gt;  &lt;p&gt;Those of you who know me will know I’m a big fan of certification, to a point. I’ve been involved with Microsoft Learning to help create exams. I’ve kept my certifications current since I first took an exam back in 1998, sitting many in beta, across quite a variety of topics. I’ve probably become quite good at them – I know I’ve definitely passed some that I really should’ve failed.&lt;/p&gt;  &lt;p&gt;I’ve also written that I don’t think exams are worth studying for.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;(That’s probably not entirely true, but it depends on your motivation. If you’re doing learning, I would encourage you to focus on what you need to know to do your job better. That will help you pass an exam – but the two skills are very different. I can coach someone on how to pass an exam, but that’s a different kind of teaching when compared to coaching someone about how to do a job. For example, the real world includes a lot of “it depends”, where you develop a feel for what the influencing factors might be. In an exam, its better to be able to know some of the “Don’t use this technology if XYZ is true” concepts better.)&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;As for the Microsoft Certified Master certification… I’m not opposed to the idea of having the MCM (or in the future, MCSM) cert. But the barrier to entry feels quite high for me. When it was first introduced, the nearest testing centres to me were in Kuala Lumpur and Manila. Now there’s one in Perth, but that’s still a big effort. I know there are options in the US – such as one about an hour’s drive away from downtown Seattle, but it all just seems too hard. Plus, these exams are more expensive, and all up – I wasn’t sure I wanted to try them, particularly with the fact that I don’t like to study.&lt;/p&gt;  &lt;p&gt;I used to study for exams. It would drive my wife crazy. I’d have some exam scheduled for some time in the future (like the time I had two booked for two consecutive days at TechEd Australia 2005), and I’d make sure I was ready. Every waking moment would be spent pouring over exam material, and it wasn’t healthy. I got shaken out of that, though, when I ended up taking &lt;a href="http://blogs.msdn.com/b/acoat/archive/2005/09/08/462194.aspx" target="_blank"&gt;four exams in those two days&lt;/a&gt; in 2005 and passed them all. I also worked out that if I had a Second Shot available, then failing wasn’t a bad thing at all. Even without Second Shot, I’m much more okay about failing. But even just trying an MCM exam is a big effort. I wouldn’t want to fail one of them.&lt;/p&gt;  &lt;p&gt;Plus there’s the illusion to maintain. People have told me for a long time that I should just take the MCM exams – that I’d pass no problem. I’ve never been so sure. It was almost becoming a pride-point. Perhaps I should fail just to demonstrate that I &lt;em&gt;can&lt;/em&gt; fail these things.&lt;/p&gt;  &lt;p&gt;Anyway – &lt;a href="http://blogs.msdn.com/b/bobtaylor/" target="_blank"&gt;boB Taylor&lt;/a&gt; (&lt;a href="http://twitter.com/sqlboBT" target="_blank"&gt;@sqlboBT&lt;/a&gt;) persuaded me to try the SQL 2008 MCM Knowledge exam at the PASS Summit. They set up a testing centre in one of the room there, so it wasn’t out of my way at all. I had to squeeze it in between other commitments, and I certainly didn’t have time to even see what was on the syllabus, let alone study. In fact, I was so exhausted from the week that I fell asleep at least once (just for a moment though) during the actual exam. Perhaps the questions need more jokes, I’m not sure. &lt;/p&gt;  &lt;p&gt;I knew if I failed, then I might disappoint some people, but that I wouldn’t’ve spent a great deal of effort in trying to pass. On the other hand, if I did pass I’d then be under pressure to investigate the &lt;a href="http://www.microsoft.com/learning/en/us/exam.aspx?id=88-971" target="_blank"&gt;MCM Lab exam&lt;/a&gt;, which can be taken remotely (therefore, a much smaller amount of effort to make happen). In some ways, passing could end up just putting a bunch more pressure on me.&lt;/p&gt;  &lt;p&gt;&lt;a href="https://www.facebook.com/robfarley/posts/10151201520814064" target="_blank"&gt;Oh, and I did&lt;/a&gt;.&lt;/p&gt;</description></item></channel></rss>