<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'indexes'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=indexes&amp;orTags=0</link><description>Search results matching tag 'indexes'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>T-SQL Tuesday: What kind of Bookmark are you using?</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2013/06/10/tsql-tuesday-what-kind-of-bookmark-are-you-using.aspx</link><pubDate>Tue, 11 Jun 2013 03:21:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:49537</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2013/06/02/t-sql-tuesday-43-hello-operator.aspx"&gt;&lt;img width="150" height="150" title="TSQL2sDay150x150" align="right" alt="TSQL2sDay150x150" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_1B3B2D1E.jpg" border="0"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I’m glad there is no minimum length requirement for &lt;a href="http://sqlblog.com/blogs/rob_farley/archive/2013/06/02/t-sql-tuesday-43-hello-operator.aspx"&gt;T-SQL Tuesday blog posts&lt;/a&gt;, because this one will be short. I was in the classroom for almost 11 hours today, and I need to be back tomorrow morning at 7:30. &lt;/p&gt;  &lt;p&gt;Way long ago, back in SQL 2000 (or was it earlier?) when a query indicated that SQL Server was going to use a nonclustered index to get row pointers, and then look up those rows in the underlying table, the plan just had a very linear look to it. The operator that indicated going from the nonclustered leaf to the data row was called a ‘Bookmark Lookup’, and it just looked a simple, single operator. Those of us that did troubleshooting of query plans knew that it could hide a multitude of sins, but to many people it looked very innocuous. &lt;/p&gt;  &lt;p&gt;Then in the next version, that simple, single Bookmark Lookup Operator was replaced by something that looked like a JOIN! When I first saw query plans showing a JOIN when doing a nonclustered index lookup, I was almost distraught, but it turns out that was only because it was new and different. The more I thought about it, the more I realized it was a Good Thing. &lt;/p&gt;  &lt;p&gt;Some people might get confused because they think of a JOIN as an operation that finds matches rows between two tables. But in fact, a JOIN can be used to find matches between any two sets of rows. And in the case of a nonclustered index lookup, SQL Server is finding rows in the leaf level of a nonclustered index (the ones that meet your filter condition(s) ) , and then is finding the matching rows in the underlying table. Internally, this is a JOIN operation.&lt;/p&gt;  &lt;p&gt;But when we look at the plans, there are two different operators used to show the actual lookup into the underlying table.&lt;/p&gt;  &lt;p&gt;There is a RID Lookup, used when the table is a heap. The nonclustered index contains Row ID, or RID, values (composed of a File ID, a Page ID and a Slot/Row number on the page). This RID is then used to ‘match’ with rows in the underlying table that have the same RID.&lt;/p&gt;  &lt;p&gt;There is also a KEY Lookup, used when the table has a clustered index.&amp;nbsp; The leaf level of a nonclustered index contains pointers that are the key values for the rows being pointed to. So to find the matching rows in the table, SQL Server takes the clustered key value from the nonclustered index, and then searches for that value in the clustered index, following the clustered index from the root down to its leaf.&lt;/p&gt;  &lt;p&gt;Below are the two icons used for these operations. If I were to give you a quiz, and ask which operator indicates we are finding a row in a table directly using a RID value, and which indicates we are finding a row in a table using a clustered index key, which would you say is which?&amp;nbsp; Take a moment to think about it. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_60814C43.png"&gt;&lt;img width="84" height="84" title="image" style="margin:0px;border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_6D7B2C54.png" border="0"&gt;&lt;/a&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_45D4B02A.png"&gt;&lt;img width="92" height="86" title="image" style="margin:0px;border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_4C1B86B8.png" border="0"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I don’t know about you, but when I just try to figure out these two icons, I think the one on the left looks like it should be the KEY Lookup, and the one on the right should be the RID Lookup. &lt;/p&gt;  &lt;p&gt;But if you check &lt;a href="http://msdn.microsoft.com/en-us/library/ms191158.aspx"&gt;the page in Books Online&lt;/a&gt;, or if you check your own query plans, you see that they are backwards!&lt;/p&gt;  &lt;p&gt;Here is a plan showing a RID Lookup:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_0459FDD1.png"&gt;&lt;img width="244" height="112" title="image" style="margin:0px;border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_155E2BB4.png" border="0"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;nbsp;&lt;/p&gt;  &lt;p&gt;And here is a plan showing a KEY Lookup:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_69AD61B7.png"&gt;&lt;img width="244" height="116" title="image" style="margin:0px;border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_6F880550.png" border="0"&gt;&lt;/a&gt;&lt;/p&gt;        &lt;p&gt;Fortunately, the graphical query plan tells you what kind of operation it’s performing, so you can just ignore the picture and read the words. But what’s the point of a graphical plan in that case?&lt;/p&gt;  &lt;p&gt;So are they really backwards? Did someone just make a mistake and link in the wrong file when compiling the SQL Server graphical query plan code? Or am I misinterpreting this? &lt;/p&gt;  &lt;p&gt;I still think graphical query plans are one of the best thing ever added to product, so I’m really not complaining, but I’m just sayin….&lt;/p&gt;  &lt;p&gt;&amp;nbsp;&lt;/p&gt;  &lt;p&gt;Have fun, and Happy T-SQL Tuesday!&lt;/p&gt;  &lt;p&gt;&lt;font color="#ff00ff" size="4"&gt;~Kalen&lt;/font&gt;&lt;/p&gt;</description></item><item><title>File Layout Viewer vs. Drop Clustered Index</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/06/03/file-layout-viewer-vs-drop-clustered-index.aspx</link><pubDate>Mon, 03 Jun 2013 06:41:53 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:49327</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;I had a very educational exchange the other day on Twitter (also known as “getting schooled”) by my SQL compatriots Argenis Fernandez (@DBArgenis) and Hugo Kornelis (@Hugo_Kornelis). A guy called Chris Switalski (@crswit) asked a really interesting question on #sqlhelp, which I proceeded to answer incorrectly, which led to a correction by my friends, which in turn took me back into the SQL File Layout Viewer, which quickly uncovered a bug there.&lt;/p&gt;  &lt;p&gt;So, learn from your friends on Twitter!&lt;/p&gt;  &lt;p&gt;Anyway, here’s the question, and corrected demo using an updated SQL File Layout Viewer build.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Question (paraphrasing): &lt;/strong&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;When you create a clustered index on a heap, the data from the heap is rewritten and ordered, and the upper levels of the index are added. If you &lt;strong&gt;drop&lt;/strong&gt; that clustered index, what happens? Does the set of leaf-level pages stay in place, and the index pages just get unlinked and deallocated? Or is the data rewritten again? Does it stay in the same logical or physical order, even though the ordering requirement is removed?&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I had said that I imagine the data is rewritten in both cases, but that we could look and see to be sure. I was totally wrong – the leaf level of a clustered index is made of data pages, and a heap is made of data pages too, as Argenis pointed out. It turns out that, since they are not different structures down at that level, in order to drop a clustered index, the server will indeed just remove the upper levels of the index and leave all the leaf level data pages in place. Cool optimization! It turns out that my original release of the File Layout Viewer doesn’t always display this situation correctly, so I had to make some changes. This demo, though, I believe shows the behavior correctly.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;First, we make a test database and create a heap, and populate it with some dummy data:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE master
IF &lt;/span&gt;&lt;span style="color:gray;"&gt;EXISTS ( &lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;1 &lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:green;"&gt;sys&lt;/span&gt;&lt;span style="color:gray;"&gt;.&lt;/span&gt;&lt;span style="color:green;"&gt;databases &lt;/span&gt;&lt;span style="color:blue;"&gt;WHERE &lt;/span&gt;name &lt;span style="color:gray;"&gt;= &lt;/span&gt;&lt;span style="color:red;"&gt;'HeapTest' &lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;DROP DATABASE &lt;/span&gt;HeapTest&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;CREATE DATABASE &lt;/span&gt;HeapTest&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO

ALTER DATABASE &lt;/span&gt;HeapTest &lt;span style="color:blue;"&gt;SET RECOVERY SIMPLE&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
ALTER DATABASE &lt;/span&gt;HeapTest &lt;span style="color:blue;"&gt;MODIFY FILE &lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;NAME &lt;span style="color:gray;"&gt;= &lt;/span&gt;&lt;span style="color:red;"&gt;N'HeapTest'&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;SIZE &lt;span style="color:gray;"&gt;= &lt;/span&gt;102400KB &lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
ALTER DATABASE &lt;/span&gt;HeapTest &lt;span style="color:blue;"&gt;MODIFY FILE &lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;NAME &lt;span style="color:gray;"&gt;= &lt;/span&gt;&lt;span style="color:red;"&gt;N'HeapTest_log'&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;SIZE &lt;span style="color:gray;"&gt;= &lt;/span&gt;10240KB &lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO

USE &lt;/span&gt;HeapTest&lt;span style="color:gray;"&gt;;

&lt;/span&gt;&lt;span style="color:blue;"&gt;CREATE TABLE &lt;/span&gt;SampleData &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;Data &lt;span style="color:blue;"&gt;uniqueidentifier DEFAULT &lt;/span&gt;&lt;span style="color:magenta;"&gt;NEWID&lt;/span&gt;&lt;span style="color:gray;"&gt;(), 
    &lt;/span&gt;Padding &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;100&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;DEFAULT &lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;&lt;span style="color:magenta;"&gt;CAST&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;&lt;span style="color:magenta;"&gt;CHECKSUM&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;&lt;span style="color:magenta;"&gt;NEWID&lt;/span&gt;&lt;span style="color:gray;"&gt;() ) &lt;/span&gt;&lt;span style="color:blue;"&gt;as CHAR&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;100&lt;span style="color:gray;"&gt;) ) ) 
);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO

INSERT &lt;/span&gt;SampleData &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;100000&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;If we run the File Layout Viewer, we see a nice, contiguous set of data pages (type 1) in teal:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust01_7E43BA85.jpg"&gt;&lt;img title="FLVDropClust01" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FLVDropClust01" src="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust01_thumb_158EBEF7.jpg" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we’ll add a NON clustered index, just so we can track what happens to that structure too:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE NONCLUSTERED INDEX &lt;/span&gt;Nonclust &lt;span style="color:blue;"&gt;ON &lt;/span&gt;SampleData &lt;span style="color:gray;"&gt;( &lt;/span&gt;Padding &lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shows up in the next part of the data file, in bright blue:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust02_65F0A06A.jpg"&gt;&lt;img title="FLVDropClust02" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FLVDropClust02" src="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust02_thumb_4490FACE.jpg" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now if we make a clustered index on the heap, it will have to be rewritten into another region of the file, so that it can be stored in the right logical order. We should expect it to “move” in the visualization down into the empty part of the file. The non clustered index will &lt;em&gt;also&lt;/em&gt; be rewritten, because it will switch from using physical row locations to point to data in the heap to using the clustering key to point to data in the new clustered index. Let’s see what it looks like:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE CLUSTERED INDEX &lt;/span&gt;Clust &lt;span style="color:blue;"&gt;ON &lt;/span&gt;SampleData &lt;span style="color:gray;"&gt;( &lt;/span&gt;Data &lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;The new clustered index and new version of the NON clustered index are both created in another region of the data file:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust03_583DE462.jpg"&gt;&lt;img title="FLVDropClust03" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FLVDropClust03" src="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust03_thumb_289FC5D6.jpg" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see, in lighter gray, the regions that were emptied where the two structures had been stored, and the two new structures now in the center of the graphic.&lt;/p&gt;

&lt;p&gt;Now for the part of the question I got wrong: what happens when we drop that clustered index?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;DROP INDEX &lt;/span&gt;SampleData&lt;span style="color:gray;"&gt;.&lt;/span&gt;Clust&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is really neat: the leaf level of the clustered index (remember data pages are data pages) stays right where it is, and the upper level index pages are just deallocated/unlinked. &lt;/p&gt;

&lt;p&gt;The NON clustered index does have to be rebuilt yet again to change those clustered index keys out for row location pointers that point to the rows in the heap, so those pages do get rewritten into yet another region of the file:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust04_6E444FF4.jpg"&gt;&lt;img title="FLVDropClust04" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FLVDropClust04" src="http://sqlblog.com/blogs/merrill_aldrich/FLVDropClust04_thumb_45C56DE0.jpg" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, to summarize:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you add or remove a clustered index from a table, any NON clustered indexes always have to be rebuilt - and will be automatically - to exchange row pointers for clustering keys in the non clustered index itself. This generally takes care of itself, but it can be a performance issue, or cause longer running transactions or a large amount of logging.&lt;/p&gt;

&lt;p&gt;When you CREATE a clustered index on a heap, the data has to be rewritten to enforce the logical order of the index.&lt;/p&gt;

&lt;p&gt;When you DROP a clustered index, however, the server can keep all the leaf level pages in place and just strip away the index levels above.&lt;/p&gt;</description></item><item><title>March Update to Rules-Driven Maintenance</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/03/11/march-update-to-rules-driven-maintenance.aspx</link><pubDate>Tue, 12 Mar 2013 02:56:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48191</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;This month I have a minor update to the Rules-Driven Maintenance code I originally &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;posted&lt;/a&gt; back in August 2012. This update has just two enhancements, but they are nice ones, I think:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Much improved handling for instances that use database snapshots.&lt;/li&gt;    &lt;li&gt;Ability to do intra-day differential backups purely by setting maintenance times and limits in the policy table.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The code posted here is cumulative, and replaces entirely the original code, but please refer back to &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;that original blog post&lt;/a&gt; for description, instructions and details. This update can be installed right over an existing deployment, or be installed all by itself as a complete solution but – as always – please test and be cautious.&lt;/p&gt;  &lt;p&gt;Enjoy!&lt;/p&gt;</description></item><item><title>Public Release, SQL Server File Layout Viewer</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/03/01/public-release-sql-server-file-layout-viewer.aspx</link><pubDate>Fri, 01 Mar 2013 21:36:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47991</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;h2&gt;Version 1.0 is Now Available!&lt;/h2&gt;  &lt;p&gt;I’ve been working off and on, as my real job permits, on this visualization tool for SQL Server data files. This is an educational or exploratory tool where you can more readily &lt;i&gt;see&lt;/i&gt; how the individual data pages in MDF/NDF files are organized, where your tables and indexes live, what effect operations like index rebuild or index reorganize have on the physical layout of the data pages.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewerR1_6399E49C.png"&gt;&lt;img title="FileLayoutViewerR1" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewerR1" width="1028" height="494" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewerR1_thumb_228B6538.png"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The viewer will scan a whole database, using only SQL and DBCC commands, and will render a color-coded representation of all the data pages represented in colored bands. Each partition of each index or heap in the database is assigned a color, so that you can see where all the bits and pieces of an object are located in the files. Above the colored bands there are grayscale or white pixels that show the page type in SQL Server (most are white, which are data pages. Unused/empty regions of the file show as gray). In the image above, for example, all the bright green areas are one index, all the purple areas are one index, and so on.&lt;/p&gt;  &lt;p&gt;There is mouse-over functionality. If you move the mouse cursor over the graph, then details about each page populate the text fields at right, including the object and index the page belongs to, the page type, whether the page represents a fragment, where the previous and next pages are for the same object, etc.&lt;/p&gt;  &lt;h2&gt;Why?&lt;/h2&gt;  &lt;p&gt;Why create something like this? I am a visual person, and I have a theory that many issues we have in computing come down to not being able to see what’s going on. This is especially true as we learn about unfamiliar technology – we have to develop a mental model of structures like B-trees or linked lists or files in order to understand what’s happening. I hope this tool, combined with other knowledge, will help people form an accurate understanding of how data file internals work in SQL Server, faster than working purely in the abstract with tools like DBCC Page or DBCC Ind.&lt;/p&gt;  &lt;h2&gt;Instructions&lt;/h2&gt;  &lt;ol&gt;   &lt;li&gt;Download the tool and unzip it. The package includes both an executable and the source code. If you don’t want the source, the .exe file is a standalone program and will run all on its own, so you are welcome to discard the source folder.&lt;/li&gt;    &lt;li&gt;Validate you have the required prerequisites from the Prereq’s section below.&lt;/li&gt;    &lt;li&gt;Locate a non-production/test database to analyze. The database can be local or on a remote server. I suggest something of a reasonable size, because scanning a really huge data set can take quite a long time.&lt;/li&gt;    &lt;li&gt;Run SQLFileLayoutViewer.exe and select a database to scan. If the database is on a remote server, type the SQL Server name/instance name into the dialog.&lt;/li&gt;    &lt;li&gt;Click Analyze.&lt;/li&gt;    &lt;li&gt;Examine the resulting graph, and mouse over it with the cursor to view detailed information about each page.&lt;/li&gt; &lt;/ol&gt;  &lt;h2&gt;Disclaimer&lt;/h2&gt;  &lt;p&gt;This is a freeware tool provided for your fun, education and entertainment. However, there is no warranty of any kind and you use it at your sole risk. The tool is free but offered under the GNU General Public License 3. If successful, and people are interested, I’ll move this work to some sort of open source project.&lt;/p&gt;  &lt;h2&gt;Prerequisites&lt;/h2&gt;  &lt;p&gt;The app requires .NET Framework 4.0 and the SQL Server management tools. I’ve tested it on Windows 7, Windows Server 2008 R2 and Windows 8. It can be run against a database on a local or remote SQL instance. I believe it will work on any database in SQL Server 2005 or later, but have not tested every possible scenario.&lt;/p&gt;  &lt;h2&gt;Risks?&lt;/h2&gt;  &lt;p&gt;I believe this tool to be relatively risk free, but I would avoid running it against live production data. The tool’s data collection is simple: it will issue a few system table selects to get things like object names, and then it will execute a DBCC PAGE statement against every page in the database. All other processing after that is done locally in the application itself. It does not modify the database.&lt;/p&gt;  &lt;h2&gt;Bugs?&lt;/h2&gt;  &lt;p&gt;I would love to hear about bugs you come across, or additional features you think would be valuable. Please contact me through this site. Note that I am a DBA first, and an amateur .NET developer a distant second, so please be gentle.&lt;/p&gt;  &lt;p&gt;Enjoy!&lt;/p&gt;</description></item><item><title>Geek City: Join With Me!</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2013/02/14/index-intersection.aspx</link><pubDate>Thu, 14 Feb 2013 22:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47727</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;p&gt;I remember one of the most surprising changes in SQL Server &lt;strike&gt;2000&lt;/strike&gt; 2005 was how the graphical plans showed the use of a nonclustered index to seek, and the plan included something that looked like a JOIN to find rows in the base table.&amp;nbsp; Here’s an example. Although I used SQL Server 2008R2, the graphical plan will be pretty similar to what it looked like back in SQL Server &lt;strike&gt;2000&lt;/strike&gt; 2005. My code will make a copy of a table in the &lt;em&gt;AdventureWorks2008&lt;/em&gt; database, and then build an index on one of the columns.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;USE AdventureWorks2008;        &lt;br&gt;GO&lt;br&gt;IF object_id('dbo.Sales') IS NOT NULL         &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DROP TABLE dbo.Sales;         &lt;br&gt;GO         &lt;br&gt;SELECT * INTO dbo.Sales FROM Sales.SalesOrderHeader;         &lt;br&gt;GO         &lt;br&gt;CREATE INDEX Sales_SalesPersonID_index on dbo.Sales(SalesPersonID);         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now look at the estimated graphical execution plan for this query, that searches for a particular value for the &lt;em&gt;SalesPersonID&lt;/em&gt; column:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE SalesPersonID = 280;         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;You should see something like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_0F9109CC.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_47634DEF.png" width="507" height="260"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;It shows a JOIN even though there is only one table involved! What’s with that? It took me a few minutes to figure out what I was seeing, and I must admit that over the years since then I have come to appreciate this graphical representation. You’ll have a problem if you think of a JOIN as only finding matching rows between two tables, but can be solved if you realize that a JOIN can be used when finding matches between any two ROWSETS, i.e. any two sets of rows. There is a set of rows in the nonclustered index on &lt;em&gt;SalesPersonID&lt;/em&gt;, and a set of rows in the table. The index seek finds all the index rows with a &lt;em&gt;SalesPersonID&lt;/em&gt; value of 280, but those index rows only contain an index key value and a row pointer (RowID or RID). Because the query is requesting all the columns in the table, not just the &lt;em&gt;SalesPersonID&lt;/em&gt; value,&amp;nbsp; SQL Server must find the rows in the&lt;em&gt; dbo.Sales&lt;/em&gt; table that have a matching value for the RowID. Internally, it is doing a join, even though there is only one table. It is joining index rows with table rows.&lt;/p&gt;  &lt;p&gt;If the table had a clustered index, the plan would look almost exactly the same, with one difference. In SQL Server, if a table has a clustered index, nonclustered index rows do not contain RowIDs, they contain the clustered key value to be used a pointer to the row. SQL Server can take this clustered key from the nonclustered index row and look up in the clustered index to find the desired row. So I create a clustered index on &lt;em&gt;SalesOrderNumber&lt;/em&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;CREATE UNIQUE CLUSTERED INDEX Sales_ident on dbo.Sales(SalesOrderNumber);        &lt;br&gt;GO&lt;/font&gt;       &lt;br&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Now when I run the same query as above, I get the following plan:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_6D58FE45.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_7EC95F1D.png" width="513" height="237"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The only difference in the two plans above is the icon for the lookup into the base table.&amp;nbsp; One is looking up into a heap using a RowID, which is an address containing the File ID, the page number and the slot or row number on the page. The other icon represents taking a key value from the nonclustered index row and looking it up by seeking through the clustered index. If you look again at these two icons, which one looks like a picture of directly addressing a row, and which looks like a picture of following another index into the table itself?&lt;/p&gt;    &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_7DF0F933.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_4B20F2BF.png" width="220" height="88"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The icon on the left is used for a RID lookup, and the icon on the right is used for a clustered index key lookup. I know that nobody asked me, but I think they got these backwards.&lt;/p&gt;  &lt;p&gt;But wait, there’s more… &lt;/p&gt;  &lt;p&gt;Prior to SQL Server 7, there was basically a rule of thumb with query plans that SQL Server would use one index (at most) per table per query. It could never (well, hardly ever) use two indexes on the same table in a single query. That changed in SQL Server 7 when a new algorithm called Index Intersection was introduced. &lt;/p&gt;  &lt;p&gt;I’m going to build another index on &lt;em&gt;SalesOrderDate&lt;/em&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;CREATE INDEX Sales_OrderDate_index on dbo.Sales(OrderDate);        &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This index will be used if I look for all orders place on July 1, 2002. The query plan for the following query is shown:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;     &lt;br&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2002-07-01';&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_4A488CD5.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_09A64066.png" width="491" height="238"&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;We saw that the index on &lt;em&gt;SalesPersonID&lt;/em&gt; is useful when looking for the value 280, and the index on &lt;em&gt;SalesOrderDate&lt;/em&gt; is useful when looking for '2002-07-01'. But what if one query looks for rows that satisfy both conditions? &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales       &lt;br&gt;WHERE OrderDate = '2002-07-01'         &lt;br&gt;AND SalesPersonID = 280&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here’s the plan:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_68B2CDBE.png"&gt;&lt;img style="border-width:0px;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_07F57492.png" width="573" height="239"&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;We see two JOINs ! First, SQL Server finds the sets of index rows meeting each of the WHERE filter conditions and those two index row sets are joined together, using the clustered key value as the join column. Then only after SQL Server finds rows that meets both conditions, it takes the common clustered index key and uses it to find the row in the underlying table. &lt;/p&gt;  &lt;p&gt;But wait, there’s more!&lt;/p&gt;  &lt;p&gt;Both of these queries use a table scan, because the filter conditions are not selective enough. There are way more rows with a &lt;em&gt;SalesPersonID&lt;/em&gt; value of 289 than there were for a value of 280, and there are more rows with an &lt;em&gt;OrderDate&lt;/em&gt; of March 1, 2004 than there are with an &lt;em&gt;OrderDate&lt;/em&gt; of July 1, 2002.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2004-03-01' ;        &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE SalesPersonID = 289;         &lt;br&gt;GO&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;    &lt;p&gt;Both of the above queries show the same plan. Because the table now has a clustered index, the plan shows a clustered index scan, but it is exactly the same work as a table scan:&lt;/p&gt;    &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_18F9A275.png"&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_787262C2.png" width="368" height="98"&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;However, if you look at the plan for a query that uses both filter conditions, we don’t get a clustered index scan.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;SELECT * FROM dbo.Sales        &lt;br&gt;WHERE OrderDate = '2004-03-01'         &lt;br&gt;AND SalesPersonID = 289;&lt;/font&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/kalen_delaney/image_65BD690B.png"&gt;&lt;img style="border:0px currentColor;padding-top:0px;padding-right:0px;padding-left:0px;display:inline;background-image:none;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_44C9F664.png" width="568" height="243"&gt;&lt;/a&gt;&lt;/p&gt;      &lt;p&gt;Even though each filter alone is not very restrictive, they can both be used for index intersection. The reason SQL Server usually doesn’t choose index seeks on non-restrictive filters is because the cost of then accessing the base table for all the qualifying rows can be very high. But even though each individual index returns a lot of rows, the optimizer’s estimate of the number of rows that meet BOTH conditions is very small, and therefore the number of lookups into the base table is quite small and affordable. &lt;/p&gt;  &lt;p&gt;So, what can you learn? Just because you think an column might not be selective enough to build a nonclustered index on, don’t discount it.&amp;nbsp; If that column will frequently be used in queries with multiple conditions, it just might turn out to be more useful than you thought! &lt;/p&gt;  &lt;p&gt;So make sure you test many possible choices of indexes and combinations of indexes before you know what the best choices are.&lt;/p&gt;  &lt;p&gt;Have fun!&lt;/p&gt;  &lt;p&gt;&lt;font color="#8064a2" size="4"&gt;&lt;strong&gt;~Kalen&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout III</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/29/visualizing-data-file-layout-iii.aspx</link><pubDate>Tue, 29 Jan 2013 05:45:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47372</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;This is part three of a blog series illustrating a method to render the file structure of a SQL Server database into a graphic visualization.&lt;/p&gt;  &lt;p&gt;Previous Installments:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx"&gt;Part 2&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Those that have been reading this series might be be thinking, “Is he going to go there?” Well, the answer is “Yes.” This is the &lt;strong&gt;GUID clustered index post&lt;/strong&gt; that had to be. It’s inevitable with this tool.&lt;/p&gt;  &lt;p&gt;If you follow SQL Server at all, you are probably aware of the &lt;a href="http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/"&gt;long-standing&lt;/a&gt; &lt;a href="http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html"&gt;debate&lt;/a&gt; about whether it is wise, desirable, smart, useful, or what have you, to identify rows using GUIDs. I won’t take a position on that, but I will show here, I hope objectively, a few things that the visualizer shows about file layout vs. distributed inserts, distributed inserts being one of the main challenges around using GUIDs as clustering keys. Just to recap the argument very, very briefly:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys can be generated at the client, which saves a round-trip to the database server to create a collection of related rows.&lt;/p&gt;  &lt;p&gt;GUID keys can make certain architectures like sharding, or peer to peer replication, or merging multiple source databases, simpler.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys are wider, therefore they take more space in memory and on disk. The additional space is multiplied by their presence in both clustered and non-clustered indexes if they are a clustering key.&lt;/p&gt;  &lt;p&gt;GUID keys don’t only take more space in RAM and on disk because of their width. They also cause &lt;em&gt;distributed inserts&lt;/em&gt; into the clustered index – that is, new rows are added to any and all pages in the index. Each time a row has to be added, the target page must be read into memory, and at a checkpoint, the &lt;em&gt;whole&lt;/em&gt; changed page (both existing and new rows) must be written to disk. This has two effects: &lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The amount of RAM and disk IO required for inserts is probably much higher, as pages with &lt;em&gt;existing&lt;/em&gt; data must come into cache, get changed, and then be written back out again. Essentially, large parts of the table have to be &lt;em&gt;rewritten&lt;/em&gt; to disk to append rows to pages that have data already.&lt;/li&gt;    &lt;li&gt;The pages that store the index will individually fill up, and have to split such that half the existing rows are written back out to the “old” page and half written out to a “new” page in a different location on disk. This causes the pages to be less full, the same number of rows to require more space on disk and in RAM, and the resulting index to be massively fragmented on disk.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;I am not writing to argue these points, which have I think been established by both sides of the debate, only to see if the visualizer shows these effects clearly. Most of the argument isn’t actually about these facts (they are all true, as far as I know) but rather which are more important, and I think that is the main source of debate on the issue.&lt;/p&gt;  &lt;h2&gt;Visual Example of Distributed Inserts&lt;/h2&gt;  &lt;p&gt;It’s very easy to create an example of this with a small sample database. I created one called “VizDemo2.” VizDemo2 has a slightly modified structure to illustrate what’s going on here – I need two tables that are stored separately on disk, so that they cannot interfere with one another. The simplest way to do that is with a couple of file groups containing one file each. So here’s the structure:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;I created the database with a 50MB, single file, Primary file group&lt;/li&gt;    &lt;li&gt;I added a file group FG1 with one 75MB file&lt;/li&gt;    &lt;li&gt;I added a second file group FG2 with one 75MB file&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;When the database is empty, the visualizer shows only the system pages at the start of each file, as shown here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_5BB678D6.png"&gt;&lt;img title="VizDemo2_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_thumb_18DA345C.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;To that database I added two sample tables identical in structure but with different clustering keys:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo2
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG1]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;span style="color:blue;"&gt;
CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;uniqueidentifier &lt;/span&gt;&lt;span style="color:gray;"&gt;NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED DEFAULT &lt;/span&gt;&lt;span style="color:magenta;"&gt;NEWID&lt;/span&gt;&lt;span style="color:gray;"&gt;(), 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;)&lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG2]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;I’ll populate the two tables and we can see what the file layout looks like afterward:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;h2&gt;Compare&lt;/h2&gt;

&lt;p&gt;After inserts, the resulting graphic does show some facts we know to be true:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_6193AD20.png"&gt;&lt;img title="VizDemo2_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_thumb_7A25DB21.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, the data in the integer-clustered index takes about eight bands of the diagram, while storing the same data in a GUID clustered index has required about twelve bands of data pages. The database itself supports that impression with space allocation – it reports these figures:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_50B199E1.png"&gt;&lt;img title="VizDemo2_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_thumb_49262A74.png" width="737" height="75" /&gt;&lt;/a&gt;&lt;/p&gt;





&lt;p&gt;Part of the extra space required is the width of the key, but part of it is the empty space on each page resulting from page splits. If a page that needs a new row is too full, then half the rows from that page are moved to a net-new page, half left in place, and the new row added to one or the other of the resulting pages. Afterward, they are often both partly empty.&lt;/p&gt;

&lt;p&gt;Second, the whole graphic in the GUID clustered index area is a dark blue that the visualizer uses to show fragmentation – in fact, the object is almost perfectly fragmented, with practically no contiguous pages at all. The sequence of pages in the leaf level of the index is still a linked list, as always, but it it is physically stored in essentially random order on disk.&lt;/p&gt;

&lt;h2&gt;Does Re-Indexing Help?&lt;/h2&gt;

&lt;p&gt;The next question is whether we can combat these problems by doing huge amounts of index maintenance – if we rewrite the GUID index, will that make it take less space, or make it more efficient? The answer is, “well, sort of, temporarily.”&lt;/p&gt;

&lt;p&gt;First, re-indexing will put the table in “GUID” order. Whether that really helps or not is debatable, perhaps. It would enable read-ahead for the index, which is otherwise clobbered by the fragmentation. Having the table in “GUID” order might or might not be of any help to performance. Second, re-indexing will make the pages denser, or less dense, depending on the fill factor applied. For the sake of demonstration, let’s re-index with the default fill factor, because I think that happens a lot out in the world, and it may tell us something:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;After re-indexing, this is a view just of the second file group with the GUID clustered table (note that I scrolled down in the display):&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_3266E2EB.png"&gt;&lt;img title="VizDemo2_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_thumb_1B3B686D.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The arrow shows where the data was moved from the old data pages into a new region of the file. And, sure enough, it’s not fragmented (note the lighter color) and it takes less space in the file.&lt;/p&gt;

&lt;p&gt;That might sound good, but if this is a real database, inserts probably will continue. In the int clustered case, as we know, new data will be appended to the end of the page sequence, but in this case, new data will have to be inserted into most of the existing pages on disk. Those are all full now, and will have to be split 50/50 to create new pages for the new data, both the old and new pages will have to be written out, and the new pages by definition can’t be in index order with the existing pages.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;p&gt;What we get after more rows are added to the table is what a layperson might call a “hot mess:”&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_1D0BBE34.png"&gt;&lt;img title="VizDemo2_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_thumb_7A4A8676.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here everything is fragmented – back to that dark blue – even the pages we just re-indexed a moment ago, because they all split. The table has &lt;em&gt;more than&lt;/em&gt; doubled in size, even though we just doubled the number of rows, because the individual pages contain less data.&lt;/p&gt;

&lt;p&gt;Would appropriate fill factor be a workaround? In some measure, yes, but it really only combats the issue. The write activity on the table, even with a low fill factor, will still be higher as more existing pages have to be flushed at checkpoints. The pages will still be less dense, and therefore take up more space on disk and in cache. In short – maybe helpful but no silver bullet.&lt;/p&gt;

&lt;p&gt;What about Sequential GUIDs? Here I will venture my opinion. Sequential GUIDs have never made sense to me. They solve one part of this problem – the distributed insert part – but &lt;em&gt;at the expense of the very things GUIDs might be good for&lt;/em&gt;, namely not demanding a visit to the database to generate an identifier. If you have to come to the database, you already lost this whole argument. Use an integer and solve the rest of the problem at the same time. I can only see it as a sort of band-aid for existing systems that could not be refactored, but, like a bad SUV that combines the worst properties of a car and a truck, it feels like a really poor compromise to me.&lt;/p&gt;

&lt;p&gt;I hope this helps to illustrate some of the physical database design challenges that surround the use of GUID cluster keys. In the next installment I’m planning to demonstrate the interleaving of objects, which is one argument for multiple file groups.&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout II</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx</link><pubDate>Wed, 23 Jan 2013 23:40:02 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47268</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 2 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In Part 1 of this series, I introduced a little demo app that renders the layout of pages in SQL Server files by object. Today I’ll put that app through its paces to show, in vivid color (well, teal, anyway) the destructive power of the famous&lt;strong&gt; Re-Index Then Shrink&lt;/strong&gt; anti-pattern for index maintenance.&lt;/p&gt;  &lt;p&gt;This one is very easy to demo, so let’s go!&lt;/p&gt;  &lt;p&gt;First, I created a demo database &lt;strong&gt;VizDemo1&lt;/strong&gt;, with a single 200 MB data file. Into that database I placed a canonical table – highly simplified for this example – clustered on an ever-increasing integer, using identity():&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo1
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then we populate that table with some dummy data:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;

&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;40000&lt;/pre&gt;


&lt;p&gt;And finally, fire up the little visualizer app and process the database:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_4683F23D.png"&gt;&lt;img title="VizDemo1_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_thumb_53EA0543.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The small color bands at the top left corner of the image are the system tables and such that are in every “empty” database to make it run. The blue/green/teal area is the new table we created and populated with sample data, and the gray area represents empty regions in the file.&lt;/p&gt;

&lt;p&gt;As expected, the table started writing into the first available space, and, because the cluster key is increasing, pages were allocated to the end of the page sequence in order, and we end up with a crazy-perfect, contiguous linked list on disk.&lt;/p&gt;

&lt;p&gt;You can see small darker bars at intervals within the table – most of the pages in the index are “type 1” pages, which are the leaf-level/rows in the clustered index. Those bars are “type 2” index pages that have the upper level(s) of the index. The reason they are darker is that those are a disruption in the leaf level linked list, and the app shades such disruptions as a way to see fragmentation. The list has to “hop over” those pages and then continue on the other side. It’s technically fragmentation, but at this point not harmful at all – but remember that darker color that shows a break in the page order.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A side note: in the midst of the gray area you can see one orange line (and another in the sea of teal). Those are “type 11” PFS pages, which happen on a fixed interval in every file. I don’t think they ever move – they track file allocation and free space metadata. They are like rocks in the stream…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, what happens if we re-index this bad boy? Well, a re-index operation has to write all the pages for the object into new, blank pages in the file, and then abandon the old pages. I run:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:green;"&gt;-- This &amp;quot;moves&amp;quot; all the data toward the end of the file, into free areas
&lt;/span&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then re-analyze the file. As expected, the table has “moved” toward the end of the file, and left free space toward the beginning. It’s still not fragmented, because we had enough room, and it was written in order into that new area by the rebuild:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_61501849.png"&gt;&lt;img title="VizDemo1_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_thumb_3EFB1381.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the gray area near the top is all the “abandoned” pages where the index was, and the data has all moved down into the free area. Ah, but that seems wasteful to some people, am I right? All that empty space – the file could be smaller!&lt;/p&gt;

&lt;p&gt;Let’s see the damage that Shrink File does. Imagine that I do this:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;DBCC &lt;/span&gt;SHRINKFILE &lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:red;"&gt;N'VizDemo1' &lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;70&lt;span style="color:gray;"&gt;)
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;First, before we shrink, let’s just scroll down and look at the end of the file:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_4E9DAF43.png"&gt;&lt;img title="VizDemo1_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_thumb_10A41E85.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have two conditions – the gray part of the file is sort of OK to shrink. There’s just a lone PFS page out there, and removing that does no harm. But once we get into that blue area, the data has to be moved back up into the beginning of the file. Here’s where the problem lies, as I learned from Mr. Paul Randal – the shrink routine will move a page at a time back into that free space, starting from the end, going backward. That makes the pages land in approximately &lt;em&gt;reverse order&lt;/em&gt; from the correct index order. Perfect fragmentation. Let’s see if this tool proves him right. Shrink, then re-analyze:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_595D9749.png"&gt;&lt;img title="VizDemo1_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_thumb_500462C6.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yep, it’s not immediately apparent, perhaps, but that teal color is a darker shade that indicates every page is a fragment boundary in most of the index – perfect fragmentation! Here’s a better view:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_51D4B88D.png"&gt;&lt;img title="VizDemo1_04a" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04a" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_thumb_665A080B.png" width="408" height="284" /&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;So, how can we clean that up? Well, with a rebuild. But … we need that bigger file. In fact, practically any database in production needs this overhead of available space to be able to perform index maintenance. It’s not “wasted” space at all.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;After the rebuild, the index is back toward the end of the file, but it’s also back in order:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_13DB27CF.png"&gt;&lt;img title="VizDemo1_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_thumb_1C5E8719.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, in light of this information, imagine nightly re-indexing on a database with … &lt;strong&gt;AutoShrink!&lt;/strong&gt; &amp;lt;shudder&amp;gt;&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout I</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx</link><pubDate>Tue, 22 Jan 2013 22:50:34 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47250</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 1 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;Some years ago a gentleman called &lt;a href="http://sqlblogcasts.com/blogs/danny/default.aspx"&gt;Danny Gould&lt;/a&gt; created a free tool called &lt;a href="http://internalsviewer.codeplex.com/"&gt;Internals Viewer for SQL Server&lt;/a&gt;. I’m a visual sort of guy, and I always thought it would be fun and educational to make a simple visualizer, like the one he created, in order to view how objects are laid out in SQL Server files, and to use it to demonstrate how operations like re-index and shrink affect the layout of files.&lt;/p&gt;  &lt;p&gt;To that end, and a little bit reinventing the wheel truth be told, I spent this past holiday creating a simple .NET app that renders the file layout of a database into a color-coded bitmap:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_48CAE32A.png"&gt;&lt;img title="FileLayoutViewer01" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer01" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_thumb_586D7EEC.png" width="824" height="376" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Fig 1&lt;/p&gt;  &lt;p&gt;The app can scan the pages in a database, grab the header output from DBCC PAGE, parse that, and create a structure with a few key bits of information about every page. It then renders a bitmap from those structures showing a few things (Fig 1):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Each data object (index or table) partition is identified with a unique partition ID in SQL Server. Those IDs are used in this tool to color-code the output by object, from a color lookup table. Each color in the example screenshot represents the pages that are dedicated to a single partition of an object. This screenshot shows AdventureWorks, which doesn’t use the Enterprise Edition partitioning feature, so for this case each color represents one object – every object having exactly one partition in Standard Edition (or in databases that don’t use partitioning).&lt;/p&gt;    &lt;p&gt;Unallocated pages are shown as gray gaps. These are regions that are part of the physical file(s), but not used to store anything.&lt;/p&gt;    &lt;p&gt;The app flags pages at the end of any fragment of an object using a darker colored band, so it will reveal any non-contiguous structures in the data file(s). Sometimes these happen at the end of a region of the file where one object is stored, but, interestingly, sometimes these can happen in the middle – as shown in the image above where a dark band interrupts a continuous region of the same color.&lt;/p&gt;    &lt;p&gt;The app has some very basic mouse-over capability where you can run the mouse over the image and the text fields at right will reveal information about the pages, including the object schema.table.index and partition, and also whether the page represents a fragmentation boundary.&lt;/p&gt;    &lt;p&gt;Finally, the app shows what page types are located where in the file using the narrower white/gray/black bands. White represents data or index pages, while other shades of gray or black indicate other kinds of system pages, per Paul Randal’s excellent blog post &lt;a href="http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/"&gt;here&lt;/a&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h2&gt;The Pixels Already Tell a Story&lt;/h2&gt;  &lt;p&gt;So, what can we learn about the sample database in this image? Here are a few things:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The part of the file shown in the bitmap is fairly dense. There aren’t big regions of unallocated space in the file. A gap in the allocated pages looks like this (enlarged):      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_6C869B75.png"&gt;&lt;img title="FileLayoutViewer02" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer02" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_thumb_1354B1B6.png" width="206" height="194" /&gt;&lt;/a&gt;       &lt;br /&gt;Empty Region       &lt;br /&gt;&amp;#160; &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Objects in the file are not contiguous, and may “hop around.” That is, if you follow the linked list of pages that compose an index, a bunch of them will be in a row, and then there will be a page that links to the next page composing the index but it’ll be in a different location in the file. I’ve called these “frag boundaries” – pages that do link to another page, but where that next logical page isn’t the next physical page in the file. In the graphic the frag boundary pages are colored with a darker dithered pattern. You can mouse over these and look in the text fields at the right in the app, and see the page they link to.      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_495BB6DC.png"&gt;&lt;img title="FileLayoutViewer03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer03" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_thumb_7029CD1C.png" width="238" height="188" /&gt;&lt;/a&gt;       &lt;br /&gt;Fragment Boundaries       &lt;br /&gt;&amp;#160; &lt;br /&gt;      &lt;br /&gt;Sometimes the end of a fragment will be adjacent to pages from another object, but it can be the case that there’s a fragment boundary in the middle of the pages for one object – it’s just that the linked list goes up to that point in the file, but then the next page in the index (in index order) isn’t the next page in the file, even though the next page in the file is part of the same object. Imagine a page split in the “middle” of an index – the existing page with half the rows stays in place, and a new page with the other half of the rows gets created in the middle of the logical index order but possibly stored in some other location in the physical file.       &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Right at the very beginning of the file there’s a special sequence of metadata pages that describe the database, allocations, and so on (again, well documented by Paul Randal). In our diagram this shows up as a series of pages at top left with varying page type (the gray and white) indicators:      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_020660EA.png"&gt;&lt;img title="FileLayoutViewer04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer04" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_thumb_28D4772A.png" width="268" height="206" /&gt;&lt;/a&gt;       &lt;br /&gt;Database and file metadata pages &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In the next installment, I’ll run some test databases through this and we can see what more severe fragmentation looks like, the effect of GUID cluster keys, shrink, and how the data moves around in a re-index operation.&lt;/p&gt;  &lt;p&gt;Here’s a short demo video of the mouse-over working (quality is You-Tube limited):&lt;/p&gt;  &lt;div id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:a5c43597-8d84-4a1a-b897-db2f19356e72" class="wlWriterEditableSmartContent" style="float:none;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;"&gt;&lt;div&gt;&lt;a href="http://www.youtube.com/watch?v=q47a0L0_9Ws" target="_new"&gt;&lt;img src="http://sqlblog.com/blogs/merrill_aldrich/videof83e47a6713d_7D8FE022.jpg" style="border-style:none;" alt=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="width:448px;clear:both;font-size:.8em;"&gt;Animated Screen Cap of Mouse-over&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Update to Rules-Driven Maintenance</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/16/update-to-rules-driven-maintenance.aspx</link><pubDate>Wed, 16 Jan 2013 18:34:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47183</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Back in August I &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;posted&lt;/a&gt; a first version of a rules-driven solution for backups, index and statistics maintenance and integrity checks. The system in general has been working well, and has saved my team a huge amount of time and effort. We are coming to the anniversary of its use in production soon.&lt;/p&gt;  &lt;p&gt;Today I offer an update that contains a few enhancements, performance improvements and a bug fix.&lt;/p&gt;  &lt;p&gt;To recap, this is a system or framework to manage many small databases across many instances on many servers in a policy-based/automated way. The details about how to deploy and how to use the system are in the &lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2012/08/01/rules-driven-maintenance.aspx"&gt;original post&lt;/a&gt;, but at a high level:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;First deploy the code across a collection of servers using the included PowerShell script &lt;b&gt;DeployMaintenance.ps1&lt;/b&gt;. That will create an administrative database on every instance (or use one that you provide) and populate it with the required tables, procedures, etc. The deployment script will also make a handful of SQL Agent jobs that handle all the maintenance on all the databases.&lt;/li&gt;    &lt;li&gt;Visit the instances. If there is maintenance already in place, disable or remove it so as not to perform this work twice. If necessary, change or override the default preferences for this solution in the admin database for each type of maintenance. This includes the time of maintenance windows, whether to use features like differential backups or Litespeed, whether there are databases that should be treated as exceptions, with different settings, etc.&lt;/li&gt;    &lt;li&gt;Enable the new policy jobs in SQL Agent. &lt;/li&gt;    &lt;li&gt;Monitor to make sure things are running smoothly.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;The theory behind this project is to save work configuring servers (efficiency), to ensure that the maintenance is in fact deployed and working for ALL databases in the environment (quality), to ensure that it works in a truly consistent way everywhere (consistency), and to make it so that the servers don’t have to be touched by a DBA when changes happen, such as adding or dropping databases (resiliency).&lt;/p&gt;  &lt;h4&gt;Changes&lt;/h4&gt;  &lt;p&gt;The updates in this 1.1 version include&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Better reporting of errors. The jobs now write log files to the default error log location for SQL Server.&lt;/li&gt;    &lt;li&gt;Better recovery from errors. The jobs in some cases continue on error – for example, one failed backup will not stop the backup job and leave other databases without backups.&lt;/li&gt;    &lt;li&gt;Performance enhancements for examining backup history and system counters (mainly % log used).&lt;/li&gt;    &lt;li&gt;One bug fix for the log backup logic on a server that uses both log shipping and Litespeed.&lt;/li&gt; &lt;/ul&gt;  &lt;h4&gt;Disclaimers&lt;/h4&gt;  &lt;p&gt;The code for the system is posted here, and you are welcome to download it and try it out. Please note that this is not a commercial solution, and that while you may test or deploy this, you do so at your sole risk. Edits to the code almost certainly will be required for your environment. Please read, understand and test the code thoroughly before you even &lt;i&gt;imagine&lt;/i&gt; using this in production. I do not want to cause you or your employer any pain. License for using this solution is GPL 3.0: &lt;a href="http://opensource.org/licenses/GPL-3.0"&gt;http://opensource.org/licenses/GPL-3.0&lt;/a&gt;. You may use and modify this code, but not sell it.&lt;/p&gt;  &lt;p&gt;This has been run on SQL Server versions from 2005 to 2008 R2. It probably works on 2012, but I have not tested it thoroughly.&lt;/p&gt;  &lt;p&gt;I run only case-insensitive collation; if you are on a case-sensitive system, then you have my sympathy, and you might have to edit a few things to make this work.&lt;/p&gt;  &lt;p&gt;There is coverage for Quest Litespeed backups in the solution; other third-party backup products could be incorporated easily, but have not been. It should be possible to follow the pattern used to handle Litespeed and extend it to any other third-party stored procedures.&lt;/p&gt;  &lt;p&gt;If you haven’t done index maintenance for a while, and you flip this solution on &lt;b&gt;you might blow up your transaction log&lt;/b&gt;. Be sure that you have plenty of log space and plenty of log backup space, and watch the process the first few runs. This is especially true if the combination of &lt;b&gt;Standard Edition of SQL Server&lt;/b&gt; and the preference for &lt;b&gt;Online&lt;/b&gt; index maintenance applies in your environment, because the maintenance process will perform index maintenance using reorganize (defrag), which produces a lot of log.&lt;/p&gt;  &lt;p&gt;Lastly, this is &lt;b&gt;a solution for the type of environment that has many small databases&lt;/b&gt;, not the type with a few big, performance intensive databases. Databases that are large and require detailed attention may not be suitable for a one-size-fits-all solution like this.&lt;/p&gt;</description></item><item><title>Geek City: Build a Big Table with a Columnstore Index</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2012/04/26/Build-a-Big-Table-with-a-Columnstore-Index.aspx</link><pubDate>Thu, 26 Apr 2012 18:18:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:43013</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;p&gt;I was looking all over to find a big table to use for my columnstore 
examples in my SQL Server 2012 book, and the only one I found was from someone 
at Microsoft and was not publicly available.  When I demonstrate code in my 
writings, I like to have the code available for anyone to use, so that was no 
use. &lt;/p&gt;
&lt;p&gt;Finally I realized I was just going to have to do it myself. I actually based 
the script on some work by the awesome Benjamin Nevarez (&lt;a href="http://www.benjaminnevarez.com/"&gt;blog&lt;/a&gt;| &lt;a href="http://twitter.com/BenjaminNevarez"&gt;twitter&lt;/a&gt;), but I needed to make a 
couple of changes to the table structure, and then the loop for adding the data 
took some tweeking to make sure that uniqueness was possible, in case you need 
to test out how the metadata changes when a columnstore index is built on a 
table with a unique clustered index compared to a nonunique clustered index 
compared to a heap. &lt;/p&gt;
&lt;p&gt;I have just finished the chapter on indexes, and decided to make this script 
available. The initial table data is based on the data in Microsoft 
AdventureWorksDW2012 sample database that you can download &lt;a href="http://msftdbprodsamples.codeplex.com/releases/view/55330"&gt;here&lt;/a&gt;.  
(Note that the DW version is the first one under “Other Available Downloads”’; 
it’s not the one under “Recommended Download”. )&lt;/p&gt;
&lt;p&gt;Here is the section of the script that populates most of the 
&lt;em&gt;FactInternetSalesBIG&lt;/em&gt; table. (The original 60K rows were just copied 
from the original &lt;em&gt;FactInternetSales&lt;/em&gt; table. Note that all the statements 
have to be run in a single batch because the local variable’s scope is the 
batch, and because the GO 9 applies to the single batch that precedes it. Of 
course, you can change the number of iterations to end up with a different size 
table. Mine ends up at just over 30 million rows.  In addition to modifying the 
&lt;em&gt;SalesOrderNumber&lt;/em&gt; value on each iteration, I also changed the value of the 
Revision column to indicate which pass through the insert loop was being 
executed. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;br&gt;&lt;font size="2" face="Courier New"&gt;&lt;br&gt;-- Copy the new big table into itself 
9 times&lt;br&gt;DECLARE @RevisionNumber nchar(2);&lt;br&gt;SELECT @RevisionNumber = 
RevisionNumber + 1 FROM RevisionNumberValue;&lt;br&gt;SELECT @RevisionNumber as 
RevisionNumber;&lt;br&gt;INSERT INTO dbo.FactInternetSalesBig WITH (TABLOCK)&lt;br&gt;    
SELECT ProductKey&lt;br&gt;      ,OrderDateKey&lt;br&gt;      ,DueDateKey&lt;br&gt;      
,ShipDateKey&lt;br&gt;      ,CustomerKey&lt;br&gt;      ,PromotionKey&lt;br&gt;      
,CurrencyKey&lt;br&gt;      ,SalesTerritoryKey&lt;br&gt;      ,SalesOrderNumber + 
@RevisionNumber&lt;br&gt;      ,SalesOrderLineNumber&lt;br&gt;      
,@RevisionNumber&lt;br&gt;      ,OrderQuantity&lt;br&gt;      ,UnitPrice&lt;br&gt;      
,ExtendedAmount&lt;br&gt;      ,UnitPriceDiscountPct&lt;br&gt;      ,DiscountAmount&lt;br&gt;      
,ProductStandardCost&lt;br&gt;      ,TotalProductCost&lt;br&gt;      ,SalesAmount&lt;br&gt;      
,TaxAmt&lt;br&gt;      ,Freight&lt;br&gt;      ,CarrierTrackingNumber&lt;br&gt;      
,CustomerPONumber &lt;br&gt;      ,OrderDate&lt;br&gt;      ,DueDate&lt;br&gt;      ,ShipDate&lt;br&gt;  
FROM dbo.FactInternetSalesBig;&lt;br&gt;  UPDATE RevisionNumberValue SET 
RevisionNumber = RevisionNumber + 1;&lt;br&gt;GO 9&lt;/font&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Make sure you have enough log space. My log grew from 1GB to just over 8GB 
when running the script to build the 30 million row table and columnstore index, 
with no clustered index.  &lt;/p&gt;
&lt;p&gt;The downloadable script has commented options to build either a clustered or  
nonclustered index before you build the columnstore index. I suggest building 
the clustered index before the columnstore index,  because building the 
clustered index&amp;nbsp;will rebuild any existing nonclustered indexes, including my 
columnstore index. Building the columnstore index can take a while, on my system 
it was about 10 minutes. (Building the original 30+ million row table took even longer.)&lt;/p&gt;&lt;p&gt;After I finish the chapter on partitioning, I'll have a script to partition this table, but feel free to do that yourself!&lt;/p&gt;
&lt;p&gt;Have fun!&lt;/p&gt;
&lt;p&gt;&lt;font color="#d16349" size="4"&gt;~Kalen&lt;/font&gt;&lt;/p&gt;</description></item></channel></rss>