<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'internals', 'indexes', and 'maintenance'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=internals,indexes,maintenance&amp;orTags=0</link><description>Search results matching tags 'internals', 'indexes', and 'maintenance'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Visualizing Data File Layout III</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/29/visualizing-data-file-layout-iii.aspx</link><pubDate>Tue, 29 Jan 2013 05:45:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47372</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;This is part three of a blog series illustrating a method to render the file structure of a SQL Server database into a graphic visualization.&lt;/p&gt;  &lt;p&gt;Previous Installments:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx"&gt;Part 2&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Those that have been reading this series might be be thinking, “Is he going to go there?” Well, the answer is “Yes.” This is the &lt;strong&gt;GUID clustered index post&lt;/strong&gt; that had to be. It’s inevitable with this tool.&lt;/p&gt;  &lt;p&gt;If you follow SQL Server at all, you are probably aware of the &lt;a href="http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/"&gt;long-standing&lt;/a&gt; &lt;a href="http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html"&gt;debate&lt;/a&gt; about whether it is wise, desirable, smart, useful, or what have you, to identify rows using GUIDs. I won’t take a position on that, but I will show here, I hope objectively, a few things that the visualizer shows about file layout vs. distributed inserts, distributed inserts being one of the main challenges around using GUIDs as clustering keys. Just to recap the argument very, very briefly:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys can be generated at the client, which saves a round-trip to the database server to create a collection of related rows.&lt;/p&gt;  &lt;p&gt;GUID keys can make certain architectures like sharding, or peer to peer replication, or merging multiple source databases, simpler.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;GUID keys are wider, therefore they take more space in memory and on disk. The additional space is multiplied by their presence in both clustered and non-clustered indexes if they are a clustering key.&lt;/p&gt;  &lt;p&gt;GUID keys don’t only take more space in RAM and on disk because of their width. They also cause &lt;em&gt;distributed inserts&lt;/em&gt; into the clustered index – that is, new rows are added to any and all pages in the index. Each time a row has to be added, the target page must be read into memory, and at a checkpoint, the &lt;em&gt;whole&lt;/em&gt; changed page (both existing and new rows) must be written to disk. This has two effects: &lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The amount of RAM and disk IO required for inserts is probably much higher, as pages with &lt;em&gt;existing&lt;/em&gt; data must come into cache, get changed, and then be written back out again. Essentially, large parts of the table have to be &lt;em&gt;rewritten&lt;/em&gt; to disk to append rows to pages that have data already.&lt;/li&gt;    &lt;li&gt;The pages that store the index will individually fill up, and have to split such that half the existing rows are written back out to the “old” page and half written out to a “new” page in a different location on disk. This causes the pages to be less full, the same number of rows to require more space on disk and in RAM, and the resulting index to be massively fragmented on disk.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;I am not writing to argue these points, which have I think been established by both sides of the debate, only to see if the visualizer shows these effects clearly. Most of the argument isn’t actually about these facts (they are all true, as far as I know) but rather which are more important, and I think that is the main source of debate on the issue.&lt;/p&gt;  &lt;h2&gt;Visual Example of Distributed Inserts&lt;/h2&gt;  &lt;p&gt;It’s very easy to create an example of this with a small sample database. I created one called “VizDemo2.” VizDemo2 has a slightly modified structure to illustrate what’s going on here – I need two tables that are stored separately on disk, so that they cannot interfere with one another. The simplest way to do that is with a couple of file groups containing one file each. So here’s the structure:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;I created the database with a 50MB, single file, Primary file group&lt;/li&gt;    &lt;li&gt;I added a file group FG1 with one 75MB file&lt;/li&gt;    &lt;li&gt;I added a second file group FG2 with one 75MB file&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;When the database is empty, the visualizer shows only the system pages at the start of each file, as shown here:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_5BB678D6.png"&gt;&lt;img title="VizDemo2_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_01_thumb_18DA345C.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;To that database I added two sample tables identical in structure but with different clustering keys:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo2
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG1]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;span style="color:blue;"&gt;
CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;uniqueidentifier &lt;/span&gt;&lt;span style="color:gray;"&gt;NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED DEFAULT &lt;/span&gt;&lt;span style="color:magenta;"&gt;NEWID&lt;/span&gt;&lt;span style="color:gray;"&gt;(), 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;)&lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;[FG2]&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;I’ll populate the two tables and we can see what the file layout looks like afterward:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersInt &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;h2&gt;Compare&lt;/h2&gt;

&lt;p&gt;After inserts, the resulting graphic does show some facts we know to be true:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_6193AD20.png"&gt;&lt;img title="VizDemo2_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_02_thumb_7A25DB21.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, the data in the integer-clustered index takes about eight bands of the diagram, while storing the same data in a GUID clustered index has required about twelve bands of data pages. The database itself supports that impression with space allocation – it reports these figures:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_50B199E1.png"&gt;&lt;img title="VizDemo2_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_03_thumb_49262A74.png" width="737" height="75" /&gt;&lt;/a&gt;&lt;/p&gt;





&lt;p&gt;Part of the extra space required is the width of the key, but part of it is the empty space on each page resulting from page splits. If a page that needs a new row is too full, then half the rows from that page are moved to a net-new page, half left in place, and the new row added to one or the other of the resulting pages. Afterward, they are often both partly empty.&lt;/p&gt;

&lt;p&gt;Second, the whole graphic in the GUID clustered index area is a dark blue that the visualizer uses to show fragmentation – in fact, the object is almost perfectly fragmented, with practically no contiguous pages at all. The sequence of pages in the leaf level of the index is still a linked list, as always, but it it is physically stored in essentially random order on disk.&lt;/p&gt;

&lt;h2&gt;Does Re-Indexing Help?&lt;/h2&gt;

&lt;p&gt;The next question is whether we can combat these problems by doing huge amounts of index maintenance – if we rewrite the GUID index, will that make it take less space, or make it more efficient? The answer is, “well, sort of, temporarily.”&lt;/p&gt;

&lt;p&gt;First, re-indexing will put the table in “GUID” order. Whether that really helps or not is debatable, perhaps. It would enable read-ahead for the index, which is otherwise clobbered by the fragmentation. Having the table in “GUID” order might or might not be of any help to performance. Second, re-indexing will make the pages denser, or less dense, depending on the fill factor applied. For the sake of demonstration, let’s re-index with the default fill factor, because I think that happens a lot out in the world, and it may tell us something:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;After re-indexing, this is a view just of the second file group with the GUID clustered table (note that I scrolled down in the display):&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_3266E2EB.png"&gt;&lt;img title="VizDemo2_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_04_thumb_1B3B686D.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The arrow shows where the data was moved from the old data pages into a new region of the file. And, sure enough, it’s not fragmented (note the lighter color) and it takes less space in the file.&lt;/p&gt;

&lt;p&gt;That might sound good, but if this is a real database, inserts probably will continue. In the int clustered case, as we know, new data will be appended to the end of the page sequence, but in this case, new data will have to be inserted into most of the existing pages on disk. Those are all full now, and will have to be split 50/50 to create new pages for the new data, both the old and new pages will have to be written out, and the new pages by definition can’t be in index order with the existing pages.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomersGuid &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;20000&lt;/pre&gt;


&lt;p&gt;What we get after more rows are added to the table is what a layperson might call a “hot mess:”&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_1D0BBE34.png"&gt;&lt;img title="VizDemo2_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="VizDemo2_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo2_05_thumb_7A4A8676.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here everything is fragmented – back to that dark blue – even the pages we just re-indexed a moment ago, because they all split. The table has &lt;em&gt;more than&lt;/em&gt; doubled in size, even though we just doubled the number of rows, because the individual pages contain less data.&lt;/p&gt;

&lt;p&gt;Would appropriate fill factor be a workaround? In some measure, yes, but it really only combats the issue. The write activity on the table, even with a low fill factor, will still be higher as more existing pages have to be flushed at checkpoints. The pages will still be less dense, and therefore take up more space on disk and in cache. In short – maybe helpful but no silver bullet.&lt;/p&gt;

&lt;p&gt;What about Sequential GUIDs? Here I will venture my opinion. Sequential GUIDs have never made sense to me. They solve one part of this problem – the distributed insert part – but &lt;em&gt;at the expense of the very things GUIDs might be good for&lt;/em&gt;, namely not demanding a visit to the database to generate an identifier. If you have to come to the database, you already lost this whole argument. Use an integer and solve the rest of the problem at the same time. I can only see it as a sort of band-aid for existing systems that could not be refactored, but, like a bad SUV that combines the worst properties of a car and a truck, it feels like a really poor compromise to me.&lt;/p&gt;

&lt;p&gt;I hope this helps to illustrate some of the physical database design challenges that surround the use of GUID cluster keys. In the next installment I’m planning to demonstrate the interleaving of objects, which is one argument for multiple file groups.&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout II</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/23/visualizing-data-file-layout-ii.aspx</link><pubDate>Wed, 23 Jan 2013 23:40:02 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47268</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 2 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx"&gt;Part 1&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;In Part 1 of this series, I introduced a little demo app that renders the layout of pages in SQL Server files by object. Today I’ll put that app through its paces to show, in vivid color (well, teal, anyway) the destructive power of the famous&lt;strong&gt; Re-Index Then Shrink&lt;/strong&gt; anti-pattern for index maintenance.&lt;/p&gt;  &lt;p&gt;This one is very easy to demo, so let’s go!&lt;/p&gt;  &lt;p&gt;First, I created a demo database &lt;strong&gt;VizDemo1&lt;/strong&gt;, with a single 200 MB data file. Into that database I placed a canonical table – highly simplified for this example – clustered on an ever-increasing integer, using identity():&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;USE &lt;/span&gt;VizDemo1
&lt;span style="color:blue;"&gt;GO

CREATE TABLE &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers  &lt;span style="color:gray;"&gt;( 
    &lt;/span&gt;id &lt;span style="color:blue;"&gt;int identity&lt;/span&gt;&lt;span style="color:gray;"&gt;( &lt;/span&gt;1&lt;span style="color:gray;"&gt;, &lt;/span&gt;1 &lt;span style="color:gray;"&gt;) NOT NULL &lt;/span&gt;&lt;span style="color:blue;"&gt;PRIMARY KEY CLUSTERED&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;buncha &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'A'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;big &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'B'&lt;/span&gt;&lt;span style="color:gray;"&gt;, 
    &lt;/span&gt;vals &lt;span style="color:blue;"&gt;char&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;500&lt;span style="color:gray;"&gt;)  &lt;/span&gt;&lt;span style="color:blue;"&gt;default &lt;/span&gt;&lt;span style="color:red;"&gt;'C'
&lt;/span&gt;&lt;span style="color:gray;"&gt;);
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then we populate that table with some dummy data:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;INSERT &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;DEFAULT VALUES&lt;/span&gt;&lt;span style="color:gray;"&gt;;

&lt;/span&gt;&lt;span style="color:blue;"&gt;GO &lt;/span&gt;40000&lt;/pre&gt;


&lt;p&gt;And finally, fire up the little visualizer app and process the database:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_4683F23D.png"&gt;&lt;img title="VizDemo1_01" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_01" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_01_thumb_53EA0543.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The small color bands at the top left corner of the image are the system tables and such that are in every “empty” database to make it run. The blue/green/teal area is the new table we created and populated with sample data, and the gray area represents empty regions in the file.&lt;/p&gt;

&lt;p&gt;As expected, the table started writing into the first available space, and, because the cluster key is increasing, pages were allocated to the end of the page sequence in order, and we end up with a crazy-perfect, contiguous linked list on disk.&lt;/p&gt;

&lt;p&gt;You can see small darker bars at intervals within the table – most of the pages in the index are “type 1” pages, which are the leaf-level/rows in the clustered index. Those bars are “type 2” index pages that have the upper level(s) of the index. The reason they are darker is that those are a disruption in the leaf level linked list, and the app shades such disruptions as a way to see fragmentation. The list has to “hop over” those pages and then continue on the other side. It’s technically fragmentation, but at this point not harmful at all – but remember that darker color that shows a break in the page order.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;A side note: in the midst of the gray area you can see one orange line (and another in the sea of teal). Those are “type 11” PFS pages, which happen on a fixed interval in every file. I don’t think they ever move – they track file allocation and free space metadata. They are like rocks in the stream…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, what happens if we re-index this bad boy? Well, a re-index operation has to write all the pages for the object into new, blank pages in the file, and then abandon the old pages. I run:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:green;"&gt;-- This &amp;quot;moves&amp;quot; all the data toward the end of the file, into free areas
&lt;/span&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;Then re-analyze the file. As expected, the table has “moved” toward the end of the file, and left free space toward the beginning. It’s still not fragmented, because we had enough room, and it was written in order into that new area by the rebuild:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_61501849.png"&gt;&lt;img title="VizDemo1_02" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_02" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_02_thumb_3EFB1381.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the gray area near the top is all the “abandoned” pages where the index was, and the data has all moved down into the free area. Ah, but that seems wasteful to some people, am I right? All that empty space – the file could be smaller!&lt;/p&gt;

&lt;p&gt;Let’s see the damage that Shrink File does. Imagine that I do this:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;DBCC &lt;/span&gt;SHRINKFILE &lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:red;"&gt;N'VizDemo1' &lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;70&lt;span style="color:gray;"&gt;)
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO
&lt;/span&gt;&lt;/pre&gt;


&lt;p&gt;First, before we shrink, let’s just scroll down and look at the end of the file:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_4E9DAF43.png"&gt;&lt;img title="VizDemo1_03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_03" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_03_thumb_10A41E85.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have two conditions – the gray part of the file is sort of OK to shrink. There’s just a lone PFS page out there, and removing that does no harm. But once we get into that blue area, the data has to be moved back up into the beginning of the file. Here’s where the problem lies, as I learned from Mr. Paul Randal – the shrink routine will move a page at a time back into that free space, starting from the end, going backward. That makes the pages land in approximately &lt;em&gt;reverse order&lt;/em&gt; from the correct index order. Perfect fragmentation. Let’s see if this tool proves him right. Shrink, then re-analyze:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_595D9749.png"&gt;&lt;img title="VizDemo1_04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04_thumb_500462C6.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yep, it’s not immediately apparent, perhaps, but that teal color is a darker shade that indicates every page is a fragment boundary in most of the index – perfect fragmentation! Here’s a better view:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_51D4B88D.png"&gt;&lt;img title="VizDemo1_04a" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_04a" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_04a_thumb_665A080B.png" width="408" height="284" /&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;So, how can we clean that up? Well, with a rebuild. But … we need that bigger file. In fact, practically any database in production needs this overhead of available space to be able to perform index maintenance. It’s not “wasted” space at all.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;ALTER INDEX &lt;/span&gt;&lt;span style="color:gray;"&gt;ALL &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;dbo&lt;span style="color:gray;"&gt;.&lt;/span&gt;SampleCustomers &lt;span style="color:blue;"&gt;REBUILD&lt;/span&gt;&lt;span style="color:gray;"&gt;;
&lt;/span&gt;&lt;span style="color:blue;"&gt;GO&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;After the rebuild, the index is back toward the end of the file, but it’s also back in order:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_13DB27CF.png"&gt;&lt;img title="VizDemo1_05" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;float:none;padding-top:0px;padding-left:0px;margin-left:auto;border-left:0px;display:block;padding-right:0px;margin-right:auto;" border="0" alt="VizDemo1_05" src="http://sqlblog.com/blogs/merrill_aldrich/VizDemo1_05_thumb_1C5E8719.png" width="1028" height="494" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, in light of this information, imagine nightly re-indexing on a database with … &lt;strong&gt;AutoShrink!&lt;/strong&gt; &amp;lt;shudder&amp;gt;&lt;/p&gt;</description></item><item><title>Visualizing Data File Layout I</title><link>http://sqlblog.com/blogs/merrill_aldrich/archive/2013/01/22/visualizing-data-file-layout-i.aspx</link><pubDate>Tue, 22 Jan 2013 22:50:34 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47250</guid><dc:creator>merrillaldrich</dc:creator><description>&lt;p&gt;Part 1 of a blog series visually demonstrating the layout of objects on data pages in SQL Server&lt;/p&gt;  &lt;p&gt;Some years ago a gentleman called &lt;a href="http://sqlblogcasts.com/blogs/danny/default.aspx"&gt;Danny Gould&lt;/a&gt; created a free tool called &lt;a href="http://internalsviewer.codeplex.com/"&gt;Internals Viewer for SQL Server&lt;/a&gt;. I’m a visual sort of guy, and I always thought it would be fun and educational to make a simple visualizer, like the one he created, in order to view how objects are laid out in SQL Server files, and to use it to demonstrate how operations like re-index and shrink affect the layout of files.&lt;/p&gt;  &lt;p&gt;To that end, and a little bit reinventing the wheel truth be told, I spent this past holiday creating a simple .NET app that renders the file layout of a database into a color-coded bitmap:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_48CAE32A.png"&gt;&lt;img title="FileLayoutViewer01" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer01" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer01_thumb_586D7EEC.png" width="824" height="376" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Fig 1&lt;/p&gt;  &lt;p&gt;The app can scan the pages in a database, grab the header output from DBCC PAGE, parse that, and create a structure with a few key bits of information about every page. It then renders a bitmap from those structures showing a few things (Fig 1):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Each data object (index or table) partition is identified with a unique partition ID in SQL Server. Those IDs are used in this tool to color-code the output by object, from a color lookup table. Each color in the example screenshot represents the pages that are dedicated to a single partition of an object. This screenshot shows AdventureWorks, which doesn’t use the Enterprise Edition partitioning feature, so for this case each color represents one object – every object having exactly one partition in Standard Edition (or in databases that don’t use partitioning).&lt;/p&gt;    &lt;p&gt;Unallocated pages are shown as gray gaps. These are regions that are part of the physical file(s), but not used to store anything.&lt;/p&gt;    &lt;p&gt;The app flags pages at the end of any fragment of an object using a darker colored band, so it will reveal any non-contiguous structures in the data file(s). Sometimes these happen at the end of a region of the file where one object is stored, but, interestingly, sometimes these can happen in the middle – as shown in the image above where a dark band interrupts a continuous region of the same color.&lt;/p&gt;    &lt;p&gt;The app has some very basic mouse-over capability where you can run the mouse over the image and the text fields at right will reveal information about the pages, including the object schema.table.index and partition, and also whether the page represents a fragmentation boundary.&lt;/p&gt;    &lt;p&gt;Finally, the app shows what page types are located where in the file using the narrower white/gray/black bands. White represents data or index pages, while other shades of gray or black indicate other kinds of system pages, per Paul Randal’s excellent blog post &lt;a href="http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/"&gt;here&lt;/a&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;h2&gt;The Pixels Already Tell a Story&lt;/h2&gt;  &lt;p&gt;So, what can we learn about the sample database in this image? Here are a few things:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The part of the file shown in the bitmap is fairly dense. There aren’t big regions of unallocated space in the file. A gap in the allocated pages looks like this (enlarged):      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_6C869B75.png"&gt;&lt;img title="FileLayoutViewer02" style="border-left-width:0px;border-right-width:0px;background-image:none;border-bottom-width:0px;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border-top-width:0px;" border="0" alt="FileLayoutViewer02" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer02_thumb_1354B1B6.png" width="206" height="194" /&gt;&lt;/a&gt;       &lt;br /&gt;Empty Region       &lt;br /&gt;&amp;#160; &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Objects in the file are not contiguous, and may “hop around.” That is, if you follow the linked list of pages that compose an index, a bunch of them will be in a row, and then there will be a page that links to the next page composing the index but it’ll be in a different location in the file. I’ve called these “frag boundaries” – pages that do link to another page, but where that next logical page isn’t the next physical page in the file. In the graphic the frag boundary pages are colored with a darker dithered pattern. You can mouse over these and look in the text fields at the right in the app, and see the page they link to.      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_495BB6DC.png"&gt;&lt;img title="FileLayoutViewer03" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer03" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer03_thumb_7029CD1C.png" width="238" height="188" /&gt;&lt;/a&gt;       &lt;br /&gt;Fragment Boundaries       &lt;br /&gt;&amp;#160; &lt;br /&gt;      &lt;br /&gt;Sometimes the end of a fragment will be adjacent to pages from another object, but it can be the case that there’s a fragment boundary in the middle of the pages for one object – it’s just that the linked list goes up to that point in the file, but then the next page in the index (in index order) isn’t the next page in the file, even though the next page in the file is part of the same object. Imagine a page split in the “middle” of an index – the existing page with half the rows stays in place, and a new page with the other half of the rows gets created in the middle of the logical index order but possibly stored in some other location in the physical file.       &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Right at the very beginning of the file there’s a special sequence of metadata pages that describe the database, allocations, and so on (again, well documented by Paul Randal). In our diagram this shows up as a series of pages at top left with varying page type (the gray and white) indicators:      &lt;br /&gt;&lt;a href="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_020660EA.png"&gt;&lt;img title="FileLayoutViewer04" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="FileLayoutViewer04" src="http://sqlblog.com/blogs/merrill_aldrich/FileLayoutViewer04_thumb_28D4772A.png" width="268" height="206" /&gt;&lt;/a&gt;       &lt;br /&gt;Database and file metadata pages &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In the next installment, I’ll run some test databases through this and we can see what more severe fragmentation looks like, the effect of GUID cluster keys, shrink, and how the data moves around in a re-index operation.&lt;/p&gt;  &lt;p&gt;Here’s a short demo video of the mouse-over working (quality is You-Tube limited):&lt;/p&gt;  &lt;div id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:a5c43597-8d84-4a1a-b897-db2f19356e72" class="wlWriterEditableSmartContent" style="float:none;padding-bottom:0px;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;"&gt;&lt;div&gt;&lt;a href="http://www.youtube.com/watch?v=q47a0L0_9Ws" target="_new"&gt;&lt;img src="http://sqlblog.com/blogs/merrill_aldrich/videof83e47a6713d_7D8FE022.jpg" style="border-style:none;" alt=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="width:448px;clear:both;font-size:.8em;"&gt;Animated Screen Cap of Mouse-over&lt;/div&gt;&lt;/div&gt;</description></item></channel></rss>