THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Fragmentation and Forwarded Records in a Heap

I know it's been a while since I had a technical post, and I've got 3 of them started and saved as drafts. I keep waiting for spare time. Now that PASS is over, and I've crawled partway out from under the TO-DO pile that accumulated while I was preparing for PASS, I'm going to pull one of the drafts and and get to work on it.

After I had posted this, I found that I had written a blog post on almost the exact same topic about a year and half ago! It uses a different example, and the last detail showing a way to remove forwarded records in SQL Server 2008 was not mentioned back then. So since I just spent many hours working on this, I decided to post it anyway.

I received an email a few weeks ago regarding using ALTER TABLE to defrag a heap, and my immediate thought was that it did no such thing, because by definition, a heap cannot be fragmented. SQL Server defrag operation is only concerned with logical fragmentation, which occurs when the logical and physical order do not match. And as you should know, heaps have no logical order. Heaps are just a bunch of rows.

In an index, there is an order, based on the index key(s). So suppose you have an index on LastName. SQL Server guarantees the leaf level will have all your LastName values in logical order, but does not guarantee a physical order. Logical order means that SQL Server can easily access the data in order, by following pointers. The page with "Abbott" and "Anderson" will point to the page with "Barkley" and "Bollinger", which will point to the page with "Charleston" and "Chung". So we can access the data in A-B-C order.  When you first create the index, the physical order will be as close as possible to logical order, so the A's might be on page 112 and the B's on 113 and the C's on 114. However, as you add, update or remove data, the physical order may not be maintained. Pages may have to be split, so if the B page gets full, a new one may be allocated at page 358, where half the B's will move to. Then the first B page, 113, will point to the second B page at 358, and 358 will point to the C page, 114. Page splits due to full pages are the main cause of logical fragmentation.

When you have a heap, there is no logical ordering and there is no splitting of pages.

But I thought a bit more, and realized that there are some other issues that one might consider to be 'fragmentation' in a heap. One issue might be whether the extents belonging to the table are contiguous or not. In SQL 2000, you could use DBCC SHOWCONTIG to look at a value called "Extent Scan Fragmentation" to see how contiguous the pages were, and in SQL 2005 and 2008, when you look at sys.dm_db_index_physical_stats, a high value number for fragment_count or a low value for avg_fragment_size_in_pages can indicate lack of contiguousness. (And I know, the BOL page actually talks about fragmentation for heap, but since the definition seems to suggest an ordering, mentioning a next and previous page, I tent to ignore the issue.) But whether contiguousness is a good thing or a bad thing is not what I'm going to discuss here.

Also, whether fragmentation itself is something you have to worry about is not something I'm going to discuss here. What I really want to tell you about is something that can occur in heaps that can be MUCH MUCH worse than fragmentation, giving worse performance for a table scan than the most fragmented table you can imagine. I'm talking about forwarding pointers. A forwarding pointer appears when you update one or more variable length columns in a row in a heap, and increase the size of the row so it no longer fits on the original page. SQL Server will not split the page, as splitting never happens in heaps. Instead, the new enlarged row is moved to another page, and a small forwarding pointer is left behind.  The forwarding pointer basically is just the file, page and row number address of the new location. The enlarged row at the new location is called a forwarded record, and has information in the row header that indicates the row has been forwarded, and it also includes a back pointer to the original page.

Let’s look at an example to see these forwarding pointers. I'll create a table with two variable length columns.  After I populate the table with five rows, which will fill the page, I’ll update one of the rows to make its third column much longer. The row will no longer fit on the original page and will have to move. I can use DBCC IND to get the page numbers used by the table and DBCC PAGE to look at the full page.

USE tempdb;
GO
CREATE TABLE bigrows
(   a int IDENTITY ,
    b varchar(1600),
    c varchar(1600));
GO
INSERT INTO bigrows 
    VALUES (REPLICATE('a', 1600), ''); 
INSERT INTO bigrows 
    VALUES (REPLICATE('b', 1600), '');
INSERT INTO bigrows 
    VALUES (REPLICATE('c', 1600), '');
INSERT INTO bigrows 
    VALUES (REPLICATE('d', 1600), '');
INSERT INTO bigrows 
    VALUES (REPLICATE('e', 1600), '');
GO

Now let's look at the header for this page. DBCC IND will give us the page number:

DBCC IND (tempdb, bigrows, -1);

-- Note the FileID and PageID from the rows where PageType = 1
-- and use those values with DBCC PAGE (I got FileID 1 and PageID 164)

DBCC TRACEON(3604);
GO
DBCC PAGE(tempdb, 1, 164, 1);
GO

I'm not going to step through all the output to DBCC PAGE, but note the value in the header m_freeCnt = 11. This means there are only 11 free bytes on the entire page. Pretty full, right?

Now let's update one of the rows. I'll change column c to be 1600 bytes instead of 0, and since there are only 11 free bytes on the page, the row will be WAY too big.

UPDATE bigrows 
SET c = REPLICATE('x', 1600)
WHERE a = 3;
GO

If you look at DBCC IND again, you'll now see there is a second data page for the table. I'll let you explore the contents of the new page on your own, but I will show you what is on the original page where the forwarded row used to be:

Slot 2, Offset 0x1feb, Length 9, DumpStyle BYTE
Record Type = FORWARDING_STUB        Record Attributes =                  
Memory Dump @0x61ADDFEB
00000000:   04ea1100 00010000 00†††††††††††††††††.........

Note that there are only 9 bytes total, and the record type indicates the row has moved, because the type is FORWARDING_STUB, which means it has a forwarding pointer.

I told you that forwarding pointers can be a really bad thing. Let's see why that is. I'm going to build a much bigger table now, that is a copy of a big table in AdventureWorks. I'll then ALTER the table to add a new column and fill it with 100 bytes in every single row of the table.

USE AdventureWorks
GO

IF EXISTS (SELECT 1 FROM sys.tables
            WHERE name = 'Details')
        DROP TABLE Details;
GO
--Look at the fragmentation data for the SalesOrderDetail2 table.
-- Note the avg_fragmentation_in_percent value

CREATE TABLE dbo.Details
    ([SalesOrderID] [int] NOT NULL,
    [SalesOrderDetailID] [int] NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [ProductID] [int] NOT NULL,
    [SpecialOfferID] [int] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money] NOT NULL);
GO   
INSERT INTO dbo.Details
SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
  FROM [AdventureWorks].[Sales].[SalesOrderDetail];
GO

-- sys.dm_db_index_physical_stats shows us the fullness of the pages in avg_page_space_used_in_percent,
-- and it also returns a value showing the number of forwarded records.

-- Note that pages are very full, but there are no forwarded records yet.
-- 856 pages

SELECT object_name(object_id) as Object, page_count,                                          avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
  object_id ('dbo.Details'),  null, null,  'DETAILED');
GO

SET STATISTICS IO ON
GO
-- A table scan takes as many reads as there are pages, i.e. 856 in this case
SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO


-- Now add a new fixed width column and note that this is a
-- metadata only change
-- The data pages are not modified
-- There is no change in the fullness of the pages

ALTER TABLE dbo.Details
ADD notes CHAR(100);
GO

SELECT object_name(object_id) as Object, page_count,                                          avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
  object_id ('dbo.Details'),  null, null,  'DETAILED');
GO

-- The data pages are not affected until we run the following update.
-- Every row on every page will get an additional 100 bytes in the notes field
--  added to it
UPDATE dbo.Details
SET notes = 'notes';
GO


-- note there are LOTS of forwarded records now (84408),

-- and many more pages the table (2709)
SELECT object_name(object_id) as Object, page_count,                                          avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
  object_id ('dbo.Details'),  null, null,  'DETAILED');
GO

SET STATISTICS IO ON
GO
-- The number of reads is not just the number of pages as we would expect for
-- a scan of a heap, but is equal to the
-- number of pages PLUS the number of forwarded records:
--  84408 + 2709 = 87117
-- During a scan, the forwarded pointers are followed for EACH row, and then
-- SQL Server goes back to the original position to continue the scan

SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO

 

So if you find a lot of forwarded records in a heap that is scanned frequently, you'll want to get rid of them. There is no background process that goes through and cleans up forwarding pointers for you. However, if a rows is updated to reduce the length of the variable length column and it becomes small enough to fit back in the original position, it will move back (assuming other rows haven't filled the space the row used to occupy.) Also, if you shrink a data file, forwarding pointers can be removed, but this is not recommended as a way to remove forwarding pointers, it is just a side-effect.

Since forwarding pointers can only occur in heaps, you can get rid of them by creating a clustered index on the table.

In addition, SQL Server 2008 allows us the option to rebuild tables. The motivation for this feature is to allow us to change the compression setting for a table, but we can use the REBUILD option without defining compression. So I can remove all forwarding pointers from my table as follows:

ALTER TABLE dbo.Details REBUILD;
GO

SELECT object_name(object_id) as Object, page_count,                                          avg_page_space_used_in_percent, forwarded_record_count
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks'),
  object_id ('dbo.Details'),  null, null,  'DETAILED');
GO

Notice that I still have a much larger number of pages than I did originally (2379 instead of 811) but, after all, the rows are all much larger! More importantly, there are NO forwarded records. If I do another table scan, the number of logical reads should be only the same as the number of pages.

SET STATISTICS IO ON
GO
SELECT * FROM dbo.Details;
GO
SET STATISTICS IO OFF
GO

I am recommending to my clients that any scripts that check for fragmentation values for the purpose of rebuilding an index, also inspect the forwarded record count for heaps.

~Kalen

Published Wednesday, November 11, 2009 6:31 PM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Uri Dimant said:

Hi Kalen

Thanks for that great post. You are talking about forwarding pointers and hot it impacts  on table scan operations. What If I have only INDEX SEEK operaton, SQL Server can get to any row easily no matter how fragmented the table is , so is it true also for forwarding pointers ?

What do you think?

November 12, 2009 1:19 AM
 

Uri Dimant said:

Thank you.

November 12, 2009 7:00 AM
 

Brian VanCalster said:

I know you mentioned you do not intend to talk about fragmentation in this blog.  However, I am having trouble finding this answer and hoped you could help or point me to a blog that discusses this.  I would like to know if there is a benefit to rebuilding non-clustered indexes on a heap.  The heap table does not have any forwarding records because there are only int and char data types (there are not any varchar columns).  The table contains 88 million rows and is inserted into frequently.  The majority of reads to the table are Index Seek operations returning a few rows.

December 16, 2009 11:55 AM
 

Naren M said:

Hi Kalen

Thanks for that great post.

With the inputs from the post I've done small test and found that 'sys.dm_db_index_physical_stats' is not giving the correct forwarded record count where as 'DBCC SHOWCONTIG' is giving the accurate result.Following is the script used to check the above theory.

CREATE TABLE Frecords (id   INT IDENTITY,  name varchar(20), Description VARCHAR (8000))

GO

INSERT INTO Frecords values(REPLICATE('A',20),REPLICATE('ABCD',1000))

INSERT INTO Frecords values(REPLICATE('A',20),REPLICATE('ABCD',1000))

UPDATE Frecords SET Description = REPLICATE ('BCDE', 2000) WHERE id = 1

The above script will create one forward record but the DMV is not reflecting the same. Can you please explain why?

January 31, 2011 7:38 PM
 

Kalen Delaney said:

I am seeing 1 forwarded record in the sys.dm_db_index_physical_stats row, as expected.

You did not provide the parameters you were using to call the function, so I can't verify what you did.

January 31, 2011 7:56 PM
 

Naren M said:

Thanks for the update Kalen.

The mistake I've done is I've used 'LIMITED' mode in the DMV not the 'detailed'. If I use 'LIMITED' I'm not getting the accurate results.

February 1, 2011 12:32 PM
 

Kalen Delaney said:

You are getting accurate results, just not complete. If a value shows a NULL with LIMITED, it means that is something that LIMITED doesn't collect. LIMITED doesn't look at the individual pages at the leaf level, so it can't count the forwarded records.

February 1, 2011 1:20 PM
 

Naren M said:

Thanks a lot Kalen for the clarification.

February 9, 2011 9:29 PM
 

Rick Martinez said:

How many forwarded records is a lot?

March 18, 2011 4:02 PM
 

Kalen Delaney said:

Hi Rick

A lot is when you are scanning the table, and the extra page reads to following the pointers make the cost of the scan too expensive.

You might start by looking at the number of pages in the table. That is the 'cost' of a table scan. Every forwarded recorded means an extra page read during the scan. So I would start getting concerned if the number of forwarded records was more than 5% of the number of pages in the table. But as always, YMMV.

HTH

~Kalen

March 18, 2011 6:26 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement