THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Tibor Karaszi

How often do you rebuild your heaps?

Never? You are not alone. None of the maintenance solutions I use includes functionality to rebuild a heap, and that includes Maintanance Plans and Ola Hallengren's maintenance solution.

"So why would you want to rebuild a heap, when it isn't sorted in the first place?", you ask. The answer is to get rid of Forwarding Pointers, and these can really hurt performance by adding lots more logical and physical reads, and random I/O. See for instance this from Kalen Delaney, this from Hugo Kornelis and this from the SQLCAT team.

SQL Server 2008 added the ALTER TABLE command, with the REBUILD clause. And this is what I'm using in my procedure rebuild_heaps which rebuilds all fragmented heaps on a SQL Server.

You find the procedure here: http://www.karaszi.com/SQLServer/util_rebuild_heaps.asp.

Published Thursday, March 06, 2014 1:46 PM by TiborKaraszi
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Linwood said:

The fact that a HEAP has lots of forwarding pointers isn't a conclusive reason to rebuild a HEAP.

If rows are only accessed individually from the HEAP (how HEAPs should only ever be used as there is no ordering in a HEAP) there is little damage from the forwarding pointers - an I/O or two extra per row lookup is not much different from a few extra I/Os in a B-Tree lookup

Rebuilding HEAPs can be very expensive, particularly as the NCIXs have to be rebuilt as well due to the reconstruction of ROWIDs. This in turn can have a harmful impact on TLogging, disk space & remote DR synchronisation.

It might also be worth pointing out that forwarding pointers only happen in tables with variable width columns that actually get updated with "widening" updates, and then only where the widening update can't be performed within the available space on the page. In a well designed physical database, this shouldn't happen en mass & when it does, the design should be questioned.

March 6, 2014 8:03 PM
 

TiborKaraszi said:

Hey Greg!

In many cases, maintenance routines is all about generalizations. You can tailor a maint implementation to an installation when you are at that installation, and if you have the knowledge and time.

As a *general* statement, I don't see the harm in rebuilding fragmented heaps, but you of course have to take in considerations the effect on tlog backups and HA/DR solutions depending on the tlog (just as for rebuilding b-tree indexes).

My script doesn't rebuild all heaps, it does check for how much "fragmentation" (as I call forwarded records) you have and do or don't do the rebuild based on a specified cut-off percentage.

March 7, 2014 3:12 AM
 

RichB said:

I take it this handles the bloat from deletes in the heap?  

Something I tend to find much more destructive than fragmentation/forwarding.

March 7, 2014 4:14 AM
 

TiborKaraszi said:

No, Rich, that is something I didn't consider! Interesting. Shouldn't be difficult to add that logic to the procedure. Thinking out loud here...

I guess we would use the avg_page_space_used_in_percent, and if higher than @rebuild_if_free_space_percent_is_greater_then then we do rebuild. I would prefer to not complicate the interface to the proc - ideally add only one more parameter. I think working with respective cutoff-value (for fwdptr and free space) is enough to control whether to rebuild based on only fwd or only free space or both.

March 7, 2014 10:07 AM
 

Hugo Kornelis said:

Greg: On the surface, an RID lookup in a heap may look cheaper than a Key Lookup in a clustered index. A lookup in the CI typically costs three or four logical reads (root page, intermediate pages, data page); a lookup in a heap is either one or two (data page; forwarding pointer to other data page). In the worst case, the averate number of logical reads for this RID lookup approaches two per row.

For the ideal lookup queries that only need a few dozen rows, these numbers are totally correct - but frankly, I don't care about these figures in the cases where the row count is just a few dozen, they will not be my performance bottleneck anyway.

But how about a nonclustered index seek with range predicate, or a nonclustered index scan, that returns thousands of rows that then all go into a lookup? Either becuase the optimizer messed up, or because the table is so large that a table scan is still more expensive? The math for the logical reads remains the same - but add memory pressure on the data cache, and some of those logical reads will become physical reads.

For the Key Lookup, the root and intermediate pages will be read once and then stay in cache, becuase they get hit all the time; the data pages can be flushed out. Worst case is one physical read for each row.

For the RID Lookup, no pages are hit more often then others, so they all have the same chance of getting flushed out. Worst case is one physical read for each logical read. Which, as mentioned before, approaches 2 per row in the worst case scenario. So now, the RID lookup has less logical reads - but more physical reads!

The situation gets worse with a scan. In my blog post that Tibor links to, I show that even a moderate amount of updates to varying lenth columns (5,000 updates in a 20,000 row table) can result in so many forwarding pointers that a table scan has over 10 logical reads for each page in the table. Again, add memory pressure and those all become physical reads.

In a heap with no forwarding pointers at all, a table scan can be extremely fast because the rows are all scanned in physical allocation order, minimizing the movement of the disk I/O heads. But every forwarding pointer breaks the flow of that pattern. So the presence of forwarding pointers not only increases the amount of physical reads, it also changes them from fast sequential reads to slower random reads.

My first guideline has always been, and still is, that every table should have a clustered index - unless you have very good reasons why a specific table is an exception.

My second guideline is that for each table that does not have a clustered index, regular rebuilds should be included in the maintenance plan - again, except when you have very good reasons why that table is the exception.

March 8, 2014 6:45 AM
 

Greg Linwood said:

Hi Hugo

It actually sounds like you're agreeing with me. RID lookups are cheaper always when where is sufficient memory but sometimes might cause more physical I/O if there is insufficient memory. And this only for tables stored on HEAPs that have variable width columns (many tables don't).

The argument about memory sizing is becoming increasingly less relevant as memory these days is MUCH more available than years ago so bringing memory pressure scenarios into this topic doesn't carry as much weight as it might have years ago.

It is not correct to generalise that all HEAPs should be routinely defragged or rebuilt as the benefits are doubtful and there is also harm that comes from doing so.

If there was no damage from rebuilding HEAPs (or IXs) there may be no problem with your position but rebuilding tables / indexes is one of the most common causes of SQL Server outages because log files fill up (sometimes corrupting DBs) and DR data synchronisation issues flow through the network to also fill up disks on standby servers.

It is also not easy to predict the space required to rebuild tables / heaps due to the logging overhead which is proportional to the average row size of the table / index being rebuilt. With narrow indexes / tables, this can be very disproportionate & tricky for DBAs to calculate. I've never seen an index maintenance script do this well

March 9, 2014 1:18 AM
 

TiborKaraszi said:

Rich,

I have now added option for rebuild based on free space in the heap pages.

March 21, 2014 1:20 PM
 

Allen McGuire said:

Greg - the unfortunate reality is that I don't always get what I want (memory).  Also, I don't think Hugo was agreeing with you at all, except in the case where the RID lookup is rather small.  At the end of the day, performance will suffer if the extra I/Os are required to fulfill the request.  Paul Randal has a nice post on the topic - not covering the rebuilding of heaps, but rather highlighting the performance implications of forwarding records.

http://www.sqlskills.com/blogs/paul/forwarding-and-forwarded-records-and-the-back-pointer-size/

Obviously we don't want to generalize, but you do what's best for the "greater good" and address the special circumstances thereafter.  You don't build your maintenance solution around the exceptions.

Lastly, I don't agree index maintenance is the most common cause of outages - that's simply not accurate.  Perhaps for the wanna-be or inexperienced DBA who still uses maintenance plans, sure, but people that vaguely know what they are doing use Ola's scripts.

May 21, 2014 2:01 PM
 

Chuck Rhoads said:

Hello Tibor,

Thanks for your hard work!  This script served as a great starting point for my heap maintenance solution.

One thing I did modify though was the logic used to rebuild based on empty space.  Your script only looks at avg_page_space_used_in_percent.  This can be misleading for record sizes that can't always fill a page.  For example, what if your average record size is 4100 bytes?  Then the avg_page_space_used_in_percent would always be slightly above 50% and rebuilding would not free up any space.  The solution was to rebuild based on the deviation between the avg_page_space_used_in_percent and the theoretical maz size based on the avg record size.  My calculation is below.  Let me know if you see any flaws in my logic....

CASE

WHEN P.avg_record_size_in_bytes > 0

THEN ((FLOOR(8060/P.avg_record_size_in_bytes) * P.avg_record_size_in_bytes) / 8060) * 100 - P.avg_page_space_used_in_percent

ELSE 100

END AS Page_Space_Dev

May 29, 2014 12:22 PM
 

TiborKaraszi said:

Good thinking, Chuck. Correct, I didn't consider row size. Before I take it further, below is how I understand your calculation. Are we on the same page? (I sort of un-nested your calc from inside to out.) Also, where did you plug this in. Or rather, If you want to, I would appreciate if I could incorporate your cals into my SQL - with credit to you, of course...

DECLARE @avg_record_size_in_bytes float = 510

DECLARE @avg_page_space_used_in_percent float = 80

SELECT 8060/@avg_record_size_in_bytes AS RecordsPerPage

SELECT FLOOR(8060/@avg_record_size_in_bytes) AS RecordsPerPageInt

SELECT FLOOR(8060/@avg_record_size_in_bytes) * @avg_record_size_in_bytes AS UsablePageSpace

SELECT FLOOR(8060/@avg_record_size_in_bytes) * @avg_record_size_in_bytes / 8060 AS UsablePageSpacePercent

SELECT (FLOOR(8060/@avg_record_size_in_bytes) * @avg_record_size_in_bytes / 8060) * 100 AS UsablePageSpacePercentAsPercentage

SELECT ((FLOOR(8060/@avg_record_size_in_bytes) * @avg_record_size_in_bytes) / 8060) * 100 AS UsablePageSpacePercentAsPercentage

SELECT ((FLOOR(8060/@avg_record_size_in_bytes) * @avg_record_size_in_bytes) / 8060) * 100 - @avg_page_space_used_in_percent AS FreeSpaceForReal

May 30, 2014 11:55 AM
 

Chuck Rhoads said:

Tibor,

Yes, we are on the same page.  I'm glad that made sense to you.  Feel free to incorporate this into your code.  If you need my modified version of your script let me know.  I think you get the gist of it though.

Chuck

June 2, 2014 7:36 AM
 

TiborKaraszi said:

If OK by you, Chuck, I wouldn't mind using your modification of my script. The reason is that I don't have any handy production servers to test on now, and you never know, how careful you try to be, if you do some mistake in your code. Assuming your modification is somewhat tested... Send me an email, if you want (you find my mail address on my web-site - About me).

June 2, 2014 11:02 AM
 

Chuck Rhoads said:

No problem Tibor.  I'll email it to you tomorrow when I'm back at work.

June 3, 2014 6:35 PM
 

Chuck Rhoads said:

I emailed you with the modified version.

June 4, 2014 11:00 AM
 

Ravi said:

nice

July 16, 2014 2:53 AM
 

TiborKaraszi said:

It depends (I'm a consultant). Often along the general defrag script, which tend to be once a week.

September 13, 2014 5:22 AM
 

Mebel Jepara said:

Tibor Karaszi : How often do you rebuild your heaps?

September 25, 2014 7:16 PM
 

Furniture Jepara said:

Tibor Karaszi : How often do you rebuild your heaps?

September 26, 2014 3:52 AM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement