THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Splitting a page into multiple pages

You know about page splits... if a table has a clustered index, any new row has to be inserted into the appropriate page, based on the clustered index key order. However, if the page where the new row belongs is full, SQL Server will split the page into two pages, and then put the new row on one of the now half empty (I know, I know, I should say "half full") pages.

But what if the new row is bigger than 4000 bytes (half a page) in size? I always knew that sometimes a page might have to split into three to allow big rows to be inserted, but I just found out it could be worse than that. If a page is split into two, and the new row still won't fit, one of the half full pages is split again. But if the row is so big, it still won't fit, another split could occur. And if the row is still too big.... well, you get the idea. One of the SQL Server engineers at Microsoft made note of what he called a 'corner case' where one page split into 17!

I read about that right before I shut down my machine one night, but I couldn't stop thinking about it. I woke up early the next morning, and my first thought was, "Why am I such a geek?" But I am what I am, and I was still thinking about a 17 way split. I wondered if I could duplicate that...So I got up and started trying. I spent several hours on it, but the most I could get was one page splitting into 10. It's always good to have something to work towards. Maybe next time I'm on a long plane flight (see my next year's schedule for possibilities), I'll try some more.

So here's how I did it. I probably should have called the table Sybil, but I called it split_page instead.

You can use any database; I used tempdb.

-- Create the table
USE tempdb;
GO
SET NOCOUNT ON
GO
IF EXISTS (SELECT * FROM sys.tables
            WHERE name = 'split_page')
    DROP TABLE split_page;
GO
CREATE TABLE split_page
(id int identity(0,2) PRIMARY KEY,
id2 bigint default 0,
data1 varchar(33) null,
data2 varchar(8000) null);
GO

-- fill page until no more rows fit
INSERT INTO split_page DEFAULT VALUES;
GO 385

-- verify that there is only one data page
DBCC IND(tempdb, split_page, -1);
GO

DBCC IND should return one IAM page (PageType = 10) and one data page (PageType = 1). If you want to see how full the page is, you can run DBCC PAGE and look at just the header.  For your third parameter to DBCC PAGE, you should use the number for the data page that DBCC IND returned.

DBCC TRACEON(3604);
DBCC PAGE(tempdb, 1, 177, 0);
GO

The header has a value it it called m_freeCnt, and it told me there were only 11 free bytes left on the page, which is not enough for another one of these rows. (The INT primary key is always 4 bytes, and the BIGINT is always 8, so even without any overhead bytes, we need more than 11.)

-- Now insert one more row, this time filling the VARCHARs to the maximum length.
SET IDENTITY_INSERT split_page  ON;
GO
INSERT INTO split_page (id, id2, data1, data2)
      SELECT 111, 0, replicate('a', 33), replicate('b', 8000);
GO
SET IDENTITY_INSERT split_page  OFF;
GO

When you look at DBCC IND again, you should see that the table now has 10 data pages! (Plus one IAM page, and one index page for the clustered index root, which has PageType = 2)

DBCC IND(tempdb, split_page, -1);
GO

Here's my output:

image

(I really don't know how to make the screen shot any clearer. If you click on it, it should open in a browser window by itself and be a little easier to read. If I try to enlarge it when pasting it here, it becomes very blurred.)

Have fun!

~Kalen

Published Saturday, December 20, 2008 4:45 PM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Mike Suarez said:

Hi Kalen,

Very interesting post.

What I am having trouble with, is why the algorithm for splitting a page works this way. Why does it have to be a 50/50 page split, followed by a 50/50 page split, followed by a 50/50 page split... etc? Why can't it just say, right off the bat, this new row is going to be 7K, so a 50/50 split won't cover it... lets do a 90/10 split and leave enough room on the page so we only have to split once.

I am sure there is a very logical explanation. Maybe the overhead involved in making the decision on how to split the page is more costly than i am giving it credit for, and when you have to assume that overhead for every page split, it may be worth allowing those rare multi-page splits to avoid the overhead. If you have any insight, or links that may explain the reasoning for this behavior, i'd be very interested in knowing this.

Thanks,

Mike

December 22, 2008 2:42 PM
 

Kalen Delaney said:

Hi Mike

Usually I don't know the answers to the WHY questions. I would guess that it is more expensive, not just to figure out what percentage to use, but also to make sure they split the right way, so that the new row goes on the emptier page. Then they figured that it was a rare enough occurrence that it didn't justify the extra overhead. But it's just a guess...

~Kalen

December 22, 2008 6:03 PM
 

James Luetkehoelter said:

Sybil - I love it :) I wonder how many readers are too young to get that reference :)

December 22, 2008 8:06 PM
 

Kalen Delaney said:

Hi James

I did post the wiki reference for Sybil, for all these young'uns.

~Kalen

December 22, 2008 8:13 PM
 

William Cleek said:

Hello Kalen!

Great post.  I was thinking about the WHY to the 50/50 page splits.  I think it comes down to unpredictable row sizes in the original page.  If the originating page row sizes are small and granular enough more efficient splitting could be done.  However, there's no guarantee that the space consumed on the originating page consists of rather small rows or one big row.  If the latter there is no more efficient option than 50/50.

Then I started thinking about efficiency alternatives for this type of case.  The idea of creating page classes.  Based on statistics classes of pages dedicated for rows under a certain size.  Big row page classes having the least or no split optimization.

Then, finally, I realized if the statistics bear out that most installations have small rows then it's entirely too much effort for little gain.  If your DB uses large rows in majority (why?) then you're SOL.

December 28, 2008 10:45 AM
 

Michael Zilberstein said:

After reading Kalen Delaney's post about single insert causing 10 page splits, I wanted to see those

February 5, 2009 10:05 AM
 

The Rambling DBA: Jonathan Kehayias said:

Nearly two years ago Kalen Delaney blogged about Splitting a page into multiple pages , showing how page

December 27, 2010 9:58 PM
 

The Rambling DBA: Jonathan Kehayias said:

Nearly two years ago Kalen Delaney blogged about Splitting a page into multiple pages , showing how page

December 27, 2010 10:01 PM
 

Michael Zilberstein said:

2 years ago I wrote about monitoring page splits with Extended Events . Only 2 bloggers explored Extended

April 25, 2011 4:50 PM
 

Pavel Nefyodov said:

Managed to get 1 data page splitting into 17. Kalen, will you be interested to see the code?

October 28, 2014 5:42 AM
 

Kalen Delaney said:

Hi Pavel... what version? If it's 2012 or 2014, I would love to see it!

Thanks

Kalen

October 28, 2014 1:30 PM
 

Pavel Nefyodov said:

Sorry, only SQL Server 2008 R2. SQL Server 2012 and consequently 2014 have their internal splitting mechanism revamped. it is now quite simple and elegant as far as I can see. Attainable maximum is 1 data page into 3 for 2012/2014.

October 28, 2014 7:43 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement