THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

SQL Server 2008 Page Compression: Using multiple processors

SQL Server 2008 has introduced a long sought after feature -- Data Compression. This is a great feature, and I have no doubt it'll be widely used. The key compression method is called page compression, which uses the following three techniques to reduce the space taken up by duplicates on a page:

  • Row compression. This technique changes the storage formats of many fixed-length datatypes (e.g. char, int, money, binary, datetime, and so on) so that they occupy only the required number of bytes plus a small overhead.
  • Prefix compression. This technique finds duplicate prefixes on a page for each column , and replaces each duplicate with a small reference number.
  • Dictionary compression. This technique finds duplicate values on a page, collects them into a dictionary stored after the page header but before the data rows, and replaces the duplicate values with their corresponding offsets in the dictionary.

You can read more about SQL Server 2008 data compression in SQL2008 CTP6 Books Online.

In this post, I'll focus on a very specific question: How does the number of processors impact rebuilding a table with page compression? Note that one way to enable page compression on a table is to rebuild it with the option data_compression set to page. The following is an example:

ALTER TABLE customer REBUILD WITH (DATA_COMPRESSION=PAGE);

Since compressing a table is a CPU intensive operation, one naturally would expect that using multiple processors should help reduce the elapsed time of the above command. How do you instruct SQL Server 2008 to take advantage of multiple processors? There exists another rebuild option called MAXDOP, which you can use to inform SQL Server of the max number of processors you want the rebuild to use. The following is an example of setting the MAXDOP option to 8:

ALTER TABLE customer REBUILD WITH (DATA_COMPRESSION=PAGE, MAXDOP=8);

To find out the impact of multiple processors (or the impact of setting MAXDOP to different values) on the elapsed time of the above command, I ran a series of tests with the command by setting MAXDOP to 0, 1, 2, 3, 4, 5, 6, 7, and 8 in turn. Before I report the test results, I should give you some information about the test setup:

  • The TPC-C customer table was used to produce all the test results reported here. However, the original customer table in the TPC-C benchmark was not very compressible because its data didn't include many duplicates. For test purposes, I 'cooked' the table to make it more compressible. After I cooked it, the data size of the table was ~2,181,864KB without data compression, and was ~439,672KB with page compression.
  • The test server had 8 processors (with four dual-core sockets), and 8GB physical memory.
  • The data point for each MAXDOP setting was collected with the following T-SQL script. While only MAXDOP = 2 is shown below, in the test, this script was repeated for each of the MAXDOP values 0 through 8.
exec sp_spaceused 'customer';
go
DBCC DROPCLEANBUFFERS
GO
declare @dt datetime;
select @dt = getdate();
alter table customer rebuild with (data_compression=page, maxdop=2);
select 'maxdop = 2', datediff(second, @dt, getdate());
go
exec sp_spaceused 'customer';
go
 

The following chart summarizes how the MAXDOP setting affects the elapsed time of rebuilding the customer table with page compression.

A number of observations should be noted:

  • SQL Server 2008 page compression can effectively take advantage of multiple processors to speed up rebuilding a table. Using more processors generally leads to shorter elapsed time.
  • The diminishing return on the number of processors is clear from the chart. In this particular test environment, after 5 or 6 processors, adding another processor to rebuilding the table with page compression only reduced the elapsed time only marginally. The largest improvement was obtained when the number of processors was increased from one to two.
  • When the MAXDOP option was set to 0, its effect was the same as when it's set to 8, which happened to be the total number of processors on the server.
  • For this test setup, changing the buffer pool size from 6GB to 2GB did not materially alter the test results.
  • With MAXDOP=8, all eight processors were seen to be used 100%--as reported by the perfmon counter Processor\% Processor Time--during the table rebuild.
Published Monday, May 05, 2008 10:59 AM by Linchi Shea

Attachment(s): maxdop.gif

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Duncan said:

Great post... thank you.

But one data point I'd like to see is the time for "ALTER TABLE customer REBUILD WITH (DATA_COMPRESSION=PAGE);" (i.e. without the MAXDOP option).

Pretty much I'm looking for what is the default number of processors used, if MAXDOP is not specified? 1? 0? ?

Again thanks for this post,

Greg

May 5, 2008 12:14 PM
 

Linchi Shea said:

That data point is dependent on the sp_configure 'max degree of parallelism' setting. Basically, if you don't set MAXDOP explicitly in ALTER TABLE REBUILD WITH(), it'll use the sp_configure setting.

May 5, 2008 1:53 PM
 

Greg Duncan said:

Got it.... Thank you.

May 6, 2008 10:05 AM
 

TheSQLGuru said:

Seems like it will be a Best Practice to limit the number of CPUs allowed if the system is active to prevent all CPUs from getting hammered.  Looks like between 2 and 1/2 available would be appropriate depending on the I/O capabilities of the server.

May 6, 2008 1:49 PM
 

Linchi Shea said:

In my previous post on data compression, I looked at how rebuilding a table with page compression works

May 11, 2008 7:37 PM
 

Linchi Shea said:

My previous post shows that data compression may not come for free, although hopefully by the RTM time

May 16, 2008 9:38 PM
 

Mike H said:

What affect, if any will compression have when taken into account with the overly large size of hard drives when trying to optimize I/O.  Do you need to re-evaluate your stripe size when dealing with compressed databases?

i.e. Ten 15,000rpm rpm drives in a Raid 0 array. You have a 150gb database, and the drives are 250gb, or even 500gb each.

Could we actually loose performance if our 150gb database now only uses 100gb on the same array?  Would we want to adjust our stripe size to improve the allocation of our databases across the disks?

May 19, 2008 1:39 PM
 

Denis Gobo said:

Wow, it has been already a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog

December 31, 2008 10:37 AM
 

Eric said:

If you want an estimate_data_compression_savings on ALL tables within a db a the same time!!!!

---------------------------------------------------------

use [your_database_name]

go

--drop table #toto;

create table #toto(

object_name varchar(60),

schema_name varchar(60),

index_id  integer,

partition_number integer,

[size_with_current_compression_setting(KB)] integer,

[size_with_resquested_compression_setting(KB)] integer,

[sample_size_with_current_compression_setting(KB)] integer,

[sample_size_with_requested_compression_setting(KB)] integer );

exec sp_MSforeachtable

@command1 = 'Declare @tableName Varchar(63) = SUBSTRING(''?'', 8, LEN(''?'') - 8);

insert into #toto exec sp_estimate_data_compression_savings ''dbo'', @tableName, NULL, NULL, ''PAGE'';';

select a.*,  ROUND((1 -

[sample_size_with_requested_compression_setting(KB)] / CASE [sample_size_with_current_compression_setting(KB)] WHEN 0 THEN 1.0 ELSE Convert(float,

[sample_size_with_current_compression_setting(KB)]) END) * 100.0, 2) AS '% freed' from #toto as a order by 1,2,3,4;

November 23, 2010 11:19 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement