THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Aaron Bertrand

Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.

SQL Server 2008 R2 : A quick experiment in Unicode Compression

Fellow MVP Simon Sabin blogged today about one of the few engine enhancements we'll be seeing in SQL Server 2008 R2 : Unicode compression. You can read more about the topic in Books Online, but basically what is going to happen is that NCHAR / NVARCHAR (but not NVARCHAR(MAX)) columns, in objects that are row- or page-compressed, can benefit from additional compression, where realistically you can cut your storage requirements in half, depending on the language / character sets in use.

But I'm the kind of guy who has to see it to believe it!  So, I mocked up a quick test of storing some Finnish Danish/Norweigan (?) characters, and ran it both on a SQL Server 2008 instance (SP1 + CU3, 10.0.2723), and a SQL Server 2008 R2 instance (10.50.1092). 

USE tempdb;
GO
SET NOCOUNT ON;
GO
CREATE TABLE dbo.test(foo NVARCHAR(2048)) WITH (DATA_COMPRESSION ROW);
GO
INSERT dbo.test(fooSELECT RTRIM(RAND()) + REPLICATE(N'øååøæ'300);
GO 50000
EXEC sp_spaceused N'dbo.test';
GO
DROP TABLE dbo.test;
GO

SQL Server 2008 results:

 

SQL Server 2008 R2 results:

 

The difference is astounding: a space savings of roughly 60%, FOR FREE.  That's right, this is an enhancement you get just by upgrading... I did not do anything differently about the creation of these tables except continue to use compression.  (And note that I also performed this test with page compression, and the results were identical all around.)  Keep in mind that if you upgrade at some point (you can't upgrade to R2 in its present form), you will need to rebuild indexes in order to implement this new compression method across the entire table.

And of course, by "free," I am not talking about the licenses.  As with row and page compression, this feature is only available in Developer, Enterprise, and Enterprise Evaluation editions.

Now, I'll admit, the test is not super-realistic, and is biased toward good compression (since the same pattern is repeated over and over again on every row and on every page), and as such it demonstrates something pretty close to best-case scenario.  But even worst-case scenario is not exactly "bad" -- you may not see a gain at all, but you can't lose anything either, because the compression algorithm in SQL Server 2008 is smart enough to know when it is actually going to *lose* space by implementing compression, and won't do it in that case.

At some point I will test the performance of writing, reading and seeking against a Unicode compressed table (and I will come up with more plausible test data at that point).  Because nothing is ever really free, is it?  Stay tuned to find out.

Published Tuesday, August 11, 2009 3:17 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Denis Gobo said:

That is a nice storage saving, unfortunately I almost use no Unicode at all at the moment

August 11, 2009 2:46 PM
 

AaronBertrand said:

I use a lot of Unicode, since we have to support several foreign languages and symbols like Euro and pound.  All of this data gets entered via a Web UI.  Unfortunately, while about 1/3 of these columns are NVARCHAR(64 or 255), the rest are NVARCHAR(MAX), which won't benefit from this compression at all... even when the data is stored in-row.

August 11, 2009 2:58 PM
 

Denis Gobo said:

<Sarcasm>

Mmmm, that 255 number sounds familiar, did you upgrade this from Access with the upgrade wizard?</Sarcasm>

I do have some unicode but it is in lookup tables, for example  道琼斯第一财经中国600指数

August 11, 2009 3:03 PM
 

AaronBertrand said:

255 came at us in a few cases as a "requirement"... wherever 64 was not enough, they pushed for 255.  Their magic number, not mine.  :-)  And the 64, well, that was legacy... it was there long before I ever got my hands on the schema.

August 11, 2009 3:08 PM
 

Linchi Shea said:

Denis;

> unfortunately I almost use no Unicode at all at the moment

But that means you are saving space already :-)

August 11, 2009 3:10 PM
 

Adam Machanic said:

One issue is that compression brings a lot of CPU overhead; in my tests I've seen up to 2x the amount of time required for retrieval of hot pages. I wonder if UTF8 support would be just as beneficial without having as much overhead?

August 11, 2009 3:22 PM
 

AaronBertrand said:

Agreed, I am definitely interested in seeing how this affects performance and if there is any noticeable impact over and above page/row compression.  I think it all depends on the nature of the data and the decisions the engine has to make as it compresses/decompresses.

In the long run, we're I/O-bound, not CPU-bound, so we're probably better off getting any I/O gain we can, even if it costs us a little CPU.  YMMV.

August 11, 2009 4:25 PM
 

Aaron Bertrand said:

When SQL Server 2008 R2 was first released, I wrote a quick blog post discussing the additional space

August 23, 2009 6:05 PM
 

Aaron Bertrand said:

Background Several weeks ago, I ran some tests on the new implementation of Unicode Compression in SQL

October 4, 2009 9:46 PM
 

Aaron Bertrand said:

I see a lot of people trash-talking the EAV (entity-attribute-value) model. If you want the full story

November 19, 2009 6:52 PM
 

Aaron Bertrand said:

In my last post in this series , I talked about problems associated with creating (and using) what I

March 8, 2010 10:52 PM

Leave a Comment

(required) 
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement