<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'Compression' and 'Internals'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Compression,Internals&amp;orTags=0</link><description>Search results matching tags 'Compression' and 'Internals'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Why does SQL Server not compress data on LOB pages?</title><link>http://sqlblog.com/blogs/hugo_kornelis/archive/2013/01/31/why-does-sql-server-not-compress-data-on-lob-pages.aspx</link><pubDate>Thu, 31 Jan 2013 09:08:03 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47406</guid><dc:creator>Hugo Kornelis</dc:creator><description>&lt;p&gt;Enabling compression on your database can save you a lot of space – but when you have a lot of varchar(max) or nvarchar(max) data, you may find the savings to be limited. This is because only data stored on the data and index pages is compressed, and data for the (max) data types is generally stored on other, special-purpose pages – either text/image pages, or row overflow data pages. (See &lt;a href="http://msdn.microsoft.com/en-us/library/ms190969%28SQL.105%29.aspx"&gt;Understanding Pages and Extents&lt;/a&gt; in Books Online). This is from the SQL Server 2008R2 Books Online, but it is still valid in SQL Server 2012 – but apparently, this page has been removed from newer Books Online editions).&lt;/p&gt;  &lt;p&gt;So why does SQL Server not compress the data that, perhaps, would benefit most from compression? Here’s the answer.&lt;/p&gt;  &lt;p&gt;SQL Server currently supports two compression methods for data in the database (backup compression is out of scope for this post).&lt;/p&gt;  &lt;p&gt;* Row compression: This is a simple algorithm to save storage space for individual rows. It has two elements. The first is a more efficient way to store the per-row metadata, saving a few bytes per row regardless of layout and content. The second element is storing almost all data types, even those that have a fixed length, as variable length. This mainly has benefits for the larger numerical types (e.g a bigint with a value of 1,000 is stored in two bytes instead of eight – only values that actually &lt;i&gt;need&lt;/i&gt; all eight bytes do not gain from this, and will instead take up more space because the actual length has to be stored somewhere) and for fixed-length string types with lots of trailing spaces. For Unicode data, the SCSU algorithm is used, which saves 15% to 50% depending on the actual content of the column. (According to &lt;a href="http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode"&gt;Wikipedia&lt;/a&gt;, the SCSU standard has gained very little adoption because it is not as effective as other compression schemes).&lt;/p&gt;  &lt;p&gt;See &lt;a href="http://msdn.microsoft.com/en-us/library/cc280576.aspx"&gt;Row Compression Implementation&lt;/a&gt; and &lt;a href="http://msdn.microsoft.com/en-us/library/ee240835.aspx"&gt;Unicode Compression Implementation&lt;/a&gt; in Books Online.&lt;/p&gt;  &lt;p&gt;* Page compression: When enabled, page compression is done *after* row compression. As the name implies, it's done on a per-page basis. It consists of two steps:&lt;/p&gt;  &lt;p&gt;1. Prefix compression. Within each column, the longest common prefix is used to build the &amp;quot;anchor record&amp;quot;. All columns than only indicate how many characters of the anchor value they use as prefix. So for example, if we have a first name column with the values Roger / Hugo / Hugh, the anchor value could be Hugh, and the data values would be stored as {0}Roger / {3}o / {4}. (Here, {3} is stored as a single byte, and {3}o means: first three characters of Hugh, followed by an o).&lt;/p&gt;  &lt;p&gt;2. Dictionary compression. Accross the entire page, columns that are now stored with the same bit pattern are replaced with a single value that points to the dictionary entry. Let's assume that the same page I use above also has a Lastname column, with values Plowman / Kornelis / Ploo. Here, Plowman would be the anchor value, and the data after prefix compression would be {7} / {0}Kornelis / {3}o. The dictionary encoding would then see that there is a {3}o in the population of the Firstname columnm and a {3}o in the population of the Lastname column. It would place {3}o as the first entry in the dictionary and replace both {3}o values with the reference [1].&lt;/p&gt;  &lt;p&gt;See &lt;a href="http://msdn.microsoft.com/en-us/library/cc280464.aspx"&gt;Page Compression Implementation&lt;/a&gt; in Books Online.&lt;/p&gt;  &lt;p&gt;All elements of page compression save space by eliminating repeated data between different column values, so they will only work when multiple values are stored on a page. For all LOB pages, the reverse is the case: a single value spans multiple pages. So by definition, page compression can never yield any benefits.&lt;/p&gt;  &lt;p&gt;For row compression, the more efficient storage of per-row metadata naturally only affects pages that have per-row metadata stored – data and index pages, but not LOB pages. And the conversion of fixed length to variable length data types also doesn’t affect LOB pages, since these can only be used for varying length data.&lt;/p&gt;  &lt;p&gt;Based on the above, it is obvious why SQL Server does not compress varchar and varbinary data stored on LOB pages – there would be zero benefit from any of the implemented compression methods. But how about Unicode compression for nvarchar(max) and overflowing nvarchar(&lt;i&gt;nnn&lt;/i&gt;) data? Wouldn’t that save some space?&lt;/p&gt;  &lt;p&gt;To answer that, I now have to go into speculation mode. And I see two possible theories:&lt;/p&gt;  &lt;p&gt;1. Because the SCSU standard saves less spacing than other algorithms, the SQL Server team deliberately made this choice in order to encourage people to compress these large values in the client before sending them to the server, thereby reducing not only storage space (by more than SCSU would have yielded), but also network traffic. The down side of this is that cool features such as &lt;a href="http://msdn.microsoft.com/en-us/library/ms142571.aspx"&gt;Full-Text Search&lt;/a&gt; and &lt;a href="http://msdn.microsoft.com/en-us/library/gg492075.aspx"&gt;Semantic Search&lt;/a&gt; don’t understand data that was compressed at the client – at least not without &lt;a href="http://blogs.msdn.com/b/sqlfts/archive/2011/07/13/getting-a-custom-ifilter-working-with-sql-server-2008-r2-ifiltersample.aspx"&gt;a lot of extra effort&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;2. Since all compression algorithms work on a per-page basis, they had a choice between either first breaking the LOB data into pages and then compressing (which makes no sense, as the rest of the page would remain empty and the amount of space actually used remains the same) or creating a separate algorithm for LOB data to first compress it and then split it over multiple pages. That would of course have cost a lot of extra engineering hours, and if my understanding of SCSU is correct, it would also have a big adverse side effect on operations that affect only a part of an nvarchar(max) value (like &lt;a href="http://msdn.microsoft.com/en-us/library/ms187748%28SQL.105%29.aspx"&gt;SUBSTRING&lt;/a&gt; or the &lt;a href="http://msdn.microsoft.com/en-us/library/ms177523%28SQL.105%29.aspx"&gt;.WRITE method of the UPDATE statement&lt;/a&gt;). That is because SCSU works by traversing the entire string from left to right and can’t handle operating on only a subset of the string.&lt;/p&gt;  &lt;p&gt;Bottom line: When you have to store large values and you want to save on storage size, your best course of action is probably to compress and decompress the values on the client side. But do beware the consequences this has for Full Text Search and Semantic Search!&lt;/p&gt;  &lt;p&gt;Final note: I didn’t spend as much time on this blog post as I normally do. That’s because this actually started as a reply to a question on an internet forum, but when I was busy I realized that the reply was long enough to be promoted to a blog post.&lt;/p&gt;</description></item><item><title>Did You Know? I gave two presentations last week</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2010/06/14/-two-presentations-last-week.aspx</link><pubDate>Mon, 14 Jun 2010 15:05:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:26183</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;P&gt;Even though I didn't make it to TechEd this year, it didn't mean I was quiet last week. On Wednesday, I was in Colorado, giving a talk for the new Colorado PASS User Group, which is a joint venture between 3 different existing groups from Colorado Springs, Denver and Boulder. On Saturday, I spoke at SQL Saturday #43, in Redmond on the Microsoft campus. My presence there has already been mentioned on two other blogs here at SQLBlog: &lt;A href="http://sqlblog.com/blogs/merrill_aldrich/archive/2010/06/12/outstanding-sql-saturday.aspx"&gt;Merrill Aldrich&lt;/A&gt; and the infamous &lt;A href="http://blogs.msdn.com/b/buckwoody/archive/2010/06/14/sql-saturday-43-redmond-wa-review.aspx"&gt;Buck Woody&lt;/A&gt;. &lt;/P&gt;
&lt;P&gt;As Merrill mentioned, my session was on compression ("The Compression Session", isn't that catchy?). Right before lunch, I ran into an old friend, who asked what &lt;STRONG&gt;&lt;EM&gt;exactly&lt;/EM&gt;&lt;/STRONG&gt; my session would cover.&amp;nbsp; I explained I would talk about the compression algorithms, and how exactly the data in your data files was compressed. He said… "Oh, you're going to talk about the &lt;EM&gt;Internals&lt;/EM&gt; of Compression"… like he was surprised. I guess I'll have to send him a copy of my latest book… SQL Server 2008 &lt;EM&gt;Internals.&amp;nbsp; &lt;/EM&gt;I actually didn't cover the deepest internals that I could have; I did have some extra slides showing the details of the new row formats used if you choose to compress your data, but we didn't have any extra time.&amp;nbsp; The details are in my book, if you're interested. &lt;/P&gt;
&lt;P&gt;As mentioned, I discussed primarily the compression algorithms. I think my friend was asking what my specific topic was because he wanted to know if I was going to discuss performance and best practices at all. He then shared a tidbit that he had just heard from inside Microsoft (he used to work for Microsoft, for many long years) and that is that the consulting teams have pretty much discovered that the benefit of compression almost always outweighs the cost, and for most large systems they are recommending turning on page compression right out of the box. Remember that compression is a Enterprise Edition feature (and it's mainly the larger systems that are using Enterprise Edition).&amp;nbsp; I'm just passing on this comment, I haven't done enough testing of my own to make a personal recommendation. &lt;/P&gt;
&lt;P&gt;The demos have been posted for both the Denver PASS and the SQL Saturday sessions at: &lt;BR&gt;&lt;A href="http://www.sqlserverinternals.com/conferences"&gt;www.sqlserverinternals.com/conferences&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color=#ff00ff size=4&gt;~Kalen&lt;/FONT&gt;&lt;/P&gt;</description></item></channel></rss>