
SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

Why does SQL Server not compress data on LOB pages?

This blog has moved! You can find this content at the following new location:

https://SQLServerFast.com/blog/hugo/2013/01/why-does-sql-server-not-compress-data-on-lob-pages/

Published Thursday, January 31, 2013 11:08 AM by Hugo Kornelis


Comments

 

Roger Plowman said:

Thanks!

January 31, 2013 8:53 AM
 

a.m. said:

I think a much simpler answer is: "because the SQL Server team didn't implement the feature due to it not meeting some internal priority bar." It absolutely *could* be done -- there's no major technical barrier -- and it probably would be very useful for a lot of people. But they just didn't do it.
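For context: the ROW and PAGE data compression that SQL Server does offer only applies to in-row data; values pushed to off-row LOB pages are left uncompressed, which is exactly what this post is about. A minimal illustration, with hypothetical table and column names:

    -- Hypothetical table: a small in-row column plus a large LOB column.
    CREATE TABLE dbo.Documents
    (
        DocID int IDENTITY(1,1) PRIMARY KEY,
        Title varchar(200) NOT NULL,
        Body  varchar(max) NOT NULL   -- large values are stored on off-row LOB pages
    );

    -- PAGE compression rebuilds the in-row data in compressed form...
    ALTER TABLE dbo.Documents REBUILD WITH (DATA_COMPRESSION = PAGE);
    -- ...but any Body values that live on LOB pages stay uncompressed.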

January 31, 2013 10:31 AM
 

Hugo Kornelis said:

@Adam: Of course it can be done. But can it be done without having a big adverse effect on performance?

All the algorithms I know for effective compression of a single stream of bytes work on the basis of either a dictionary that is explicitly stored as part of the compressed data (e.g. Huffman), or one that is kept in memory only and can be reconstructed from the compressed data (e.g. Lempel–Ziv–Welch). Both categories have problems with SUBSTRING and chunked updates (the .WRITE method of the UPDATE statement).

The first category always needs to read the dictionary in addition to the pages where the required data for a SUBSTRING lies, so more IO is needed for a SUBSTRING. It might even be necessary to read and decompress everything to the left of the substring, because I don't think it's possible to predict where exactly the 80,000th character of a compressed varchar(max) starts. For .WRITE, the same problem exists. And after too many .WRITE updates, the compression ratio will drop unless the dictionary is recalculated and the entire string is de- and then recompressed. Plus, what happens if I use .WRITE to replace a few characters, but the replacement data doesn't compress as well as the original data?

In the second category, it is even completely impossible to decompress a substring without first processing all the preceding bytes. And changing a chunk changes the dictionary, so any change would force the engine to de- and recompress everything that follows the changed part. Using .WRITE to replace a few characters near the start of a 750MB string can become very costly that way!
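For readers unfamiliar with the two operations discussed above, here is a minimal sketch against the hypothetical dbo.Documents table from the earlier illustration; the offsets and lengths are arbitrary:

    -- Partial read: fetch 100 characters starting at position 80,000.
    -- On uncompressed LOB pages the engine can walk straight to the page
    -- that holds this offset; on a compressed stream it would first have
    -- to locate, and possibly decompress, everything before it.
    SELECT SUBSTRING(Body, 80000, 100) AS Fragment
    FROM   dbo.Documents
    WHERE  DocID = 1;

    -- Chunked update: overwrite 10 characters at (zero-based) offset 80000
    -- without touching the rest of the value. This is the .WRITE clause of
    -- UPDATE; if the replacement compresses worse than the original, the
    -- compressed stream can no longer be rewritten in place.
    UPDATE dbo.Documents
    SET    Body.WRITE(REPLICATE('x', 10), 80000, 10)
    WHERE  DocID = 1;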

January 31, 2013 12:26 PM
 

Peter said:

Rather than compressing a 750MB string as a single stream, it would probably be better to compress it in separate 1MB chunks, which would mean that SUBSTRING & .WRITE would only have to decompress/recompress a smaller section of the data.
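A minimal sketch of that idea, assuming the COMPRESS()/DECOMPRESS() built-ins that SQL Server 2016 later added (they did not exist when this thread was written) and a hypothetical chunk table; a real implementation would of course live inside the storage engine rather than in user tables:

    -- Each row holds one GZip-compressed 1MB slice of the original string.
    CREATE TABLE dbo.DocumentChunks
    (
        DocID     int            NOT NULL,
        ChunkNo   int            NOT NULL,   -- 0-based chunk number
        ChunkData varbinary(max) NOT NULL,   -- COMPRESS() of a 1MB character slice
        CONSTRAINT PK_DocumentChunks PRIMARY KEY (DocID, ChunkNo)
    );

    -- Reading 100 characters starting at position 80,000 only needs the one
    -- chunk that contains that offset (a read that crosses a chunk boundary
    -- would need the next chunk as well).
    DECLARE @ChunkSize int = 1048576,
            @Start     int = 80000;

    SELECT SUBSTRING(CAST(DECOMPRESS(ChunkData) AS varchar(max)),
                     ((@Start - 1) % @ChunkSize) + 1,
                     100) AS Fragment
    FROM   dbo.DocumentChunks
    WHERE  DocID   = 1
      AND  ChunkNo = (@Start - 1) / @ChunkSize;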

February 1, 2013 4:32 AM
 

Andrew said:

@Peter: excellent idea. Combined with a fast algorithm, e.g. LZ4 (http://code.google.com/p/lz4/), it can yield substantial I/O savings.

@Hugo: I think compression is not for everybody. If you have an app that often updates a 750MB row of data, then of course it's best to avoid compression. But SQL Server should support compression in scenarios where .WRITEs are used rarely or not at all.

So, we have cloud backup in SP1 CU2, may I expect LOB compression in SP1 CU3? :-)

February 2, 2013 1:12 PM
 

a.m. said:

Agreed with Andrew. The .write method is not especially common; I'm not sure I've seen it used more than one or two times. Make it a limitation of the feature and 95+% of people storing LOB data will be perfectly happy.

February 3, 2013 12:09 PM
 

tobi said:

The fact is that some scenarios would be conveniently addressed by being able to turn on a much stronger compression scheme that also works for LOB data. The disk space, buffer pool efficiency and IO savings can be worth the (enormous) CPU cost.

SQL Server doesn't have this feature today, but I hope they someday get around to doing it. Maybe they are waiting until they have variable page sizes implemented, so they can compress 1MB chunks at once.

February 3, 2013 4:15 PM
 

tobi said:

With a page size of >1MB one could use the same compression algorithm that is already used for backups. Backups would then not even need to compress again.

February 3, 2013 4:32 PM
 

TheSQLGuru said:

Small-chunk compression is the foundation of Column Store Indexing, which we know is very efficient and provides dramatic IO reduction and thus performance increases (even without Batch Mode Execution). And like Adam, I too have hardly EVER seen user apps doing LOB updating in the wild.
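For illustration, this is roughly what the columnstore approach looks like: data is compressed per column segment (about a million rows at a time) rather than as one continuous stream. Table and index names are hypothetical, and note that max-typed LOB columns could not be included in a columnstore index at the time of this discussion:

    CREATE TABLE dbo.SalesFact
    (
        SaleDate date  NOT NULL,
        StoreID  int   NOT NULL,
        Quantity int   NOT NULL,
        Amount   money NOT NULL
    );

    -- Each column is split into segments of roughly one million rows, and
    -- each segment is compressed independently: the "small chunks" above.
    CREATE NONCLUSTERED COLUMNSTORE INDEX IX_SalesFact_CS
    ON dbo.SalesFact (SaleDate, StoreID, Quantity, Amount);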

February 4, 2013 8:46 AM
 

Galaxis said:

And what about the extra time/metadata (and the storage for that) required to track/process these separate 1MB chunks, especially if your updates routinely span these chunks (very likely, since 750MB was warranted in the 1st place)?

November 8, 2014 4:08 PM


About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.