In my previous post on data compression, I looked at how rebuilding a table with page compression works with multiple processors via the MAXDOP option. In this post, I'll focus on the compression ratios I have seen in real-world databases. Now, if you understand how SQL Server 2008 data compression works, you know that the compression ratio you get depends on the extent to which SQL Server can find duplicate values on the data/index pages. Consequently, one real-world database can have a very high compression ratio while another may have a very low one: there is no typical compression ratio to expect.
That said, it is still interesting to report what compression ratios I have seen with real customer databases instead of 'cooked' data. Before I give the results, let me define what I mean by compression ratio:
Compression Ratio = Uncompressed Size / Compressed Size
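To make the formula concrete, here is a quick worked example with hypothetical numbers: a table whose data occupies 10,000 KB uncompressed and 4,000 KB after page compression has a compression ratio of 2.5.

-- Hypothetical numbers: 10,000 KB uncompressed, 4,000 KB page-compressed
select 10000.0 / 4000.0 as compression_ratio;  -- returns 2.5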
To see what kind of compression ratios I may run into with real-world databases, I randomly selected 14 customer databases. In this blog post, these databases have been renamed to DB1, DB2, ..., DB13, and DB14 to shield their identities. For each database, the following steps were performed:
1. Record the data size reported by sp_spaceused for the database. This is the uncompressed data size.
2. For each user table in the database, rebuild it with page compression. This was done by generating and executing a script that contains the following ALTER TABLE statement for each table:

   alter table <the_table> rebuild with (data_compression=page, maxdop=8);

3. Record the data size reported by sp_spaceused for the database. This is the compressed data size.
4. For each user table in the database, rebuild the table with no compression, which effectively decompresses the table. This was done by generating and executing a script that contains the following ALTER TABLE statement for each table:

   alter table <the_table> rebuild with (data_compression=none, maxdop=8);

5. Record the data size reported by sp_spaceused for the database. This is the decompressed data size.
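The ALTER TABLE scripts above can be generated with a simple query over the catalog views. This is a minimal sketch of the idea, not necessarily the exact script used in the test:

-- Generate one ALTER TABLE ... REBUILD statement per user table.
-- Change page to none to produce the decompression script.
select 'alter table ' + quotename(s.name) + '.' + quotename(t.name)
     + ' rebuild with (data_compression=page, maxdop=8);'
from sys.tables t
join sys.schemas s on t.schema_id = s.schema_id
order by s.name, t.name;

Running this query returns one statement per table; the output can then be copied into a new query window and executed.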
The data size obtained in Step 3 was used as the database compressed size, and the data size obtained in Step 5 was used as the database uncompressed size. The data size obtained in Step 1 was not used because it may have been inflated by fragmentation, which would have produced an artificially high compression ratio.
The following two charts summarize the results:
The top chart includes the three database space measures mentioned above. As you can see, fragmentation was not a significant factor for these databases: the uncompressed size and the decompressed size were quite close for each database. As it turned out, these databases were regularly maintained, including rebuilding their clustered indexes, so it's no surprise that there wasn't any serious fragmentation.
To highlight the results, the bottom chart plots the compression ratios. For these databases, the compression ratios, obtained using SQL Server 2008 CTP6 page compression, range from 1.7 to 5.6.
The space savings were substantial with these databases. Decompressed, the total space usage was ~261GB; compressed, it was ~88GB. In other words, SQL Server 2008 page compression would save about 173GB of storage space.
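As a quick sanity check on those totals, the implied overall compression ratio across all 14 databases is about 2.97, which falls comfortably within the 1.7 to 5.6 range observed per database:

-- Overall ratio and savings implied by the reported totals (~261GB vs ~88GB)
select cast(261.0 / 88.0 as decimal(4,2)) as overall_ratio,  -- 2.97
       261 - 88 as savings_gb;                               -- 173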