
Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact of Using NTFS Compression with Read-Only Databases

SQL Server 2005 supports placing read-only filegroups and read-only databases on NTFS compressed files. In other words, you can compress the database files of a read-only filegroup or a read-only database. This can be a very useful feature if saving disk storage is a high priority.

But what are the performance implications of using this SQL Server 2005 feature?

About a year ago, I took a look at the behavior of NTFS compression with SQL Server 2005 RTM, and found that read-only queries behaved somewhat erratically. However, those tests were probably not ideal in what they covered. I used OLTP-like short queries--in fact, the TPC-C Order_Status and Stock_Level transactions--and bombarded a read-only database, placed on compressed files, with these short transactions. That was not an ideal setup for a close examination of the impact of NTFS compression.

With the release of SQL Server 2005 SP2, it is worth revisiting the implications of using NTFS compression with read-only databases. But instead of repeatedly running many short queries, I decided to focus on the following single reporting query:

SELECT COUNT(*), SUM(s_ytd) FROM stock;

The stock table had 10,000,000 rows, each 308 bytes wide, and the table was about 3GB in size. There was no index on column s_ytd, so the query forced a full table scan.
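For reference, a single test iteration boils down to something like the following. This is only a sketch--the table and column names are the ones above, but the timing mechanism shown here is just one way to capture the scan duration, not necessarily the exact harness behind the numbers below:

-- Empty the SQL Server buffer pool so the scan cannot be served from it.
-- (The database is read-only, so there are no dirty pages to checkpoint first.)
DBCC DROPCLEANBUFFERS;

-- Report the CPU time and elapsed time of the scan.
SET STATISTICS TIME ON;
SELECT COUNT(*), SUM(s_ytd) FROM stock;
SET STATISTICS TIME OFF;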

Note that all the tests reported here were done on a machine with 16GB of physical memory.

With 14GB allocated to the SQL Server buffer pool (both max server memory and min server memory were set to 14000MB), the following table shows the performance difference between using NTFS compression and not using any compression:

                                    With NTFS Compression              Without NTFS Compression
                                    (after dropping clean buffers)     (after dropping clean buffers)
  Avg. Table Scan Duration (sec)    ~ 445                              ~ 22
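For completeness, the 14GB buffer pool configuration mentioned above corresponds to the standard sp_configure memory options (values are in MB, and 'show advanced options' must be on to change them):

-- Fix the buffer pool at 14GB, as described above (values in MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min server memory (MB)', 14000;
EXEC sp_configure 'max server memory (MB)', 14000;
RECONFIGURE;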

In this case, when the database files were compressed, the query took about 20 times as long to complete. That certainly is a huge performance degradation. But the story is a bit more complex than that. At one point, I re-ran the tests with 6GB allocated to the SQL Server 2005 instance (as opposed to 14GB), and the results are as follows:

                                    With NTFS Compression              Without NTFS Compression
                                    (after dropping clean buffers)     (after dropping clean buffers)
  Avg. Table Scan Duration (sec)    ~ 12                               ~ 22

Wow! It was actually faster to run the query against the compressed database than against the non-compressed database. This was the case even when the SQL Server 2005 instance was restarted.

But remember that the server had 16GB of physical memory. With 6GB given to the SQL Server instance, 10GB was left for other processes and the file system cache. Apparently, the decompressed data was cached by the file system outside of the SQL Server instance. So even when the buffer pool was emptied, either by running DBCC DROPCLEANBUFFERS or via a SQL Server restart, SQL Server was still able to get the data from memory instead of disk. This means that if the data was also removed from the file system cache, the same query should take significantly longer to complete, and that was indeed the case, as shown in the following table:
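To spell out why the result depends on how the caches are cleared, the three scenarios in these tests differ as follows (a sketch; the timings are the ones reported in the tables):

-- 1) Empty only the SQL Server buffer pool. The Windows file system cache
--    still holds the decompressed data, so the next scan stays fast
--    (~12 sec in the 6GB configuration).
DBCC DROPCLEANBUFFERS;

-- 2) Restarting the SQL Server instance has the same effect: the buffer pool
--    is rebuilt, but the file system cache survives the restart.

-- 3) Only a machine reboot clears the file system cache as well, forcing the
--    scan back to the compressed files on disk (~464 sec), as shown in the
--    next table.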

                                    With NTFS Compression              Without NTFS Compression
                                    (after a server reboot)            (after dropping clean buffers
                                                                        or a server reboot)
  Avg. Table Scan Duration (sec)    ~ 464                              ~ 22

A final note: with NTFS compression, neither the disk bandwidth nor the processor capacity was observed to be fully utilized. The test database had a single data file and a single log file. I didn't check whether splitting the data into multiple database files would help with decompression performance.
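For anyone who wants to try the multi-file variation, adding a data file is straightforward to set up, although the database would have to be made read-write first and the data rebuilt or reloaded to actually spread it across the files. The database and file names below are placeholders for illustration, not the ones used in these tests:

-- Hypothetical names; the test database is called tpcc here for illustration.
ALTER DATABASE tpcc SET READ_WRITE;

ALTER DATABASE tpcc
ADD FILE
(
    NAME = tpcc_data2,
    FILENAME = 'D:\MSSQL\Data\tpcc_data2.ndf',
    SIZE = 5GB
);

-- Rebuild or reload the large tables so the data is spread across both files,
-- then set the database back to read-only before compressing the files again.
ALTER DATABASE tpcc SET READ_ONLY;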

Published Friday, May 04, 2007 3:23 AM by Linchi Shea


Comments


Scott R. said:

You mentioned that the table was 3 GB.  Was that the compressed or uncompressed size (and what was the table size for whichever one was not stated, if it is possible to find out)?  Can you share the overall DB file size - with and without compression?  This example of compression factors will give readers a feel for what might be gained space-wise for the negative trade-off in processing time.

The obvious trade-off of using file system compression is less disk space versus greater processing time to decompress the requested data pages.  The less obvious downside is the difficulty (additional processing time) of randomly accessing data pages in a compressed file, since the pages vary in size (each page compresses differently depending on its content).  I expect that empty DB pages will compress very well, since they likely contain large runs of binary zero bytes or some other common initialized value.

Another issue worth pointing out: a best practice I am familiar with is to format the NTFS volumes that will host database files with a 64 KB cluster (allocation unit) size.  The 64 KB cluster size reduces the number of clusters needed by a factor of 16, and may contribute to less disk fragmentation (since all sectors within a cluster are contiguous by definition).  Be aware that NTFS cluster sizes above 4 KB do not support file system compression.

Your findings will provide better-informed input for physical DB design planning on the merits and downsides of using file system compression for read-only DB files.  Thanks for your efforts.

May 4, 2007 12:24 PM

Linchi Shea said:

The table size of 3GB was the uncompressed size. The database had a single data file and a single log file. The data file was 10GB uncompressed and ~8GB compressed. The log file was 5GB uncompressed and ~3GB compressed.

It took about 20 minutes to compress the data file, and 12 minutes to compress the log file. And it took about 14 minutes to uncompress the data file and 6 minutes to uncompress the log file.

The database had a few other tables, two of which were about 2GB each; the others were trivial in size.

May 5, 2007 12:10 AM
