SQL Server 2005 supports placing read-only filegroups or read-only databases on NTFS compression. In other words, you can compress the database files in a read-only filegroup or a read-only database. This can be a very useful feature if saving disk storage is of high priority.
But what are the performance implications of using this SQL Server 2005 feature?
About a year ago, I took a look at the behavior of using NTFS compression with SQL Server 2005 RTM, and found the read-only queries to behave somewhat erratically. However, the tests were probably not the most ideal in what they covered. I used OLTP-like short queries. In fact, I used the TPC-C Order_Status and Stock_Level transactions, and bombarded a read-only database--placed on compressed files--with these short transactions. Not an ideal test setup for a close examination of the impact of NTFS compression.
With the release of SQL Server 2005 SP2, it is worth revisiting the implications of using NTFS compression with read-only databases. But instead of repeatedly running many short queries, I decided to focus on the following single reporting query:
SELECT COUNT(*), SUM(s_ytd) FROM stock;
The stock table had 10,000,000 rows. The table was 308 bytes wide, and the size of the table was about 3GB. There was no index on column s_ytd, thus the query forced a table scan.
Note that all the tests reported here were done on a machine with 16GB of physical memory.
With 14GB allocated to the SQL Server buffer pool (both max server memory and min server memory were set to 14000), the following table shows the performance difference between using NTFS compression and not using any compression:
| |
With NTFS Compression (after dropping clean buffers) |
Without NTFS Compression (after dropping clean buffers) |
Avg. Table Scan Duration (sec) |
~ 445 |
~ 22 |
In this case, when the database files were compressed, the query took about 20 times as long to complete. That certainly is a huge performance degradation. But the story is a bit more complex than that. At one point, I re-ran the tests with 6GB allocated to the SQL Server 2005 instance (as opposed to 14GB), and the results are as follows:
| |
With NTFS Compression (after dropping clean buffers) |
Without NTFS Compression (after dropping clean buffers) |
Avg. Table Scan Duration (sec) |
~ 12 |
~ 22 |
Wow! It's actually faster to run the query on a compressed database than it was on a non-compressed database. This was the case even when the SQL Server 2005 instance was restarted.
But remember that the server had 16GB physical memory. With 6GB given to the SQL Server instance, 10GB was left for other processes and the file system cache. Apparently, the data was cached outside of the SQL Server instance when it was being decompressed. So even when the buffer pool was emptied either by running DBCC DROPCLEANBUFFERS or via a SQL Server restart, SQL Server was still able to get the data from memory instead of disk. This means that if the data was removed from memory, the same query should take significantly longer to complete, and that was indeed the case, as shown in the following table:
| |
With NTFS Compression (after a server reboot) |
Without NTFS Compression (after dropping clean buffers or a server reboot) |
Avg. Table Scan Duration (sec) |
~ 464 |
~ 22 |
A final note. With NTFS compression, neither the disk bandwidth nor the processor capacity was observed to be fully utilized. The test database had a single data file and a single log file. I didn't check whether splitting them into multiple database files would help with the decompression performance.