THE SQL Server Blog Spot on the Web


Joe Chang

Backup Compression - time for an overhaul

Database backup compression is incredibly useful and valuable. It became popular with LiteSpeed, from what was then Imceda (later Quest, now Dell). SQL Server 2008 added backup compression, for Enterprise Edition only. The native SQL Server backup feature offers only a single compression algorithm, one that favors CPU efficiency over the degree of compression achieved. In the long ago past, this strategy was essential. But today the benefit is irrelevant, while the lower compression is becoming a significant liability. All the third-party backup compression products offer multiple levels of compression via open-source algorithms, and it is time for SQL Server to follow.

As always, we shall start with the historical context. LiteSpeed was introduced sometime around 2000, when a server system would have had 4 Pentium II or III processors at around 400-500MHz. This was before multi-core processors, so 4 sockets meant four cores. Today, an individual processor core (Sandy Bridge to Haswell generations) is about 20 times more powerful: 6X via frequency and another 3X in instructions per cycle efficiency. And there are 8 cores per processor socket in the Intel Xeon E5 (Sandy Bridge) and 12 in the E5 v2 (Ivy Bridge) processors. Even after dropping from 4 sockets in 2000 to 2 sockets today, the system has perhaps 80 times more compute power.

In addition, compression code benefits from Hyper-Threading more than SQL Server itself does, as there are no lock contention issues. Even with the Pentium 4's first-generation HT, the performance gain was 50% for LiteSpeed database backup compression.

Over the same period, IO bandwidth has improved as well, but not by the same degree. Fourteen years ago, 350MB/s from the IO system was good (basically 2 x 2Gbit/s FC ports). Today it is possible for an IO system to deliver 10-20GB/s, except that some brain-fart IO strategy in recent versions of SQL Server effectively caps practical IO bandwidth to disk storage in the 4-5GB/s range.

So the net change is that today there is far more CPU available to support compression relative to the practical IO bandwidth. To put it to better use, we would like one of the algorithms that achieve higher compression by spending more CPU per unit of raw data. This is valuable when the backup has to be sent over a wide-area network with bandwidth limits, or even a local network, because the infrastructure team will not put in 10GbE for the mission-critical database server and has no concept of parallel network connections.
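The trade-off between CPU spent and compression achieved is easy to see with ordinary open-source compressors. A minimal Python sketch (illustrative only, not anything SQL Server exposes; the sample payload is made up) comparing a fast low-compression level against a slow high-compression one on the same data:

```python
import time
import zlib

# Sample payload: mildly repetitive text, loosely mimicking row data.
data = b"".join(b"OrderID=%08d;Status=SHIPPED;Region=US-EAST;" % i
                for i in range(50000))

for level in (1, 9):  # 1 = fastest, 9 = best compression zlib offers
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print("level %d: ratio %.1f:1 in %.3fs"
          % (level, len(data) / len(compressed), elapsed))
```

The higher level spends noticeably more CPU time for a smaller output; when the bottleneck is a WAN link rather than the CPU, that is exactly the trade we want.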

High compression can also be important when local storage space is limited and extra database backups are necessary for migration operations. Of course, the occurrence of constrained storage space is a separate matter of total organizational stupidity. Hard disk capacity is cheap, even for the more reliable enterprise models ($100 per TB for 7.2K HDDs and $450/TB for 10K). Yet on a SAN that is billed as helping to reduce cost (via increased utilization), storage somehow becomes precious, doled out in driblets by the SAN admin only after a series of time-wasting meetings, forms and justifications. OK, I am side-tracking into my long-standing feud with SAN vendors bent on crippling database performance. The bottom line is that we have CPU resources on the SQL Server system, and we can put them to good use.

Compressibility of Data

Normally I like to use the TPC-H database for performance investigations, because the data generator is openly available (with source code) to facilitate independent verification, and because there is a good library of published reports for comparison. Of course, the TPC-H database is populated with randomly generated data, which has different (lower) compression characteristics than normal data. This is because a typical production database makes prolific use of sequentially increasing 8-byte data types, or otherwise of columns with a limited range of values, i.e. 63 out of 64 bits are the same from one row to the next. Another cause is lack of normalization, meaning repeating values that are easily reduced with dictionary-based compression. The exception, of course, is databases with extensive use of (non-sequential) unique identifiers, which have almost no compressibility.
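The contrast between sequential keys and random identifiers is easy to reproduce. A quick Python sketch (illustrative only; sizes are arbitrary) compressing sequential 8-byte integers versus the same volume of random bytes standing in for non-sequential GUIDs:

```python
import os
import struct
import zlib

# 100,000 sequential 8-byte integers: from one value to the next,
# nearly all 64 bits are identical.
sequential = b"".join(struct.pack("<q", i) for i in range(100000))

# The same volume of random bytes, standing in for non-sequential GUID keys.
random_ids = os.urandom(8 * 100000)

print("sequential: %.1f:1" % (len(sequential) / len(zlib.compress(sequential))))
print("random:     %.2f:1" % (len(random_ids) / len(zlib.compress(random_ids))))
```

The sequential stream compresses several-fold, while the random stream barely compresses at all (it can even grow slightly from format overhead).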

That said, below are some results with the TPC-H data set.

Type                    DB size   Compressed B/U   7Z
Uncompressed tables     14.17     5.66             3.14
Compressed tables        9.55     5.54             3.29
Clustered columnstore    4.55     3.50             2.34

With no compression at the table and index level, the SQL Server backup achieves a 2.5:1 compression ratio. 7z achieves 4.5:1 on the uncompressed backup, albeit at far higher CPU consumption. With compressed tables and indexes, the database size is reduced to 67% of the original. The SQL Server backup can still achieve further compression, as the backup block size is larger, allowing for more effective dictionary use. It is perhaps no accident that both the normal database and the one with compressed tables and indexes back up to about the same size. Even the 7z compressions of the two backups are about the same size.

The clustered columnstore is even more efficient, both in database storage and in the backup. I suspect the columnar storage organization makes compression more effective. Note that a SQL Server compressed database backup is not further reduced by 7z or other common file compression utilities.


Throttling CPU?

Backup compression consumes CPU. When LiteSpeed first came on the market, people asked for a means to throttle it. The reason was to maintain some degree of responsiveness to user transactions during the backup. Full database backups are normally done at night or during off-hours, if the two are different, but there might still be some user activity during the backup. So it might seem reasonable to throttle the CPU used by backup compression. That, however, is a simpleton's solution.

This is why I stress the need to differentiate between the objective and the method. State the true objective as accurately as possible, so that intelligent people can come up with the best solution, rather than constraining the developer with artificial requirements. The reason a system is unresponsive is not that the CPU is fully utilized, but that threads must wait for a CPU to become available.

The Windows operating system has had the concept of a scheduling quantum from its early days. On multi-processor servers, the clock interval is 15.625 milliseconds (1/64 second) and the quantum is 4 times this. A processor core can compress a 1MB block (the standard backup transfer size) in well under a full quantum. So simply yield the thread after each block. If there are any user transactions, they will see minimal wait time, and any spare CPU cycles can still be used for compression, with only a moderate loss of efficiency versus running for the full quantum.
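The block-then-yield idea can be sketched in a few lines of Python (an assumed design for illustration, not SQL Server's actual backup code; `time.sleep(0)` is the portable way to relinquish the rest of the time slice):

```python
import time
import zlib

BLOCK = 1 << 20  # 1MB, the standard backup transfer size

def compress_with_yield(data):
    """Compress in 1MB blocks, yielding the scheduler between blocks so a
    waiting user thread never sits behind a full quantum of compression."""
    comp = zlib.compressobj()
    out = []
    for offset in range(0, len(data), BLOCK):
        out.append(comp.compress(data[offset:offset + BLOCK]))
        time.sleep(0)  # yield; if no other thread is ready, we resume at once
    out.append(comp.flush())
    return b"".join(out)

backup = compress_with_yield(b"page image " * 500000)  # ~5.5MB of raw pages
```

If no user thread is runnable, the yield costs almost nothing; if one is, it gets the CPU within a block's worth of work instead of a full quantum.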

Of course, a good strategy might also impose some kind of IO bandwidth control. The key element is to keep disk latency low, meaning the IO queue depth should be low. It should be possible to control this with the buffer count setting: with only 1 or 2 buffers per thread, the IO queue should not be saturated with IO generated by the backup process.
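The buffer-count idea amounts to a bounded producer/consumer pipeline. A sketch in Python (an illustration of the principle, not how SQL Server implements its buffer pool) where a queue capped at 2 buffers keeps the writer from ever seeing a deep backlog:

```python
import queue
import threading
import zlib

# A bounded queue caps in-flight buffers at 2: the compressor blocks once
# two buffers are queued, so the writer (standing in for the backup file IO)
# never sees a deep queue, keeping disk latency low.
buffers = queue.Queue(maxsize=2)
written = []

def writer():
    while True:
        buf = buffers.get()
        if buf is None:      # sentinel: backup stream finished
            break
        written.append(buf)  # real code would write buf to the backup file

t = threading.Thread(target=writer)
t.start()

data = b"example page data " * 100000  # ~1.8MB of raw pages
for off in range(0, len(data), 1 << 20):
    buffers.put(zlib.compress(data[off:off + (1 << 20)]))  # blocks when full
buffers.put(None)
t.join()
```

With `maxsize=2`, the compressor can never race ahead of the storage by more than two buffers, which is exactly the queue-depth cap the buffer count setting provides.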

Published Thursday, November 7, 2013 11:29 PM by jchang




tobi said:

Try testing with bigger 7z buffer sizes. The bigger the buffer, the farther away backreferences can be detected. It depends very much on the data set whether more buffer space helps or not. If all similar data is collocated it does not help. If it is scattered it can help significantly.
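tobi's point can be demonstrated with Python's lzma module (the same LZMA2 family that 7z uses; the sizes below are arbitrary, chosen only to illustrate): a repeated block that lies farther back than the dictionary reaches cannot be matched, so a bigger dictionary helps exactly when similar data is scattered.

```python
import lzma
import os

pattern = os.urandom(256 * 1024)   # a 256K block that appears twice
gap = os.urandom(1024 * 1024)      # 1MB of unrelated data in between
data = pattern + gap + pattern     # the second copy lies >1MB back

def xz_size(payload, dict_size):
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters))

small = xz_size(data, 1 << 16)  # 64KB dictionary: the repeat is out of reach
large = xz_size(data, 1 << 24)  # 16MB dictionary: the repeat becomes a match
print(small, large)             # the large-dictionary output is ~256KB smaller
```

With the small dictionary the second copy of `pattern` is effectively new random data; with the large one it collapses to a single back-reference.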

November 8, 2013 6:34 AM

Lonny Niederstadt said:

Is 1/16th second really the x86 quantum?  Seems way too long.  I thought it was 4ms.  On IBM Power AIX, the default quantum is 10 ms.

November 8, 2013 6:52 AM

Lonny Niederstadt said:

When attempting to maintain low latency and consistent throughput during SQL Server backups, the HBA max transfer size is an important consideration.  Long stuck at 512K on Windows, the latest QLogic driver allows 2MB as the highest max transfer, and 4MB is allowed on Emulex.  Some testing on a system using EMC VNX storage:

Different systems may find different sweet spots; Hitachi especially is encouraging a 4MB max transfer size, since their storage allows a lot of bandwidth relative to storage controller CPU power.

November 8, 2013 7:00 AM

Greg Linwood said:

And we're also stuck with no option other than backing up indexes, which usually take up 50% or more of the backup size / time. An option to leave indexes out of backups would halve the size & time taken to back up DBs. The indexes would need to be rebuilt during restore, but that is far more convenient than using the index disable feature.

November 8, 2013 12:57 PM

jchang said:

Can the index discard idea be effected by having the indexes not in the primary FG, then doing a FG backup and restore without the index FG? Perhaps by having an empty FG?

November 8, 2013 3:42 PM

Greg Linwood said:

You can back up a table-only filegroup, but restoring it without its corresponding NCIX filegroup requires hacking system tables. I wrote a chapter on this in one of the MVP books a few years ago. It's possible, but far harder than it should be.

November 8, 2013 8:07 PM

Robin CM said:

I am currently testing turning off backup compression altogether and storing the backups on a remote Windows Server 2012 fileserver with the deduplication feature enabled for the drive holding the backups. I'm expecting quite significant space savings based on my early experiments and published information.

Because SQL Server isn't doing any compression, there's no CPU hit on that front, and the dedupe processing is done on the remote fileserver.

November 11, 2013 5:14 AM

jchang said:

The point of this post was that today we probably have more CPU for compression than IO bandwidth to storage, more so on writes to the backup location, along with a very possible shortage of space thanks to the SAN admin.

I do agree that de-dup is a good idea, but it has to be done before compression. However, it is almost certain that you will have even less network bandwidth for transferring the raw backup than disk bandwidth (unless the servers are connected by a fat InfiniBand connection, perhaps 8-16 lanes?). This is why database de-dup must be done with a SQL Server specific tool. I believe LiteSpeed might do this? The tool would use the VDI to access the raw data.

Technically SQL Server already has facilities to avoid the need for dedup - as in file groups, but few people use file groups for this purpose so I do think a separate dedup mechanism is appropriate. I do suggest looking at LiteSpeed to see if they have this.

November 11, 2013 9:54 AM

jchang said:

Lonny: sorry for the delay. It has been a long time since I looked at HBA max transfer size. I am inclined to think that 1) the RAID stripe + disks, 2) the HBA MTS and 3) the App/OS IO must all march in step with a common strategy.

Let's say we have RAID 5 in 7+1 with a 64K stripe unit. Then the full stripe is 448K. If we issued write IO at 448K aligned to the RAID stripe, the controller would not need to first read the old data + parity; it could just write the 448K and compute the new parity. A read at 448K would also hit all disks in the RAID group.
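The alignment rule in that example can be sketched as a quick check (illustrative arithmetic only, not any storage API):

```python
def is_full_stripe_write(offset_kb, length_kb, data_disks=7, unit_kb=64):
    """True if a write covers whole RAID 5 stripes, so the controller can
    compute parity directly instead of doing a read-modify-write."""
    stripe_kb = data_disks * unit_kb  # 7 x 64K = 448K full stripe
    return offset_kb % stripe_kb == 0 and length_kb % stripe_kb == 0

print(is_full_stripe_write(0, 448))    # True: one aligned full stripe
print(is_full_stripe_write(448, 448))  # True: the next full stripe
print(is_full_stripe_write(64, 448))   # False: misaligned, forces RMW
```

This is why the app/OS IO size and alignment need to march in step with the RAID geometry: a 448K write that starts 64K into a stripe touches two partial stripes and loses the parity shortcut.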

I did notice that SQL Server is now heavy on 512K IO for table scans. Backups default to an IO size of 1MB, but this is configurable to a max of 4MB.

November 11, 2013 10:03 AM


