Database backup compression is incredibly useful and valuable. This became popular with then Imceda (later Quest and now Dell) LiteSpeed. SQL Server version 2008 added backup compression for Enterprise Edition only. The SQL Server EE native backup feature only allows a single compression algorithm, one that elects for CPU efficiency over the degree of compression achieved. In the long ago past, this strategy was essential. But today the benefits are irrelevant while the lower compression is becoming a significant liability. All the third party backup compression products offer multiple levels of compression via open source algorithms and it is time for SQL Server to follow.
As always, we shall start with the historical context. LiteSpeed was introduced sometime around 2000 when the server system would have had 4 Pentium II or III processors at around 400-500MHz. This was before there were multi-processors so 4 sockets means four cores. Today, the individual processor core (Sandy Bridge to Haswell generations) is about 20 times more powerful, 6X via frequency and another 3X on instructions per cycle efficiency. And there are be 8 cores per processor socket for Intel Xeon E5 (Sandy Bridge) and 12 in the E5 v2 (Ivy Bridge) processors. Even on dropping from 4 sockets in 2000 to 2 sockets today, the system has perhaps 80 times more compute power.
In addition, the compression code also benefit from Hyper-Threading more so than SQL Server, as there are no lock contention issues. Even the Pentium 4 first generation HT, the performance gain was 50% for LiteSpeed database backup compression.
Over the same period of time, IO bandwidth has improved as well, but not by the same degree. Fourteen years ago, 350MB/s from the IO system was good (basically 2 x 2Gbit/s FC ports). Today it is possible for an IO system to deliver 10-20GB/s, except that some brain fart IO strategy in recent versions of SQL Server effectively caps practical IO bandwidth to disk storage in the 4-5GB/s.
So the net change is that today there is far more CPU available to support compression relative to the practical IO bandwidth. To better use this, we would like to use one of the algorithm that can achieve high compression with more CPU per unit of raw data. This is valuable when the data has to be sent over a wide-area network with bandwidth limits or even a local network because the infrastructure team would not put in 10GbE for the mission critical database server and has no concept of parallel network connections.
High compression can also be important when local storage space limited and extra database backups are necessary for migration operations. Of course the occurrence of constrained storage space is a separate matter of total organizational stupidity. Hard disk capacity is cheap, even for the more reliable enterprise models ($100 per TB for 7.2K HDDs and $450/TB for 10K). Yet in a SAN that is billed as helping to reduce cost (via increased utilization), somehow storage becomes precious, doled out in driblets from the SAN admin only after a series of time wasting meetings, forms and justifications. OK I am side-tracking to my long standing feud with SAN vendors bent on crippling database performance. The bottom line is that we have CPU resources on the SQL Server system, and we can put it to good use.
Compressibility of Data
Normally I like to use the TPC-H database for performance investigations because the data generator is openly available (with source code) to facilitate independent verification and because there a good library of published reports for comparison. Of course the TPC-H database is populated with randomly generated data which has different compression characteristics than normal data (less).
This is because in a typical production database, there is prolific use of sequentially increasing 8-byte data types or otherwise having a limited range of values, i.e. 63 out of 64 bits are the same from one row to the next. Another cause is lack of normalization, meaning repeating values that are easily reduced with dictionary based compression. The exception of course is databases with extensive use of (non-sequential) unique identifiers, which have almost no compressibility.
That said below are some results with the TPC-H data set.
|Type|| ||DB size ||Compressed B/U ||7Z|
|Uncompressed tables|| ||14.17||5.66||3.14|
|Compressed tables|| ||9.55||5.54||3.29|
|Clustered Column stored|| ||4.55||3.50||2.34|
With no compression at the table and index level, the SQL Server backup can achieve a 2.5 compression ratio. 7z can achieve a 4.5 compression ratio on the uncompressed backup, albeit at far high CPU consumption. With compressed tables and indexes, the database size is reduced to 67% of the original size. The SQL Server backup can achieve further compression as the block size is larger allowing for more effective dictionary use. It is perhaps no accident that both the normal database and one with compressed tables and indexes backup to about the same size. Even the 7z compressions on the uncompressed backup files are about the same size.
The clustered column store has even efficiency both in the database storage and the backup. I am thinking that the storage organization makes compression more effective.
Note that the SQL Server compressed database backup is not further reduced with 7z or other common file compression utility.
This is a simpleton’s solution. It is important to differentiate between the objective and the method. Backup compression consumes CPU. When LiteSpeed first came on the market, people asked for a means to throttle CPU. The reason was to maintain some degree of responsiveness to user transactions during the backup. Full database backups are normally done at night or during off-hours if the two are different. But there might be some user activity during the backup. So it might seem reasonable to throttle the CPU used by backup compression.
However, this is why I stress the need to differentiate between the objective and method. State the true objective as accurately as possible so that intelligent people can come up with the best solution rather than constraining the developer to artificial requirements. The reason a system is unresponsive is not because the CPU is fully utilized, but because of the wait for a CPU to become available.
In the Windows operating system from the very beginning or early days, there is the quantum. On multi-processor servers, there is a time interval of 15.625 milliseconds (1/64 seconds) and the quantum is 4 times this. A processor core can compress a 1MB block (the standard backup transfer size) in well under the full quantum. Simply yield the thread after each block. If there are any user transactions, there will be minimum wait time. And any spare CPU cycles can still be used for compression with only moderate loss of efficiency versus running for the full quantum.
Of course, a good strategy might be to impose some kind of IO bandwidth control. The key element is to key disk latency low, meaning the IO queue depth should be low. It should be possible to control this with the buffer count setting. If there are 1 or 2 buffers per thread, then the IO queue should not be over saturated with IO generated by the backup process.