THE SQL Server Blog Spot on the Web


Kevin Kline

Data Deduplication Technology - New Article on DBTA

I've been seeing more and more questions from customers about data deduplication technologies. I guess that's because I can't open a blog, website, or email related to technology without some vendor or another pushing the wonders of data deduplication. Consequently, I thought I'd take a few minutes to describe what the technology is all about. I'll sum up my thoughts on the value of deduplication technology by saying "It depends". Read my article at Database Trends & Applications magazine here.

If someone in your company is pushing you to examine this very expensive technology, please give my article a gander and let me know what you think. I also encourage you to take a look at Brent Ozar's excellent post on the topic located here. Remember that, as with all technologies, deduplication in the right hands can be used to solve tricky business problems. But in the wrong hands, it may be a square peg pounded into a round hole ... or even worse.

Feedback always welcome!


 Twitter @kekline

 More content at

Published Monday, December 14, 2009 3:17 PM by KKline




Ashley said:

Data deduplication is a specific form of compression in which redundant data is eliminated, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored. However, an index of all data is retained in case that data is ever required. Deduplication reduces the required storage capacity because only the unique data is stored. For example, a typical email system might contain 100 instances of the same one megabyte (MB) file attachment; with deduplication, only one instance needs to be stored, so the attachments take roughly 1 MB instead of 100 MB.
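
To make the email example concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product works: the class name SingleInstanceStore, the mailbox names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy store (hypothetical) that keeps one copy of each distinct payload."""

    def __init__(self):
        self.blobs = {}  # digest -> payload bytes, stored once
        self.index = {}  # logical name -> digest, kept for every item

    def put(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, payload)
        # The index entry is always recorded, so the item can be found later.
        self.index[name] = digest

    def get(self, name):
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


attachment = b"x" * (1024 * 1024)  # one 1 MB attachment
store = SingleInstanceStore()
for i in range(100):
    store.put(f"mailbox-{i}/report.pdf", attachment)

logical_bytes = 100 * len(attachment)
print("logical bytes:", logical_bytes)
print("stored bytes: ", store.stored_bytes())
```

All 100 names stay in the index, but the attachment bytes are held once, which is where the storage saving comes from.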

There are various methods of data deduplication, including:

Chunking methods

Post-process deduplication

In-line deduplication
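
The methods listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: fixed-size chunking with a 4096-byte chunk (real products often use variable-size chunks), SHA-256 as the fingerprint, and plain dictionaries standing in for storage. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def fixed_size_chunks(data, size=CHUNK_SIZE):
    """Chunking: split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_write(store, data):
    """In-line: deduplicate each chunk before it reaches storage."""
    recipe = []
    for chunk in fixed_size_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # only new content is written
        recipe.append(digest)
    return recipe


def post_process(raw_chunks):
    """Post-process: data already landed in full; a later pass collapses duplicates."""
    store, recipe = {}, []
    for chunk in raw_chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # 4 chunks, 2 distinct

inline_store = {}
inline_recipe = inline_write(inline_store, data)

landed = fixed_size_chunks(data)  # all 4 chunks are written first
post_store, post_recipe = post_process(landed)

print(len(landed), "chunks written,", len(post_store), "kept after post-processing")
print(len(inline_store), "chunks ever written in-line")
```

Both approaches end with the same two unique chunks. The difference is when the work happens: in-line pays the hashing cost on the write path, while post-process needs enough space to land the full data before it is reduced.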

The advantages are:

Data deduplication is a very valuable tool within the virtual environment as well, giving you the ability to deduplicate the VMDK files needed for deployment of virtual environments.

It contributes significantly to Data Center Transformation by reducing the carbon footprint through savings on storage space.

It reduces the recurring cost of the staff needed for management and administration.

It reduces hardware recycling, since less storage hardware is needed in the first place.

The disadvantages are:

Data deduplication solutions rely on cryptographic hash functions to identify duplicate segments of data. If two different segments produce the same hash (a collision), one is mistaken for the other, and the result is data loss.

Another major drawback of data deduplication is the intensive computation power required. For every file or chunk of data, all the bytes are used to compute a hash value. The hash then needs to be looked up to see if it matches existing hashes.
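
The collision risk can be demonstrated with a small sketch. To make a collision easy to trigger, it uses a deliberately poor, made-up hash (weak_hash, a hypothetical function for this example only). The byte-for-byte verification shown is one possible safeguard, assumed here for illustration and not something the text above says products do.

```python
import hashlib


def weak_hash(chunk):
    """Deliberately poor hash (hypothetical) so collisions are easy to trigger."""
    return sum(chunk) % 256


def dedup(chunks, hash_fn, verify):
    """Deduplicate by hash, then return the stream as it would be read back."""
    store = {}   # hash -> list of distinct chunks sharing that hash
    recipe = []  # (hash, position in bucket) for each input chunk
    for chunk in chunks:
        key = hash_fn(chunk)  # every byte of the chunk is read here
        bucket = store.setdefault(key, [])  # the lookup against existing hashes
        if not bucket:
            bucket.append(chunk)
            pos = 0
        elif not verify:
            pos = 0  # trust the hash: assume it is a duplicate
        elif chunk in bucket:
            pos = bucket.index(chunk)  # confirmed duplicate, byte for byte
        else:
            bucket.append(chunk)  # same hash, different bytes: keep both
            pos = len(bucket) - 1
        recipe.append((key, pos))
    return [store[key][pos] for key, pos in recipe]


chunks = [b"ab", b"ba", b"ab"]  # b"ab" and b"ba" collide under weak_hash

trusting = dedup(chunks, weak_hash, verify=False)
verified = dedup(chunks, weak_hash, verify=True)
strong = dedup(chunks, lambda c: hashlib.sha256(c).digest(), verify=False)

print("trusting the weak hash:", trusting)
print("with verification:     ", verified)
print("with SHA-256:          ", strong)
```

When the weak hash is trusted, b"ba" is silently replaced by b"ab" on read-back, which is the data loss described above. Verification avoids that at the cost of an extra comparison, and a strong hash makes a collision extremely unlikely but costs more to compute.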

January 21, 2010 7:32 AM



About KKline

Kevin Kline is a well-known database industry expert, author, and speaker. Kevin is a long-time Microsoft MVP and was one of the founders of PASS,
