Many companies or organizations do regular data cleansing. When you cleanse the data, the data quality goes up to some higher level. The data quality level is determined by the amount of work invested in the cleansing. As time passes, the data quality deteriorates, and you need to repeat the cleansing process. If you spend an equal amount of effort as you did with the previous cleansing, you can expect the same level of data quality as you had after the previous cleansing. And then the data quality deteriorates over time again, and the cleansing process starts over and over again.
The idea of Data Quality Services is to mitigate the cleansing process. While the amount of time you need to spend on cleansing decreases, you will achieve higher and higher levels of data quality. While cleansing, you learn what types of errors to expect, discover error patterns, find domains of correct values, etc. You don’t throw away this knowledge. You store it and use it to find and correct the same issues automatically during your next cleansing process. The following figure shows this graphically.
The idea of master data management, which you can perform with Master Data Services (MDS), is to prevent data quality from deteriorating. Once you reach a particular quality level, the MDS application—together with the defined policies, people, and master data management processes—allow you to maintain this level permanently. This idea is shown in the following picture.
OK, now you know what DQS and MDS are about. You can imagine the importance on maintaining the data quality. Here are some resources that help you preparing and executing the data quality (DQ) and master data management (MDM) activities.
- Dejan Sarka and Davide Mauri: Data Quality and Master Data Management with Microsoft SQL Server 2008 R2 – a general introduction to MDM, MDS, and data profiling. Matching explained in depth.
- Dejan Sarka, Matija Lah and Grega Jerkič: MCTS Self-Paced Training Kit (Exam 70-463): Building Data Warehouses with Microsoft SQL Server 2012 – I wrote quite a few chapters about DQ and MDM, and introduced also SQL Server 2012 DQS.
- Thomas Redman: Data Quality: The Field Guide – you should start with this book. Thomas Redman is the father of DQ and MDM.
- Tyler Graham: Microsoft SQL Server 2012 Master Data Services – MDS in depth from a product team mate.
- Arkady Maydanchik: Data Quality Assessment – data profiling in depth.
- Tamraparni Dasu, Theodore Johnson: Exploratory Data Mining and Data Cleaning – advanced data profiling with data mining.
- Forthcoming presentations
- I am presenting a DQS and MDM seminar at PASS SQL Rally Amsterdam 2013:
- Wednesday, November 6th, 2013: Enterprise Information Management with SQL Server 2012 – a good kick start to your first DQ and / or MDM project.
- Data Quality and Master Data Management with SQL Server 2012 – I wrote a 2-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses firstname.lastname@example.org or email@example.com. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well.
Start improving the quality of your data now!