
Andy Leonard

Andy Leonard is CSO of Linchpin People and SQLPeople; an SSIS trainer, consultant, and developer; a Business Intelligence Markup Language (Biml) developer; a SQL Server database and data warehouse developer; and a community mentor, engineer, and farmer. He is a co-author of SQL Server 2012 Integration Services Design Patterns. His background includes web application architecture and development, VB, and ASP. Andy loves the SQL Server Community!

Master Data Management 101: Standardize Units of Measure

Introduction

 

   There's quite a bit of hoopla about Master Data Management (MDM) lately, mostly due to growing awareness. While "Lacks needed data" was cited as the number-one data warehouse problem (with 21 votes) in the recent IBM Data Warehousing Satisfaction Survey (2007), "Insufficient or inadequate master data" made a decent showing in second place with 17 votes (see Table 3).

 

   The new Microsoft MDM site is a good place to start learning about Master Data Management and Microsoft’s efforts in this area.

   Master Data is a pretty common thing for engineers. I learned about it way back in my manufacturing engineering days.

 

The Issue

 

   Consider this scenario: Conglomerate C (CC) makes widgets and starts acquiring businesses that also make widgets. CC sells widgets by the pound, but Acquisition A (AA) measures them by counting individual widgets, while Acquisition B (AB) sells them by the case (a gross, or 144 each).

 

   CC now wants all this data in a data warehouse so they can compare apples to apples and know, among other things, how many widgets they’re actually making and selling on a given day.

 

Note: Instrumentation and measurement are scientific disciplines in their own right. There's a lot more to this, which I hope to cover here and on my new Applied MDM blog.

 

   The Unit of Measure in the existing database, dbCC, is pounds. The Widgets tables from the three companies look like this:

 

dbCC.dbo.Widgets

ID  Date      Weight
1   1/1/2007  2076
2   1/2/2007  2100
3   1/3/2007  1977

 

dbAA.Product.Widgets

ProductID                             Date        Count
F0932E13-218D-458A-BE09-3286AFDE0280  1 Jan 2007  10,265
F68BF7AC-553E-4A32-B1CB-442DD310194C  2 Jan 2007  13,009
8C0C7511-1386-4C13-84B8-2351248280E6  3 Jan 2007  17,121

 

dbAB.dbo.Widgets

ID  Date      Cases
1   20070101  84
2   20070102  82
3   20070103  99

 

 

Standards, Standards, Standards

 

   One facet of MDM is standardizing this data. The keys are recognizing the traits of each measurement and assessing the quality of any proposed conversion.

   For example, the Cases-to-Count ratio is most likely stable and predictable. Conversion is easily accomplished using multiplication (or division, depending on which direction you standardize). Quality will be high.

   But the weight-to-count (individual or case count) conversion is going to be impacted by other factors. Most notably: do all widgets weigh the same? If not, what's the acceptable tolerance? Quality will be adversely affected by tolerance.
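
   One way to capture both the factor and its quality is a small conversion table. Here's a minimal sketch (the table and column names are mine, not a standard):

CREATE TABLE dbo.UnitConversion
(
    FromUnit varchar(20)    NOT NULL, -- e.g. 'Case' or 'Pound'
    ToUnit   varchar(20)    NOT NULL, -- e.g. 'Each'
    Factor   decimal(18, 6) NOT NULL, -- multiply a FromUnit quantity by this to get ToUnit
    IsExact  bit            NOT NULL, -- 1 = fixed ratio, 0 = estimate subject to tolerance
    CONSTRAINT PK_UnitConversion PRIMARY KEY (FromUnit, ToUnit)
);

INSERT dbo.UnitConversion (FromUnit, ToUnit, Factor, IsExact)
VALUES ('Case', 'Each', 144.0, 1);  -- Cases to Count: stable, high quality

INSERT dbo.UnitConversion (FromUnit, ToUnit, Factor, IsExact)
VALUES ('Pound', 'Each', 5.0, 0);   -- Weight to Count: an estimate, quality suffers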

 

   Dimensional analysis (the multiplication or division you do to convert known quantities) also raises the question of measurement granularity. Trust me: you will want to store as fine a grain as possible.

   Looking at the sample data, you will want to store WidgetCount somewhere. dbAA is already in this format. Yay. dbAB is easy enough: dbAB.dbo.Widgets.Cases * 144 gives you WidgetCount. Again, the math on widget Weight in the dbCC data is fuzzier. But our customer assures us each pound contains five widgets. Five widgets per pound probably yields a high-quality count; the quality would be much lower at, say, five hundred or five thousand widgets per pound!
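
   If you like seeing it in T-SQL, here's a minimal sketch of that standardization (ignoring dates and IDs for the moment; the five-widgets-per-pound factor is the customer's estimate, not a measured value):

SELECT Weight * 5  AS WidgetCount FROM dbCC.dbo.Widgets     -- pounds -> widgets (estimated)
UNION ALL
SELECT [Count]     AS WidgetCount FROM dbAA.Product.Widgets -- already stored as a count
UNION ALL
SELECT Cases * 144 AS WidgetCount FROM dbAB.dbo.Widgets;    -- one gross (144 each) per case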

 

   "Fuzziness" will impact the integrity of your data. A couple important measures of data warehouse integrity are data accuracy and signal to noise (usually defined by the percentage of “unknowns” in the data).

   When I have encountered this scenario in the field, I have informed the customer of the dangers and begged them to collect better metrics at the WidgetWeight station for company CC.
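
   As a rough sketch of that signal-to-noise measure (the fact table shown below and the -1 "Unknown" member key are illustrative assumptions), a query like this reports the percentage of unknowns:

SELECT 100.0 * SUM(CASE WHEN f.LocationID = -1 THEN 1 ELSE 0 END)
       / COUNT(*) AS PercentUnknown -- share of fact rows tied to the Unknown member
FROM DW.Product.FactWidget AS f;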

 

Enter Stage Left...

 

   There are other issues in these examples: date and ID standardization. Dates are fairly straightforward. The IDs can be a little tricky. To standardize the IDs in this example, I may consider a LocationID and ProductDateID (ID) composite key on the first pass.
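
   For example, all three source date formats convert cleanly in T-SQL. A quick sketch (assuming the sources store dates as the strings shown in the tables above, and a us_english session language):

SELECT CONVERT(datetime, '1/1/2007', 101)  AS dbCCDate, -- mm/dd/yyyy (style 101)
       CONVERT(datetime, '1 Jan 2007')     AS dbAADate, -- day month year (default style)
       CONVERT(datetime, '20070101', 112)  AS dbABDate; -- yyyymmdd (ISO, style 112)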

 

   I might create a couple of tables in the data warehouse staging database that look like this:

 

Staging.Products.Widget

LocationID  ID  Date      Count
1           1   1/1/2007  10,380
1           2   1/2/2007  10,500
1           3   1/3/2007  9,885
2           1   1/1/2007  10,265
2           2   1/2/2007  13,009
2           3   1/3/2007  17,121
3           1   1/1/2007  12,096
3           2   1/2/2007  11,808
3           3   1/3/2007  14,256

Staging.Products.Location

LocationID  LocationDescription
1           dbCC
2           dbAA
3           dbAB

 

   I've assumed (based on customer feedback) I get 5 widgets per pound from dbCC, and I know the math for the rest. Here I've prepared Location data for a star schema via a LocationID and a Location table. But I haven't touched dates and IDs; they will need some work before reaching the destination data warehouse.
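
   For completeness, the staging load might look something like this. This is a sketch: the LocationID values follow the Location table above, I'm assuming dbAB stores its dates as char(8), and re-keying the GUIDs with ROW_NUMBER is just one option:

INSERT Staging.Products.Widget (LocationID, ID, [Date], [Count])
SELECT 1, ID, [Date], Weight * 5 -- the customer's 5 widgets/pound estimate
FROM dbCC.dbo.Widgets;

INSERT Staging.Products.Widget (LocationID, ID, [Date], [Count])
SELECT 2, ROW_NUMBER() OVER (ORDER BY [Date]), -- re-key the GUIDs as a per-location sequence
       [Date], [Count]
FROM dbAA.Product.Widgets;

INSERT Staging.Products.Widget (LocationID, ID, [Date], [Count])
SELECT 3, ID, CONVERT(datetime, [Date], 112), -- yyyymmdd -> datetime
       Cases * 144                            -- one gross per case
FROM dbAB.dbo.Widgets;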

 

Just the Fact, Ma'am

 

   When I load the destination data warehouse, LocationID maps nicely to the Location dimension surrogate key, and I could simply make the column now labeled ID into a Date dimension surrogate key (removing the Date column from the fact table altogether because it's now redundant).

   This three-column table then makes a decent Fact:

 

DW.Product.FactWidget

LocationID  DateID  Count
1           1       10,380
1           2       10,500
1           3       9,885
2           1       10,265
2           2       13,009
2           3       17,121
3           1       12,096
3           2       11,808
3           3       14,256
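
   The load itself can then be a simple join. A sketch (the DimDate table and its DateID surrogate key are assumptions about the destination warehouse design):

INSERT DW.Product.FactWidget (LocationID, DateID, [Count])
SELECT w.LocationID,
       d.DateID,  -- Date dimension surrogate key replaces the raw date
       w.[Count]
FROM Staging.Products.Widget AS w
INNER JOIN DW.Product.DimDate AS d
    ON d.[Date] = w.[Date];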

 

   There's more to Master Data Management, but this is one type of business problem folks are trying to solve when they talk about MDM, and it shows how MDM ultimately relates to data warehouse quality and customer satisfaction.

 

:{> Andy

 


Published Sunday, October 21, 2007 12:24 AM by andyleonard


Comments

 

Paul Nielsen said:

Excellent post and great resources. I learned about Master Data and Unit Conversions while building inventory systems and MRPII. It's amazing - a good data modeler learns more about how a business runs than many professional managers.

October 21, 2007 8:25 PM
 

EdVassie said:

Good article...  I remember working in a civil engineering company where sheet steel was designed by area (square yard or meter, depending on the design office), ordered by the ton (imperial, US, metric, depending on the vendor), made by the batch, and issued by the plate.  Working out how to reconcile this kept a few people busy...  And the constructor had to record where bits of each sheet were used in case a batch was later found to be sub-standard and failure scenario analysis showed critical parts required replacement.

November 5, 2007 8:38 AM

