A year or so ago, I watched a few episodes of a Dutch television program that had an interesting format. The name of the series was (or is, I have no idea if it still runs) “Sterren op het doek” (“Stars on Canvas”). Every episode featured a Dutch celebrity, three painters, and an interviewer. For the program, the three painters each paint a picture of the celebrity (who is interviewed while posing – and because this takes a long time, the interviews were typically quite deep, and had enough material for an interesting first 15 minutes of the program). After that, the painters were given two weeks to finish their paintings. And at the end of the program, the painters each revealed their painting to the celebrity, who then got to pick one to keep for him- or herself; the remaining two portraits were auctioned off and the proceeds were given to a charity.
What made this program intriguing was the enormous differences between the finished paintings. Not only did each painter have his or her own unique style, they also all chose to capture and emphasize different aspects of the physical appearance and different personality traits of the celebrity. They all made portraits that captured the celebrity really well, and yet they were as different as one could imagine. Even though the painters all received the same “input” (watching the celebrity pose and listening to the interview), they produced radically different “output” (paintings), while still all following the same task. It was also interesting to discuss the results with my wife. We often disagreed on which painting we liked best, and the celebrity would then sometimes pick the one we both disliked! Again, three people who, with the same input, generated vastly different output. Because when it comes to art, picking the “best” is a matter of taste.
But what’s good in art, and in this television format, is not necessarily good elsewhere. When I fly somewhere, I don’t want the result of the pre-flight security check to depend on the engineer assigned to that task. For a given “input” (the current technical state of the plane), I want the “output” (‘clear to go’ or ‘needs repairs’) to be the same, regardless of the engineer who inspects the plane. And luckily, the airline companies are aware of this, so their safety inspection engineers use checklists that list exactly what details of the engine have to be inspected, and what value ranges are or are not acceptable.
Data models, like paintings, often end up as decoration on office walls. Put they serve a very different purpose. For a painting, hanging on the wall is the end goal, pleasing the eye of all beholders. For a data model, being stuck to the wall is only a further step in a longer process to the end product: a working database application that serves some business need. If a painting fails to please the beholder, it’s simply a difference in taste. If a data model fails to serve the business needs, a mistake has been made – and mistakes like this can cost companies large sums of money, or, in some cases, even cost lives.
I want data models to be like the safety inspection on planes: dependent on the input only, not on the person who carries out the task. That brings me to the third principle of modeling, after the Jargon Principle and the Concreteness Principle: The Reproducibility Principle.
The Reproducibility Principle
“The analyst shall perform each of his or her tasks in such a way that repeating the same task with the same input shall invariably yield the same result.”
Note that the wording of the Reproducibility Principle (which, again, is put in my own words because I don’t recall the original wording, only the meaning) does not state who repeats the task, and that is on purpose. This principle should still hold if someone else repeats the same task; the result of an information analysis should not depend on who does the analysis.
Easier said than done?
While writing this, I almost heard the angry outcry from all over the world: “Yeah, right! I can work in a way that almost always ensures consistent results, but some of my co-workers are blithering idiots who simply don’t get it – how can you seriously expect them to produce the same quality analysis that I make?” That’s a fair point, and maybe a weak spot in the wording of the Reproducibility Principle – but not a weak spot in the principle itself! Hang on, I’m sure it will become more clear after a few paragraphs.
But even with coworkers who are level with your experience and knowledge, you’ll still often have different opinions about a data model. I experienced this once, many years ago, when I attended a class on data modeling. The students had to draw a data model for a given scenario, and then present it in class. To my surprise, several models that were very different were all praised by the teacher. That just didn’t make sense to me. How can two models that don’t represent the same information both be correct? Clearly, the teacher of that class did not know or care about the reproducibility principle! If that is how information modeling is taught, it’s impossible to expect the modelers to suddenly all start producing the same results on a given input.
Does this mean that the Reproducibility Principle is good in theory but impossible to achieve in practice for information and process analysis? No! It just means that the currently accepted methods of doing information and process analysis are not equipped for this principle. They lack recipes. Not recipes for pie, as you’d find in a cookbook, but recipes for the steps in making a model – but these modeling recipes should be just like the cookbook recipes, in that they tell you exactly what ingredients you need, and how and in what order you process and combine these ingredients to get the required result: a delicious apple pie, or a completely correct data or process model.
The only way to implement the Reproducibility Principle is to have strict recipes that govern each and every conclusion that an analyst even infers from information given to him or her by the subject matter expert. With recipes, I know for sure that if I give the same information to my co-worker, the result will be either the same – or it will be different, and then we’ll sit together, go over the steps in the recipe until we see where one of us made a mistake.
And those blithering idiots who just don’t get it? Having recipes will not suddenly change them to superstar modelers. They’ll still be idiots, they’ll still not get it. But with recipes, we no longer have to confront their bad insights with our good insights, and still disagree. Now, we can simply point out exactly where they failed to apply the recipe in the correct way, and even our bosses will agree. The blithering idiot coworkers will still be a royal pain, but fighting them will be easier if your company agrees on implementing the Reproducibility Principle by supplying recipes for everything you have to do in your role as information or process analyst.
Putting my money where my mouth is
Look at me, all blathering in abstract terms about how information modeling does not follow the Reproducibility Principle – and all this time, I myself am not applying the Concreteness Principle that I advertised in an earlier blog post. Time to change that!
First, I’d like to start on the positive side. The job of creating a data model is not all up to the experience and insight of the analyst; some of the steps involved actually have very good recipes for them. Normalizing a data model is a fine example of this. For a given set of attributes (columns) and functional dependencies, the steps are clearly described. Simply apply them one by one, and the end result is always a nice, normalized data model. Unless you made a mistake somewhere (which does happen; I never said that the steps are easy, I just say that they are well prescribed!), your teacher (if still on course) or a coworker can tell you exactly where you made a mistake.
But where do that list of attributes and their functional dependencies come from? Now we get into the realm of the bad and the ugly – none of the mainstream analysis methods have strict recipes for this. If I ask how to find the attributes for my model, the answer is always a variation of “talk to the subject matter expert”. Yeah, sure, I get that. But what questions do I ask? How do I ensure that no important attributes are missed in our conversation? And how do I separate the wheat from the chaff, so that I don’t waste time on attributes that are not relevant at all? The only answer I have ever got to that question is “experience”. And that is not an answer that I, as a fan of the Reproducibility Principle, like.
For functional dependencies, there is a similar problem. Given a list of columns, how do I find the dependencies? Again, there is no answer that satisfies me. Most people will tell me that these dependencies are “obvious”, that you “just see” them. And indeed, in 99.9% of all cases, people will agree on the dependencies. It’s not those 99.9% that I’m concerned about; my concern is for the remaining 0.1% – those few cases where the functional dependencies are not obvious is where errors are introduced. Errors that sometimes are caught quickly, costing only a few man months of work. Or sometimes, they are not caught until it’s too late, bringing down entire companies –or, worse, killing people!– as a result.
Bottom line
After reading the above, you’ll understand that I only consider a modeling method to be good if it includes strict “recipe” procedures. And thinking back about my previous posts on the principles of modeling, you’ll also understand that I think those procedures should tell you not only what questions to ask the domain expert, but also tell you to use concrete examples in the jargon of the domain expert when asking them – and tell you how to do just that.
Unfortunately, most design methods fail to embrace these principles. The only method I ever encountered that does fully embrace all these principles is NIAM – a method that, as far as I know, has only been documented in a Dutch book that has never been translated. (There are some more or less closely related methods that are documented better, but they all either lack full support for the Modeling Principles, or focus on the wrong details). Uptake of this method in the Netherlands is minimal, and virtually non-existent in the rest of the world.
However, that has never stopped me from using it, extending it, and improving it. With all the changes I made the method I am now using can be said to have evolved from NIAM to a personal (though obviously still NIAM-derived) method. And I am proud to announce that this personal method will be the subject of a seminar that I will be giving on March 29 in London, on the first day of SQLBits X – a conference that anyone who can should attend. Thanks, SQLBits, for giving me a platform where I can share my experience with this modeling method with others.