THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

Data modeling: art or science?

This blog has moved! You can find this content at the following new location:

Published Sunday, August 3, 2008 1:04 AM by Hugo Kornelis

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Dave Jermy said:

I'm not sure that it is possible to make data modelling entirely scientific but agree that steps could and should be taken to introduce more rigour into it.

And, yes, users should only ever sign-off the requirements and the answers they give to the data modellers.

August 4, 2008 9:09 AM

Scott R. said:


Great coverage of this topic!

One item you didn’t mention is that subsequent lifecycle changes to the database design need to be applied as appropriate to both the logical and physical data models.  If not, a later derivation of a physical data model for a different DBMS from the logical data model may be out of sync with the current physical data model implementation, since the logical data model is out of sync with the current implementation physical data model.  I have often seen the best of intentions in this area – starting with a logical data model during initial application design and deriving an initial physical data model – only to have all later application lifecycle database design changes implemented only in the physical data model.

As you mentioned, not all aspects of the physical data model need to be reflected in the logical data model (such as indexes and partitioning, which are physical-only issues), but other database design changes need to be reflected in both data models.

Keeping separate models in sync has historically been a challenging and effort- and time-consuming application lifecycle maintenance goal.  Some tools can help with this effort.  Hopefully, tools will get better at this over time.  But it still comes down to the discipline of the data / database architects to value both models and make it happen.

Scott R.

August 4, 2008 11:38 AM

Hugo Kornelis said:

Scott R. wrote: "One item you didn’t mention is that subsequent lifecycle changes to the database design need to be applied as appropriate to both the logical and physical data models"

That's a VERY good point - and one where many tool vendors have completely dropped the ball.

I'm sure that everyone who has ever dabbled in data modeling will have used tools that are able to generate the DDL for a database from the finished model. This is great - for a first release. But in practice, the first release is never the last, and what good is updating the model in the tool if all the tool can generate is a bunch of CREATE TABLE statements? The tables already exist from the first release, and are filled with millions of rows of customer data right now; the LAST thing one wants to do is to drop and recreate them!

This is one of the main reasons why so many tools are bought, installed, used for a first release ... and then end up being abandoned and forgotten. Keeping the model current becomes an extra chore if the changes can't be automatically propagated to the deployed database; it ends up as all documentation: "We'll update it later; right now we have a deadline to catch but after that, we will reserve time to change the documentation. Honest!"

August 6, 2008 5:26 AM

geert deprez said:

nice article

makes me think about discussions long time ago, is programming art of science

style matters, elegance matters, you like it or you don't, so it's an art

but with one dificultie, in the end it has to work, like science is

the difference might be the fact that inspiration is less important, the idea about what is needed does not come from the programmer, and, as far as I am concerned, that's what is making art art, the idea

August 10, 2008 2:23 PM

TroyK said:


Some interesting points you raise.

Your experience with the class where three different models were presented that all appeared to meet the requirements is exactly in line with the research that Simsion has done on data modeling (see "Data Modeling Essentials" by Simsion and Witt).

In short, given a particular conceptual model, there are 1 or more logical models that will satisfy the requirements specified therein. The choices that go in to making that logical design involve myriad and often conflicting concerns. This is where the "art" happens. Optimistically, though, if we were able to encode these design concerns and parameterize them, it would remove much of the subjectivity in the conceptual-to-logical mapping.

Your paint analogy is nice in that it points out exactly where we sometimes go wrong as data professionals. The subject matter experts should _not_ be asked to review logical models, but rather the conceptual model that informs them. The conceptual model should completely represent the business rules at play in the domain of interest, and there can really be only one "correct" conceptual model (for any differences in candidate conceptual models directly map to differences in understanding of the problem domain, and must be resolved before automation).


August 11, 2008 1:29 PM

Arjen said:


You say, about a logical data model. "The logical data model should focus on correctness and completeness; it should be completely implementation agnostic."

First: What does it mean?

(I've looked in the wikipedia what "Logical correctness" means, but somehow I'm not able to apply it to logical Data modeling.)

Second: How can I achieve this?

Mabye you know some books that can explain this in simple terms.

August 12, 2008 4:37 AM

TroyK said:

Hi Arjen;

I'm sure Hugo can expand on this a bit, but the short answer to your question is that a logical model should be absent any constructs that are specific to the RDBMS of your choice. In other words, a logical model need not be changed if you implement it on SQL Server or Oracle (or, perhaps more realistically, SQL Server 2000 or SQL Server 2008).

When you move to the physical implementation, you consider things like DBMS-specific datatypes (HierarchyID, the new Date and Time datatypes), performance-based tradeoffs, etc.

It's important to remember that for any single conceptual model (which must be "correct", at least with respect to the _current_ understanding of the problem domain), there are multiple possible "correct" logical models, even if we restrict them to relational.


August 13, 2008 12:56 PM

Arjen said:

Hi TroyK,

I understand that the logical data model differs from the physical data model. It's on a higher abstraction  level and leaves out any information about physical  storage.

But correctnes and completnes are terms which ar also used in the scientific field of 'Logic'. The have a certain meaning  there, an  I was wondering if they are the same terms here.

We try to focus on all the entities in the domain and check of it is all there, if so we consider it complete . We also check of it looks satisfaible and if we are okee with it we consider it  correct. If this is general the meaning of completeness and correctness no further questions ask, but maybe Hugo is refering to something  mmore subtle. Something you always should do to  improve the quallity of the  model.

It feels a bit like you say "you should always do normalization" and I have never heard of "normalization" in the context in database design. (Yes, a shame on me if it were true.)

It's all about the phrase "you should focus on  ...". It says "here comes something really important" and then it's followed by these two terms  "correctness" and "completness". No idea what he means by these terms or if they are used in some theorethical area. If you think that it's important to mention it, it would be nice to give some explanaition. A simple link will do. Just like de link under ERM.

Personally, I think if you go a bit acadamic you narrow your group of readers, that already knows what it's about. If that's the purpose it's oke, but if the purpose is to make designers more aware that there is more  out there to improve the quallity of the design in general it won't hurt to  add a bit more explaining.

August 14, 2008 9:22 AM

Denis Gobo said:

Wow, it has been already a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog

December 31, 2008 10:37 AM

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server' said:

In one of my previous posts , I discussed whether data modeling is art or science, and I concluded that,

October 20, 2011 5:39 AM

Rick Nunn said:

Interesting article though the details don't do the subject justiice. In many instances, the thing we call science has a bit of art mixed in. Changing order of adding elements during a chemistry process can alter the outcome. Virtually every "science" experiment will have varying results when done by different individuals.  The real trick is in understanding the variables, standardizing the variables, and explaining or removing the variables.

Likewise, data modeling is a "disapline" requiring the same attention to detail.  In your example, the shopowner should and usually does the shop sample is a known enity.  The components that result in a particular color have been measured over and over.  The most important part of this process is getting the customer to confirm the store sample and his sample are an exact match.

Early in my career, I discovered a universal truth.  That truth being a person will always tell you what the want, but seldom tell you what they need.  If you give 'em what they want, they will always be asking for more.  If you give 'em what they need, they will be happy. Therefor it is the data modeler's responsibility to communicate with the business expert to separate the wants from the needs.  During the all important model review, it is up to the data modeler to present the model in a manner that doesn't overwhelm the business representative.  The data modeler does not need to impress the business with the comples representation of all his/her hard work.  That would be counter-productive and like showing the shop customer the component index numbers; unneccessary static.

I have interviewed many candidates for data modeling positions.  An alarming number have little to no clue of what it takes to successfully design, model and present a database.

Is data modeling an art or science?  Currently, it is a disapline.  Someday, I believe it will evolve into a science, but for now it is a bit of both.  Remember, a little more than a hundred years ago engineering was not a science; a lot of trail and error - kind of like data modeling.

November 9, 2011 9:32 AM

Leave a Comment


About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.
Privacy Statement