The SQL Server Blog Spot on the Web


Paul Nielsen

www.SQLServerBible.com

Why the Relational Model is Insufficient (part 1)

I’ve been designing relational databases since the mid-80s. My conclusion is that the relational model is weak and lacking. Specifically, the foreign key - that simple DRI constraint which is the keystone of the relational model - is insufficient. The model poorly represents reality.

First, the foreign key can ensure that a FK references an existing PK, but it cannot ensure the PK is valid as the recipient of the FK, because the FK can’t define a valid workflow state for the PK. For example, a paycheck FK can reference the PK of a terminated employee, or an order detail FK can sell a product PK even if the product is unavailable for sale. The FK only checks for the presence of the PK in the primary table.

A stronger model would include the workflow state of the PK in the definition of the FK.

As a workaround, developers have three options: code the check in a trigger (extending the transaction’s duration); code the check in a sproc (the best option) or in a logical data access layer; or copy valid PKs into a “bucket table” and point the FK to the bucket table (cheesy). (The Nordic object/relational database for SQL Server includes optional workflow state in its association definition.)
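The “bucket table” workaround can be sketched as follows (the table and column names here are hypothetical, for illustration only): a separate table holds only the keys that are currently valid targets, and the FK points at it instead of the base table.

```sql
-- Hypothetical sketch of the "bucket table" workaround.
-- Only products currently available for sale are copied into the
-- bucket; order details reference the bucket, not the base table.
CREATE TABLE Product (
  ProductID INT PRIMARY KEY,
  IsAvailable BIT NOT NULL
);
CREATE TABLE AvailableProduct (
  ProductID INT PRIMARY KEY
    REFERENCES Product(ProductID) ON DELETE CASCADE
);
CREATE TABLE OrderDetail (
  OrderDetailID INT PRIMARY KEY,
  ProductID INT NOT NULL
    REFERENCES AvailableProduct(ProductID)  -- FK checks the bucket
);
-- When a product goes off sale, delete its row from AvailableProduct;
-- new order details can then no longer reference it.
```

The cheesiness is plain: some process has to keep the bucket synchronized with the workflow state, which is exactly the logic the FK itself cannot express.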

The second limitation of the relational model is that the basic DRI constraint is a one-way check. Let me explain: in reality there are two types of PK-FK relationships - a has-a relationship and a contains relationship.

In a has-a relationship, one item is loosely connected to another item. The presence or absence of the connected item doesn’t change the existence of the primary item. Either way the primary item is a valid item. For instance, an employee is a valid employee even without any connected paychecks. The relational model handles this example well.

The second type of relationship is a contains relationship. In this case, the primary item is constructed of the secondary item, and without the presence of the secondary item, the primary item itself is invalid. A very common example is the order-order detail relationship. We’ve used this as the classic primary table/secondary table example for a few decades, but think about it again. An order without any order detail rows is an invalid order. The relational model teases this need with cascading deletes, but that only addresses part of the problem, and at the tail end. To truly address contains relationships there needs to be a check that the PK has at least one valid FK by the end of the logical transaction.
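Part of the contains rule can be approximated in SQL Server today with a trigger - a sketch only, with hypothetical table names - that refuses to let a transaction leave an order with no detail rows:

```sql
-- Hypothetical sketch: forbid deleting the last detail row of an
-- order that still exists.
CREATE TRIGGER OrderDetail_RequireOne
ON OrderDetail
AFTER DELETE
AS
IF EXISTS (
    SELECT d.OrderID
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT * FROM OrderDetail AS od
                      WHERE od.OrderID = d.OrderID)
      AND EXISTS (SELECT * FROM [Order] AS o
                  WHERE o.OrderID = d.OrderID)
)
BEGIN
    RAISERROR('An order must keep at least one detail row.', 16, 1);
    ROLLBACK TRANSACTION;
END;
```

Note what this sketch cannot do: it fires per statement, so it cannot allow an order header to be inserted first and its details a moment later within the same transaction - which is exactly why a true end-of-transaction check is needed.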

The relational model was a breakthrough when Codd introduced it around 1970, but unless the model can continue to develop, it will eventually be overrun by OOA/D.

Published Monday, December 18, 2006 6:42 AM by Paul Nielsen


Comments

 

AaronBertrand said:

I think it would be very tough for the relational model to account for all possible business workflows (e.g. depending on the turnover rates in my company, I may store terminated employees in a different table).  I also think it is difficult for a generic database product to predict how an application or its designers will want to constitute the transaction of "an order."

I agree there are some shortcomings to the relational model, but I think many of us would solve them in different ways.  So expecting the model to solve them for all of us is going to be a very tall order.
December 18, 2006 8:12 AM
 

Alex Kuznetsov said:

Hi Paul,

I think that when an employee quits his job, existing paychecks from the past can still refer to that employee. I think that neither triggers nor stored procedures can guarantee 100% integrity - only RI can. I used RI to enforce this requirement, details here:

http://sqlserver-tips.blogspot.com/2006/10/child-intervals-of-time-inside-parent.html

Suppose I want to enforce another rule, "Active tickets cannot be assigned to programmers who quit". I can use RI to enforce it:

CREATE TABLE Programmers(FullName VARCHAR(30) PRIMARY KEY,
 Status CHAR(10) NOT NULL CHECK(Status IN('Active', 'Quit')),
 CONSTRAINT Programmers_UNQ UNIQUE(FullName, Status)
)
go
CREATE TABLE Tickets(TicketName VARCHAR(30),
 Status CHAR(10) NOT NULL CHECK(Status IN('Active', 'Closed')),
 ProgrammerName VARCHAR(30) NOT NULL,
 ProgrammerStatus CHAR(10) NOT NULL,
 CONSTRAINT Tickets_FK FOREIGN KEY(ProgrammerName, ProgrammerStatus) REFERENCES Programmers(FullName, Status) ON UPDATE CASCADE,
 CONSTRAINT ActiveTickets_CK CHECK((ProgrammerStatus = Status) OR Status = 'Closed')
)
go
INSERT Programmers(FullName, Status) VALUES('Jane Stork', 'Active')
INSERT Tickets(TicketName, Status, ProgrammerName, ProgrammerStatus) VALUES('Cannot open report', 'Active','Jane Stork',  'Active')
INSERT Tickets(TicketName, Status, ProgrammerName, ProgrammerStatus) VALUES('Cannot save changes', 'Closed','Jane Stork',  'Active')
go
-- must fail:
UPDATE Programmers SET Status = 'Quit'
/*
Server: Msg 547, Level 16, State 1, Line 1
UPDATE statement conflicted with TABLE CHECK constraint 'ActiveTickets_CK'. The conflict occurred in database 'Sandbox', table 'Tickets'.
The statement has been terminated.
*/
--Close the active ticket first
UPDATE Tickets SET Status = 'Closed'
--
-- now it succeeds
UPDATE Programmers SET Status = 'Quit'

-- Programmer Status is 'Quit' now
SELECT * FROM Tickets

What do you think?
December 18, 2006 9:44 AM
 

Adam Machanic said:

Paul,

First of all, I agree with Alex.  It seems to me that if the data rules (i.e., keys and constraints) defined in the database are allowing invalid transactions to occur, it's not the fault of the way foreign keys are implemented.  Rather, it indicates that the database designer has not done his or her job properly.  You probably wouldn't blame C++ if a developer introduced a logic bug in an application-level method; why blame the database because the designer didn't think through all of the issues properly?  The database is not psychic, even though end-users sometimes seem to think so.

Second, I suspect that you're not actually referring to the Relational Model at all.  Keep in mind that SQL DBMSs are *not* relational, and are merely a bastardization of the true Relational Model. Perhaps you should give Dataphor a try and then post back on your findings after working with something a lot closer to Codd's real intentions.  And I think I've mentioned "The Third Manifesto" to you before -- in it, Date and Darwen show that a properly implemented OO database is nothing more than a properly implemented relational database. So will the relational model fade away in the coming years? Not likely -- rather, we might actually see some mainstream products that implement it!
December 18, 2006 2:50 PM
 

Paul Nielsen said:

Alex,
So I assume that you too have needed to include the workflow state of the primary row in your FK constraint. Your solution is similar to my trigger solution, in that it checks the PK workflow state value during the secondary row insert transaction. You accomplished this by duplicating the PK status using a FK with cascade update and a check constraint - a workaround that I like. The check constraint is probably faster than a trigger. The only advantage I can see to the trigger is that it can be a bit more dynamic with data-driven complex workflows, while the check constraint must be hardcoded (unless it uses a UDF). But very nice solution Alex, thanks!
-Paul

Adam,
It's not a question of whether the designer thought of everything or implemented it well; of course a good designer will develop a solution using the tools at hand.

It's a question of the inherent capabilities of the DRI constraint vs. the need to program to model reality. In the comparison to C++, I'm saying that one of the commands is incomplete. Your argument could be used to say we don't need DRI FK constraints at all, because they could be programmed. The fact remains that the FK constraint itself doesn't handle all the types of relationships found in reality.

And yes, I do think the relational model is at risk of fading away. More and more data-centric apps are being designed by non-db pros who live in a world of OOA/D and view the database as a means to persist an object, and who see no value in PKs, FKs, or relational design. Take a hard look at many of the more popular enterprise apps; all the data design and data integrity checks are done in the data access tier, not the db. And Microsoft LINQ / ADO.NET 3.0 is a means to simply store and retrieve an object.

And true, admittedly I'm most concerned with the actual physical implementation, specifically SQL Server.  

-Paul
December 18, 2006 4:44 PM
 

Louis Davidson said:

<rant>

>>The relational model was a breakthrough in the mid 60’s, but unless the model can continue to develop, it will eventually be overrun by OOA/D. <<

I guess part of the question (and what is so amazing to me) here is the difference between the relational model and what is currently being implemented.  Relationally, we can easily define a 1-1 required relationship, but it currently cannot be enforced in the RDBMS that I/we are accustomed to (Microsoft SQL Server).  But is this the relational model's fault?  No, the problem is that the solution is beyond what current engines implement.  The solution seems easy on the surface - just delay the checking of the check constraints until the end of the transaction, right?  Of course it sounds easy (a stack of constraint checks?), but it would be really complex to implement in the engine.
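For reference, the SQL standard already defines the "delay until the end of the transaction" behavior described above, though SQL Server does not implement it. In standard SQL (implemented in products such as Oracle and PostgreSQL), a constraint can be declared deferrable - table names below are hypothetical:

```sql
-- Standard SQL, not T-SQL: the FK is checked at COMMIT rather than
-- per statement, so parent and child rows can be written in any
-- order inside the transaction.
CREATE TABLE OrderDetail (
  OrderDetailID INTEGER PRIMARY KEY,
  OrderID INTEGER NOT NULL
    REFERENCES Orders (OrderID)
    DEFERRABLE INITIALLY DEFERRED
);
```

With deferrable constraints, even circular FKs (order requires detail, detail requires order) can be satisfied within one transaction, which is one route to enforcing a contains relationship declaratively.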

What fascinates me about the whole of relational design is how much more relevant it should be becoming now as computers get more and more powerful.  We used to have to "denormalize for performance" almost by default.  Now, not so much.  And we can all agree (here at least) that a well designed, properly implemented relational database is so much easier to work with.  But the rub is in the words "designed" and "properly."

>>all the data design and data integrity checks are done in the data access tier, not the db. <<

Why is this?  I can't get a decent grasp on why this is at all.  And considering just how many implementations fail...  It just makes me mad.  I have theories, of course.  My theories always center on the fact that so many bean-counter project managers promise things before understanding the problem that the design phase of projects is being largely ignored.  So this leads to:

* cries that "the database is too slow!"  - No, your design does not match the paradigm of the software you are trying to implement with.  

* data integrity issues - in the database, implementing a constraint is in one place, and happens automatically.  In the data access layer, forget a check when you are writing a function and data goes unchecked.  

>>More and more data-centric apps are being designed by non-db pros who live in a world of OOA/D and view the database as a means to persist an object and see no value to PK, FK, or relational design.<<

Which in turn leads to horribly difficult-to-query databases with little or no integrity, because the effort to effectively protect all data in the data access layer is freaking hard to maintain.  Not to mention that people go behind the app and load data for conversions, bulk loads, etc., etc.  And when the application is built with only surrogate keys for relationships and no logically unique values, well, most anyone who reads your blog, mine, or any of the blogs of the people who have replied here can predict the outcome of that!

What bothers me the most about all of this is just how simple database design is, and how, by defining integrity in the database using constraints for 90+% of it (and a smattering of triggers), you can protect against 80% of the data integrity issues with relative ease.  The other 20% are complex, time-based rules that can change frequently and are well suited to non-database implementation.

Add in stored procedures to present the data in a friendlier shape to the user interface, and this is a win-win...

</rant>
December 18, 2006 11:06 PM
 

 

Greg Linwood said:

Paul, I think you've made two excellent points & I share your hope that the relational model be improved so that such rules can be implemented both within the model & to a standard. I doubt that all possible rule types can be implemented (at least not in the near future), but your second point shouldn't be terribly difficult to implement.

Do you think that non-relational DBMS models will live up to modern business performance expectations?
December 19, 2006 4:53 AM
 

Davide Mauri said:

Hi Paul

Though the relational model really has some limitations, those limitations are not what you're describing here. You're confusing the relational model with ITS IMPLEMENTATIONS, which are really a different thing.

Keep in mind that a database is basically a logical inference engine, and that the relational model was built on top of first-order logic (http://en.wikipedia.org/wiki/First-order_logic); I can hardly believe that it is unable to represent reality.

Anyway I agree with you on one point. The relational model (and in particular its implementations) HAS TO evolve and improve, for example to solve the problem of time-dependent primary key. Probably an "Aspect-Oriented" Relational Model would be nice.
December 20, 2006 7:19 AM
 

Scott Whigham said:

I love this post, Paul - I love the challenging of accepted ways of doing things and thinking outside the box :)

You said that you prefer stored procedures to handle your "more difficult" transactions instead of doing this in a trigger or the data access layer:

"As a workaround, developers have three options. Code the check in a trigger (extending transaction duration), code the check in a sproc (best option) or logical data access layer, copy valid PKs into a “bucket table” and point the FK to the bucket table (cheesy). "

Your thoughts that doing this code in a "sproc (best option)" are certainly different from a lot of folks. There's a whole subset of OO programmers who hate, hate, hate putting business rules inside the database. I disagree on their point but it is a debatable point, I suppose, and different opinions are great.

I don't agree with you that stored procedures are the best place to put your "business rules that can't be handled in a constraint." You say it in such an off-handed way that a newbie reading this post would possibly assume that this is how everyone thinks (i.e., a "Best Practice" or an "Industry Standard") - but it isn't. It's your *opinion* that stored procedures are the best option, and I can't say that I agree with that 100%. When you say "sprocs (best option)" it's an absolute and someone could interpret it to mean, "I should always do x,y,z in a sproc instead of the data access layer or a trigger." I don't know whether that's what you meant but it's certainly the way it comes off to me and, since it is a contentious point, it's worth mentioning.

Personally, for the example you listed about not giving a paycheck to a fired employee, a trigger fits best IMO (just talking in generalities without getting too specific about the problem). The problem with coding a business rule in a stored procedure is that, unless you go through the stored procedure to process paychecks, the business rule is not applied. Example: we receive an import from another system (like a text file with 1,000 rows) and we need to process the data and send out the paychecks. With your sproc approach, I would basically be doomed to looping through every row from the text file and, for each row, passing the row's values into the stored proc - as opposed to doing a BULK INSERT (for example) and letting the trigger do its work (of course, you have to use FIRE_TRIGGERS).
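The bulk-load scenario described above might look like this (file path and table name are hypothetical); the FIRE_TRIGGERS option makes the bulk operation run the table's insert triggers, so the same rule applies to imported rows:

```sql
-- Hypothetical sketch: bulk load paychecks and let an AFTER INSERT
-- trigger (one that rejects paychecks for terminated employees)
-- validate the whole batch.
BULK INSERT Paycheck
FROM 'C:\import\paychecks.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRE_TRIGGERS);
```

Without FIRE_TRIGGERS, BULK INSERT skips insert triggers entirely, which is precisely the hole the comment is warning about.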

Is my approach right or is your approach right? If I'm right does that mean you are wrong? No, of course not; it's just two different approaches. As long as we both have cogent reasoning behind our choices and the end result is the same then we're both right!

Anyway, I only added this so that a newbie coming into SQL Server would realize that not everyone thinks the above example is best done via stored procedures.
December 21, 2006 4:05 AM
 

Ivan Tikhonov said:

A relational database is data storage. You put bytes into it and get bytes out of it. Terms like "payroll" are not applicable at this level; they must be defined at higher levels.

Writing applications directly on a DBMS is like doing symbolic computation in assembly programming.
December 26, 2006 11:58 PM
 

Paul Nielsen said:

Hi Scott, A pleasure to see your name in this blog.

I prefer sprocs over triggers for two reasons:

1) If the goal is to reduce the aggregate database workload by shortening individual transactions, a well-written sproc can be more efficient than a trigger, because trigger code extends the duration of the transaction and therefore extends locks, causing a domino effect of locking and blocking. For example, placing data validation code in a trigger makes the transaction itself longer than placing that same validation code in a sproc prior to the insert or update transaction. But this only works if (2)...

2) I firmly believe that the key to an extensible database schema is the abstraction layer. Placing data validation and database rules (that can't be done in constraints) in sprocs makes the assumption that the data abstraction layer is strictly enforced and written using sprocs.
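The two points together can be sketched like this (procedure and table names are hypothetical): the status check runs before the transaction begins, so locks are held only for the write itself.

```sql
-- Hypothetical sketch: validate in the sproc before the transaction,
-- keeping the transaction (and its locks) as short as possible.
CREATE PROCEDURE AddPaycheck
  @EmployeeID INT,
  @Amount MONEY
AS
BEGIN
  -- Validation happens before any write locks are taken.
  IF NOT EXISTS (SELECT * FROM Employee
                 WHERE EmployeeID = @EmployeeID
                   AND Status = 'Active')
  BEGIN
    RAISERROR('Employee is not active.', 16, 1);
    RETURN;
  END;

  BEGIN TRANSACTION;
    INSERT Paycheck (EmployeeID, Amount)
    VALUES (@EmployeeID, @Amount);
  COMMIT TRANSACTION;
END;
```

Strictly speaking, the status could change between the check and the insert; Alex's constraint-based approach closes that gap, at the cost of duplicating the status column.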


-Paul
December 28, 2006 10:14 AM
 

Louis said:

Ivan, I see no basis for the thought that a relational database is just storage.  It is more a complex set of tools to store AND validate data.  If you understand it, it is a very natural way to work (using constraints and triggers for validation).

I far prefer for data access/modifications to be done in stored procedures, but this is a debatable point.  Whether or not constraints and triggers are best is not.

The main reason is trustability.  Constraints can be trusted to do their job, whereas client code can be easily bypassed.  Losing the ability to do insert onetable() select ... from theother is just the first issue.  Even if you double-check the data in the client/business layer (which I generally recommend), the fact that a column with a domain of 1-10 can always be trusted to have values between 1 and 10 makes a big difference when doing queries, *especially* for ETL.  All it takes is one outlier and boom, there goes my pager again :(

Louis

December 30, 2006 10:29 AM
 

David Portas said:

Quote: "The relational model was a breakthrough in the mid 60’s, but unless the model can continue to develop, it will eventually be overrun by OOA/D."

However, the insufficiencies you've mentioned are not limitations of the model (not even of the model "as implemented") but of some SQL DBMSs. Not even all SQL DBMSs - in fact the SQL standard supports generalized constraints, which address your point about workflow state very well. So your title is hype. I think you really mean "Microsoft should improve SQL Server". I agree! :-)
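The "generalized constraints" referred to here are the SQL standard's CREATE ASSERTION feature, which (if SQL Server implemented it) could express both of the post's rules declaratively. This is standard syntax, not T-SQL, and table names are hypothetical:

```sql
-- Standard SQL; no mainstream DBMS fully implements CREATE ASSERTION.
-- Declares, database-wide, that every order has at least one detail.
CREATE ASSERTION EveryOrderHasDetail CHECK (
  NOT EXISTS (
    SELECT * FROM Orders AS o
    WHERE NOT EXISTS (
      SELECT * FROM OrderDetail AS od
      WHERE od.OrderID = o.OrderID)));
```

An assertion is checked at the end of any statement (or, if deferred, transaction) that could violate it, which is exactly the end-of-transaction contains check the post asks for.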

January 2, 2007 2:27 PM
 

Dan said:

Is the relational model so limited? Is the object-oriented model more powerful in the core application of information storage and retrieval?

February 25, 2007 10:24 AM
 

 

Dave said:

Validity is an interesting data vector, but the next obvious evolution would be to model data in terms of temporal validity - something that, for any non-trivial business scenario (consider full supply chain fulfillment), generates reams of validation code that looks like it should be generated, not written.

For example: when I ship an order, what should I print on the label?

The Relational model is an atomic representation of data + relation to remove the need to store redundant data in a denormalised form.  Its atomic operations are ALL necessary to maintain integrity and it is complete in that it is possible to perform any higher level operation on any related data by combining its atomic operations inside transactions.

A temporal model of data is more difficult to consider in relational terms.  Technically this model is currently implemented in code, in the application or database layer (although it may not be modelled at all).

I have doubts as to whether temporally consistent data could be 'enforced' in a database engine without massively increasing locking scenarios. Some solutions involve limiting the change of temporal validity of referenced data to being done outside business hours to avoid locking contentions.

http://en.wikipedia.org/wiki/Temporal_database

August 13, 2007 11:06 PM
 

dave.dolan said:

I would agree that there are many limitations with the relational model and that it doesn't represent reality very well; however, SQL is not the relational model, and the relational model is not SQL.  Talking about what should and should not be implemented in a language that handles the relational model (i.e. T-SQL or PL/SQL or SuperPlusPlusYear10000 SQL) is still not talking about the relational model, but about the facility for employing Boyce and Codd's ideas - a query language.  The real limitations are conceptual, not implementational, per se.  Any degree of normalization, for example, or the requirement that someone specify all data in columns, and that there be some form of representing NULLs, all make the relational model diverge from reality.  A person can know many things, but they do not know them by way of which foreign key to follow.  Certainly referential integrity only helps to mimic real-world scenarios most of the time, and there is no real-world scenario in which one can call reality out and throw a syntax error when some expected bit of data is missing.  We also don't expect, in real life, that all data will conform to our schema.  We can't, for example, go around pointing out "dirty data" in real experiences just because it has something quirky that doesn't mesh with our picture of reality and validate it against our schema.  It may seem trite of me to bring all of this up, but this impedance mismatch and rigid definition of structure creates huge stumbling blocks in the design-and-implement cycle.  We dismiss it all now because we're used to dealing with it, but it's really a bit of a twilight zone compared to reality.

July 20, 2008 10:55 PM


About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.
