THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kevin Kline

Timewarp: What Is a Relational Database?

Relational?!? Move On, Geezer!

Maybe you're thinking that relational databases management systems (RDBMSs), like Microsoft SQL Server and Oracle, are going the way of punched cards and rotary phones.  After all, there's been a lot of hype these days in the IT media about the rise of so-called NoSQL (Not Only SQL) databases.  Many new and upcoming CS and MIS graduates who like working with data might think that relational databases are, at best, soon-to-be legacy systems and, at worst, are a career dead-end.  

Wrong!!!

It's true that all the cool-cat computing services (Amazon, Facebook, Google, Pinterest, etc) are indeed making heavy use of NoSQL technology.  They're also making heavy use of traditional RDBMS'es too.  In fact, some of the world's biggest users of SQL databases are hand-in-hand the biggest users of NoSQL databases.  The reason for that is that both types of data platforms are exceeding good at specific types of data storage and data processing.  They also have their own unique weaknesses too.  Meaning, each platform has a sweet spot and a weak spot, and that none are a 100% panacea for all imaginable data processing scenarios.  Take a look at this article by my friend and former colleague, Guy Harrison - 10 Things You Should Know About NoSQL Databases, for a good discussion on the pros and cons of NoSQL in comparison to SQL data platforms.

Timewarp! Let's Take a Look Back at Why Relational Databases Were Needed.

These days, relational database management systems (RDBMSs) like Microsoft SQL Server and Oracle are the primary engines of information systems everywhere, particularly for enterprise computing systems and web applications. Though RDBMSs are now common enough to trip over, it wasn’t always that way. Not too long ago, you would probably trip over hierarchical database systems, or network database systems, or flat-file systems (heck, that still happens in many government IT shops who still use COBOL).  A quick-and-dirty definition for a relation database might be: a system whose users view data as a collection of tables related to each other through common data values.

Perhaps you are interested in more than a quick-and-dirty definition for the term relational database?  Here goes.  The whole basis for the relational model follows this train of thought: data is stored in tables, which are composed of rows and columns.  Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys.  This concept is so common as to seem trivial; however, it was not so long ago that achieving and programming a system capable of sustaining the relational model was considered a longshot with limited usefulness.  

Relational data theory was first proposed by E.F. Codd in his 1970 paper to the ACM entitled “A Relational Model of Data for Large Shared Data Banks”.  Soon after, Codd clarified his position in the 1974 paper to the Texas Conference on Computing Systems entitled “The Relational Approach to Data Base Management: An Overview”.  It was in this paper that Codd proposed the now legendary 12 Principles of Relational Databases. 

If a vendor’s database product didn’t meet Codd’s 12 item litmus tests, then it was not a member of the club.  Note that the rules do not apply to applications development.  Instead, these rules determine whether the database engine itself can be considered truly “relational”.  These rules were constructed to support a data model that would ensure the ACID properties of transactions and also eliminate a variety of data manipulation anomalies that frequently occurred on non-relation database platforms (and still do occur on non-relational database platforms). (As an aside, the transactional paradigm was conceived by my hero, Gray, Jim in 1981 while at Tandem Computer and presented in the paper "The Transaction Concept: Virtues and Limitations").

Codd’s 12 Rules for a Truly Relational Database System

Are you curious about Codd’s 12 Principles of Relational Databases? Don’t be ashamed that you don’t know them by heart; few technology professionals do, and no one on the marketing staff of technology companies do.  However, the few folks who do know these principles by heart treat them like religious doctrine, and would likely be mortified by their “lightweight” treatment here.  Nevertheless, I'll give them to you in my own paraphrasing:

  1. Information is represented logically in tables.
  2. Data must be logically accessible by table, primary key, and column.
  3. Null values must be uniformly treated as “missing information” not as empty strings, blanks, or zeros.
  4. Metadata (data about the database) must be stored in the database just as regular data is.
  5. A single language must be able to define data, views, integrity constraints, authorization, transactions, and data manipulation.
  6. Views must show the updates of their base tables and vice versa.
  7. A single operation must be able to retrieve, insert, update, or delete data.
  8. Batch and end-user operations are logically separate from physical storage and access methods.
  9. Batch and end-user operations can change the database schema without having to recreate it or applications built upon it.
  10. Integrity constraints must be available and stored in the metadata, not in an application program.
  11. The data manipulation language of the relational system should not care where or how the physical data is distributed and should not require alteration if the physical data is centralized or distributed.
  12. Any row-processing done in the system must obey the same integrity rules and constraints that set-processing operations do.

If you know much about SQL, then you probably recognize immediately that SQL ended up fulfilling rules #5, #7, #11 and possibly more.  Others of the rule are manifest in the system tables of a relational database, such as DMVs in Microsoft SQL Server and V$ and X$ views in Oracle.

Relational Rises

There is some debate about why relational database systems won out over hierarchical and network database systems back in the late 1980's and early 1990's, but a couple of reasons seem self-evident.  First, the high-level language interface (SQL)  is much simpler to learn and more intuitive than that mishmash of languages supporting non-relational databases.  (In fact, the lack of something like SQL is a hindrance to adoption of many NoSQL database platforms).  Second, relational databases provide efficient and intuitive data structures that easily accommodate ad-hoc queries and reporting.  People just intuitively understand the value of storing data in tables.  From phone books to hotel registries, relational databases (of the paper sort) are second nature to most people. Third, relational databases provide powerful integrity controls such as check constraints and referential integrity - thus providing higher quality data.  And high quality data is near and dear to the heart of CFOs around the world.  

In fact, the strength that relational databases demonstrate with data quality, consistency, and durability are the same reasons that they'll be with us - quite possibly - forever.  So were NoSQL databases excel at storing data that is moderately important and requires eventual consistency, SQL database excel at storing data that is of paramount importance and requires immediate consistency.  As long as we're exchanging money, there's a need for relational database technology and ACID transactions.

And, just my opinion here, but database administration is currently, and will continue for decades to be, an excellent career choice.  Why?  First, although databases are widespread, good databases are not.  So there's always need for those who can tune, troubleshoot, and optimize what is currently in the marketplace.  Second, just because database are widespread doesn't mean that they're everywhere they need to be.  Some estimates gauge that only half of the enterprises that need SQL databases actually use SQL databases.  Imagine if only half of the citizenry wore shoes, and of the half that wore shoes, only half of them wore both shoes and consistently tied them.  It'd be a good time to be a maker of loafers!  Well, that's where we're at today with relational databases.

So what do you think?  Am I off the mark on the longevity of relational database?  Do you think the sun has set on them?  Will they be smashed, degraded, and humiliated by NoSQL database platforms?  Or will they stand shoulder-to-shoulder with a variety of data platforms in the years to come?

 

 

Published Wednesday, September 05, 2012 3:23 PM by KKline

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Marc Shapiro said:

When you write "Tables of independent data can be linked, or related, to one another if they each have columns of data that represent the same data value, called keys." you make it sound like the word "relation" in "Relational Databases" has to do with foreign keys and joins between multiple tables.  But it doesn't.  It has to do with the relationship between the columns of a single table.  Each table represents a relation.  Many useful relations are actually functions (in the mathematical sense of both), which are represented by table with a primary key.  The domain (input) of the function is the primary key columns; the range (output) is the other columns.  Joins are like composition of functions (or of relations).

September 6, 2012 7:43 AM
 

KKline said:

Fantastic point to make, Marc. I may have over emphasized the aspect of relationships between tables and under emphasized the aspect of the relationship between columns within a table.

The columns within the table must indeed be directly related to the primary key.  As the old saying goes "Every column in the table must depend upon the key, the whole key, and nothing but the key. So help me Codd".  ;^)

-Kev

September 6, 2012 8:36 AM
 

Adam Machanic said:

So here's the $10,000,000 question: Is SQL Server a "relational" database product?

September 6, 2012 10:07 AM
 

mjswart said:

It looks like one, it smells like one. I think for any reason that matters, yes.

September 6, 2012 10:24 AM
 

Adam Machanic said:

But does it TASTE like a relational DBMS??

I am not so sure it looks or smells like one. And maybe that doesn't matter when it comes down to it; not like I don't live and breathe in the product every day, and usually get lots of work done with it. But it is interesting to think about at any rate.

September 6, 2012 1:40 PM
 

KKline said:

Look, smell, and taste?!? Ewww - too many senses involved in what should be a non-sensual experience!

But it's a great question.  Certainly, all of the major RDBMSes support the 12 Principles by degrees.  Access, for example, probably does an even poorer job than SQL Server does.  Otoh, I believe that SQL Server passes muster reasonably well, say an A-, if not a perfect score of 100%.

And, ironically, most all of the core database engine enhancements that have come out in the past few releases are decidedly NOT relational.  I'd point to SSAS, SSIS, ColumnStore, etc as features that go way outside of the domain of relational databases.  

That's another reason why it's much harder to know everything about SQL Server than in the good ol' days.  Many parts of the platform are governed by entirely different principles than those of the relatoinal approach.  Star or snowflake schema?  Sorry - not something in my experience.  Know what I mean?

September 6, 2012 2:39 PM
 

Adam Machanic said:

Hi Kevin,

Which "major" RDBMSs are you referring to? I can't think of any.

As far as I'm concerned SQL Server -- and every other SQL DBMS -- gets an F when it comes to being relational. As alluded to by Marc, the Relational Model is so named because it deals with relations. A relation is a set, and SQL is not set-based; it's bag-based. So simply put no SQL DBMS is, or can be, a Relational DBMS.

In a truly relational DBMS the following table could not be created:

CREATE TABLE xyz (a INT NOT NULL, b INT NOT NULL);

Why? Because of the Guaranteed Access Rule (rule #2): All data must be accessible via a primary key. But in this case we have no primary key, so that rule is 100% violated.

Another important aspect of the Relational Model is the idea of closure. Basically, any operation on a relation must produces another relation. But in SQL we can easily violate that. Consider the following table, which has a primary key (so it can at least be called a relation):

CREATE TABLE xyz (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY (a,b));

Now let's insert some rows...

INSERT xyz

SELECT 1, 2

UNION ALL

SELECT 1, 3;

... and now we run the following query:

SELECT

 a

FROM xyz;

... And we've just violated relational closure, because the result of our operation is a bag and is therefore not a relation. Oops.

Back to Codd's rules, SQL DBMSs do a decent job with most of the rest of the rules, although it can be argued that they don't tend to do very well when it comes to logical or physical data independence. But a full discussion on that topic is a bit beyond what I feel like typing into this comment box.

Finally back to Michael's point, yet again: Does any of this matter? Perhaps not. We all make our living working with SQL DBMSs. Whether or not these products are actually relational databases -- or something else entirely -- is essentially an academic question and won't change any of our lives. But it does make for an interesting conversation.

--Adam

September 6, 2012 7:18 PM
 

Robert Young said:

For reasons known (ultimately) to the Scions of Armonk, Codd was removed from the development of the Data Access Language (first called SEQUEL, for Structured English QUEry Language, thence SQL for murky reasons).  Chamberlin, an IMS cowboy and no language or math theorist, got control.  That's why SQL is the bastard language that it is.

Codd rarely, if ever, used the column/table metaphor.  It's about relations.  Read Date's Eighth Edition for the closest thing to pure Codd.  Date disagrees with Codd on some points.

As a smart person once said (and I've long since lost the cite):  "there are more hierarchies in code than in the real world".  The xml zealots managed to convince the feeble minded that hierarchy is somehow the "natural" data structure of the world.  Not even org charts are hierarchies.  Yes, they are simple to create, but a devil to modify.  Once you make it, the "relations" are hard coded.  Talk to an IMS DBA about that.  A RM structure is more work to specify, but if primary keys are true, then adding non-key data to a table or a new related table is trivial.

As to PK enforcement:  a legacy of COBOL/IMS at the time.  It is not a big deal to write an engine which requires PKs on all tables.  Coders didn't, and don't, like the restriction.  So RDBMSs don't implement it.

With the rise of multi-processor/core/thread/SSD machines that cost about as much as a Starbucks Latte, [de|un]normalized schemas will soon disappear, at least for new development.  The smaller footprint of a 5NF database is so much smaller and robust, that even the COBava coders will be brought to heal.  Coupled with The Cloud meme (just a New Age version of The Glass House mainframe central computer), where client side transaction control makes even less sense, we'll see the return to the client being just a 3270/VT-220.  A dumb terminal.  Ah the irony of it all.

September 8, 2012 9:13 AM
 

Davos said:

Pure Codd, sounds tasty.

Adam, I agree with the first violation about a table created with a Primary Key, but not the second example.

"And we've just violated relational closure, because the result of our operation is a bag and is therefore not a relation."

The returned results are a bag but the stored data structure is not and a select is not an 'operation' that changes that. I do get the point though, the rules have certainly been relaxed in practice, but that's just about being pragmatic as opposed to dogmatic.

September 9, 2012 8:08 PM
 

Adam Machanic said:

Davos:

You seem to be missing the point of closure: all operations take as their input a relation, and in turn output a relation, thereby allowing operations to be stacked or composed in any possible form.

SELECT *

FROM Table

-->

SELECT *

FROM

(

 SELECT *

 FROM SomeOtherTable

) AS x

Assuming that SQL were in fact relational, if the inner table expression in this example failed to return a valid relation the entire system would break down. (Guaranteed Access applies and means that any tuple in [x] should be able to be referenced via a primary key.) It doesn't matter whether or not the results are "stored."

Relaxing these restrictions for SQL was not in the least bit pragmatic. I can't count the number of occasions over the course of my career on which I've found serious data integrity problems or code bugs due to duplicate key scenarios. I shudder to think of the amount of money and time that has been wasted due to this decision.

--Adam

September 10, 2012 12:18 AM
 

Luke Jian said:

I agree with Adam here.

SQL Server does not comply with all 13 rules stated by by Codd to check if an DBMS is truly relational. But then again none of the major DBMSs are  compliant either because the rules are too strict now and id does not give the flexibility that the market needs. I would grade SQL server a C- on the Codd Rule.

AFAIK the only true RDBMS that would pass the rules with A would be Dataphor according to Wikipedia: http://en.wikipedia.org/wiki/Codd%27s_12_rules

September 10, 2012 1:05 PM
 

Adam Machanic said:

Luke,

Why would complying with the rules make the systems any less flexible?

Put another way: What benefit do you get from duplicate rows that you couldn't get from, e.g., storing a single copy of the row and an integer ("count" or similar)?

Consider: If you go to the grocery store and buy 15 apples do you want 15 lines on your receipt, or a single line that indicates that you purchased 15 apples?

A system built around constraints that force people to make better design decisions sounds restrictive in theory but I can think of no reasonable design that it would hinder. (Unreasonable designs, like "we need to store all 15 apples separately for audit purposes that I think we might have, maybe, sometime in the future" do not count.)

--Adam

September 10, 2012 1:57 PM
 

Bret Lowery said:

I work with SQL Server... at Yahoo! I also work with columnar stores like ParAccel, appliances like Netezza, and NoSQL environs like Aerospike (formerly Citrusleaf). The right tool for the right job.

September 10, 2012 3:57 PM
 

Andy Irving said:

Any RDBMS that supports SQL isn't relational - no NULLs in the the relational model! And as Adam points out, you can create a table with no PK.

What's interesting though is that there's no readon why you couldn't stick a proper relation interface (something like Date & Darwin's tutorial D, or even something like LINQ) on top of the storage engine and query optimiser.

September 10, 2012 4:02 PM
 

Luke Jian said:

I agree with your points.

Actually the storage engine can do what you are saying (uniquely identify a row internally), its just not exposed to the users (like ROWID in Oracle).

When I was talking about flexibility I didn't have this rule in mind (e.g. system must use its relational facilities (exclusively) to manage the database).

There are a few others that would not make sense today,but overall I understand and share your pains!

September 10, 2012 5:12 PM
 

Adam Machanic said:

Andy: SQL's implementation of NULL has been the subject of a lot of criticism due to 3VL, but the idea of a missing-value token is definitely required in order to support semijoins, even if the schema doesn't use them anywhere. I don't think you can divorce it from the relational model unless you force everything to be normalized all the way to DKNF (at which point you could create domain-specific missing-value tokens). And then, again, you'd have the issue of closure and you'd have to force any intermediate or output relations to be in DKNF. Probably not especially feasible.

Luke: The storage engine can do that, but exposing it would violate the rule about physical independence.

September 10, 2012 10:17 PM
 

dvraggs said:

Should the tool be the whole system? The tool used to be a means to create a RDBMS. Using these tools and following the rules you create a RDBMS, not incompetent people being protected by a tool. You can create a RDBMS (key word System) using SQL Server, but it will not hold your hand. People + tool(s) create a RDBMS.

September 11, 2012 8:06 AM
 

Adam Machanic said:

dvraggs:

A DBMS is a tool used to create a database. A SQL DBMS does not create an RDBMS, no matter which people are involved. At the end of the day you have a database on a nonrelational platform and there's no way that's going to become relational. The platform itself simply does not support the required base set of rules.

Anyway, arguing that people should write everything themselves is a very slippery slope. Taking your logic one step further we should eliminate all constraints and DRI. People can hand-code that stuff. Oh, and we should eliminate tables too. People can write data structures in a 4GL pretty easily can't they? Don't need a query optimizer, because it's all just algorithms and people can learn those. And I guess I don't need a storage engine, because I am pretty good with ISAM. And I'll stop working in C#, because I can write my own garbage collection logic and libraries in C++...

You get the point, I hope. We create high level abstractions and rules to protect ourselves from reinventing problems and so that we can focus on more interesting tasks.

--Adam

September 11, 2012 9:48 AM
 

KKline said:

Adam, When you cite that no SQL dbms supports rule #2, "Data must be logically accessible by table, primary key, and column." you provide an  example in which you can break rule #2.

But are these rules, like the famous line from Pirates of the Caribbean, rigid rules or general guidelines?  Imo, when a SQL database is able to enforce #2, even though it doesn't specifically require #2 in any and every situation, it passes the test.  

It's a semantic nuance, certainly.  And I should go back to the original language to see if it's more clear in dispelling any leeway.  But it's enough for me, when a dbms CAN support the rules (even if it allows those self-same rules to be broken under specific conditions), to say that it is indeed a relational database.

But I'm also a pragmatist and don't really care that much about the purity of the observation of the rules.  I'm more interesting in learning about the duplicate key scenarios you mentioned and how to thwart them.

Luke, great tip on Dataphor! I'm going to check it out just out of curiousity.  =^)  

Very interesting comments, everyone!

September 11, 2012 2:55 PM
 

dbdebunker said:

Kevin,

Your heart is in the right place, but I don't think you identified properly the CORE issues and practical implications of the RM.

The comments do provide some good clarifications, but still miss some important points.

I guess this post and its comments is a good target for debunking, for educational purposes.

FP

www.dbdebunk.blogspot.com

September 13, 2012 3:44 PM
 

Andy Irving said:

Adam: Yeah. Date & Darwin go further though, with 6NF. some ideas in this paper: http://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf

personally i'd love a fully relational future, but then i also love functional programming :)

September 18, 2012 11:08 AM
 

Erwin Smout said:

Well, I sincerely believe that Oracle and SQL Server and DB2 and Sybase and ... should indeed go the way of punched cards.  The sooner the better.  Just so long as they are replaced by relational systems.

And it might have been better if those twelve rules were quoted, instead of paraphrased as heavily as they were.

October 17, 2012 11:39 AM
 

Erwin Smout said:

@KKline : I think the only reasonable way to interpret rule 2 is that the DBMS should be such that **IN ALL POSSIBLE CIRCUMSTANCES**, the access is guaranteed.  If there exists any circumstance in which this is not the case, then the engine by itself doesn't satisfy the rule.

Your interpretation of "satisfies the rule if it is possible for the user to set things up such that the rule happens to be satisfied", makes even for a combination of COBOL+ISAM to satisfy the rule.

October 17, 2012 5:26 PM

Leave a Comment

(required) 
(required) 
Submit

About KKline

Kevin Kline is a well-known database industry expert, author, and speaker. Kevin is a long-time Microsoft MVP and was one of the founders of PASS, www.sqlpass.org.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement