THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Dejan Sarka

SQL Set Operators. Set? Really?

UNION, INTERSECT and EXCEPT operators are commonly called Set Operators. For example, in Books Online you can find a topic “Set Operators”, where  these three operators are explained. They should represent set operations UNION, INTERSECT and MINUS (synonym for EXCEPT DISTINCT). Also Wikipedia has a topic called “SET OPERATIONS (SQL)”, where these three operators are introduced. And these operators are commonly represented by Venn diagrams. Logically, Venn diagrams are also called Set diagrams. Here are the three operators presented with Venn diagrams:

image

However, is the name “Set Operators” really correct? The first question I asked myself was very simple: why would we have 10 and more relational operator and three set operators in the relational algebra? Well, makes no sense. The relational algebra comprise relational operators only, of course.

So what exactly is a relation? A relation is a special kind of set, set of entities that are related, i.e that are of the same kind. How do we know that the two entities are of the same kind and can thus be grouped in a single entity set, i.e. in a relation? Of course, two entities are of the same kind if they have the same attributes. Therefore, every relation is a set; however, not every set is a relation.

Set operators work on sets and produce a set. Relational operators work on relations and produce a relation. SQL operators UNION, INTERSECT and EXCEPT produce relations, i.e. special kind of sets. Set operator UNION can combine a set of differential equations and a set of hammers into a single set. Relational operator UNION can’t combine a relation of differential equations and a relation of hammers into a single relation, because elements of these two relations have nothing in common. And don’t think that if you take only keys of both relation, and both have a single-column integer key, that a UNION of this would be a relation. First of all, you can do such an union because we don’t use strong types in a relational database (each key should be of its own type – in this case, you should have a “hammer” and a “differential equation” key types, which would disallow such operations). In addition, in the case I mentioned, you would get two a relations that has with a single attribute, the key only, which would probably be meaningless from the business perspective. A relation without a meaningful attribute is not really an entity set, as defined by Peter Chen. An entity is something we can identify and is of interest. If we don’t have any real attribute, then this “thing” (whatever) is definitely not an entity, because without attributes it can’t be of any interest.

To summarize: SQL UNION, INTERSECT and EXCEPT are simply relational operators. Talking about them as of set operators is at least imprecise. However, representing them with Venn diagrams is not just imprecise, it is wrong. Here is a better presentation of there three relational operators.

image

Published Friday, January 10, 2014 3:17 PM by Dejan Sarka
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Marc Shapiro said:

A few nits:

First, the defining characteristic of a relation is that the values in certain columns are related to the values in other columns.  In a relation with attributes not in the key (i.e., a function), those attributes are a function of the key columns.   You write "entities are related, i.e. that are of the same kind", which is implying that the relation is between rows.  But each row belongs to the relation because its columns are related in that particular way.

Second, your final diagram for Union (and similar for Intersect and Minus) has grey columns on the sides which don't match between the left and right tables (or attributes not matching between the left and right relations if you prefer).  I believe the distinguishing characteristic of the set operations is precisely that the columns in both the left and right inputs must be the same, and that the result has the same columns.  Diagrammatically, your rectangles should be aligned.  This, of course, makes them look like a squared off version of the Venn diagrams.

Third, a table where all of the columns are in a unique key (a relation where all attributes are in the key) is, mathematically, a relation which is not a function.  These are very interesting and important in general.  A special case of these has only one attribute (or column); and these are just as interesting even if you dismiss them.  For example, all finite sets of integers can be represented as such relations: the set of positive even numbers less than 100, the set of positive odd numbers less than 100, the set of primes less than 100, the set of all positive integers less than 100, and so on.  There are plenty of interesting operations that one can do with such things.  And they are definitely relations.

January 10, 2014 9:08 AM
 

Dejan Sarka said:

Marc,

Thank you very much for your valuable comments! For the first and the third one, I see I was sloppy, and I understand completely that I should not be sloppy in a post where I complain about something being imprecise. I corrected the post in such a way that the corrections are clear.

About your second comment. Yes, the columns in both the left and right inputs must be the same, as you say. But this is exactly my point as well. This is not a condition for set operators. Therefore, I think it is better to call the SQL operators "relational operators" thank "set operators with limitations". And from Venn (or Set) diagrams you don't see these limitations. With the diagrams I draw, I think it is more clearly shown that this operators work only on attributes that are shared between both input relations.

Thank you again for your comment!

January 11, 2014 3:32 AM

Leave a Comment

(required) 
(required) 
Submit

About Dejan Sarka

Dejan Sarka, MCT and SQL Server MVP, is an independent consultant, trainer, and developer focusing on database & business intelligence applications. His specialties are advanced topics like data modeling, data mining, and data quality. On these toughest topics, he works and researches together with SolidQ and The Data Quality Institute. He is the founder of the Slovenian SQL Server and .NET Users Group. Dejan Sarka is the main author or coauthor of eleven books about databases and SQL Server, with more to come. Dejan Sarka also developed and is developing many courses and seminars for SolidQ, Microsoft and Pluralsight. He is a regular speaker at many conferences worldwide for more than 15 years, including conferences like Microsoft TechEd, PASS Summit and others.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement