Even the simplest data integrity rules are not easy to
implement in the database if instead of individual rows we need to deal with
groups or subsets.
For example, making sure that a column value is unique in a
table is as trivial as creating a unique constraint or index. However, what are
we supposed to do if we do not want to store a group of values twice? What if
we store cooking recipes as sequences of steps (sets of values), and we do
not want to store one and the same recipe (one set of values) more than once?
In the following sections we shall solve this problem using only constraints and indexed views, just for fun, as a SQL puzzle. (Of course, we could use a trigger or some other thing with subqueries - but that would be a different puzzle).
At the time of this writing I have not yet used in
production the technique I am going to describe. I think it is quite
complex and looks tricky, and the reason is simple: in my opinion RDBMS have not
been designed to deal with this kind of problems. As such, we have to come up
with complex workarounds.
Enforcing Uniqueness of Ordered Sets, of Limited Size Only
In this solution we shall deal with sets that have at most five elements.
In some cases, the order of elements in a subset matters.
For example, the following two sequences of instructions yield different
results.
Cooking a hard-boiled egg:
- Put eggs into pan, add water
- Bring water to boiling, boil 4 minutes;
- Remove eggs and cool them down in cold water
Keeping the egg uncooked:
- Put eggs into pan, add water
- Remove eggs and cool them down in cold water
- Bring water to boiling, boil 4 minutes
These two recipes are clearly different, and they yield
different results. Of course, the second recipe makes no sense, but we want the
database to store them both.
Suppose, however, that the database should not be able to
store one and the same recipe more than once.
Setting up tables and test data
Before implementing this requirement, let us create the
tables and add test data. Both recipes discussed above involve the same three
steps:
CREATE TABLE dbo.RecipeSteps
(
RecipeStepId INT NOT NULL ,
Intsructions VARCHAR(50) NOT NULL ,
CONSTRAINT PK_RecipeSteps PRIMARY KEY ( RecipeStepId )
) ;
GO
INSERT INTO dbo.RecipeSteps
( RecipeStepId, Intsructions )
VALUES ( 1, 'Put eggs into pan, add water' ),
( 2, 'Bring water to boiling, boil 4 minutes' ),
( 3, 'Remove eggs and cool them down in cold water' ) ;
Let us upload upload the first recipe, as follows:
CREATE TABLE dbo.RecipesStepByStep
(
RecipeId INT NOT NULL ,
StepNumber SMALLINT NOT NULL ,
RecipeStepId INT NOT NULL ,
CONSTRAINT PK_RecipesStepByStep PRIMARY KEY ( RecipeId, StepNumber ) ,
CONSTRAINT FK_RecipesStepByStep_RecipeSteps FOREIGN KEY ( RecipeStepId ) REFERENCES dbo.RecipeSteps ( RecipeStepId )
) ;
GO
INSERT INTO dbo.RecipesStepByStep
( RecipeId, StepNumber, RecipeStepId )
VALUES ( 1, 1, 1 ),
( 1, 2, 2 ),
( 1, 3, 3 ) ;
SELECT StepNumber ,
Intsructions
FROM dbo.RecipeSteps AS rs
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE sbs.RecipeId = 1 ;
StepNumber Intsructions
---------- --------------------------------------------------
1 Put eggs INTO pan, ADD water
2 Bring water TO boiling, boil 4 minutes
3 Remove eggs AND cool them down IN cold water
Let us also add the second recipe:
INSERT INTO dbo.RecipesStepByStep
( RecipeId, StepNumber, RecipeStepId )
VALUES ( 2, 1, 1 ),
( 2, 2, 3 ),
( 2, 3, 2 ) ;
SELECT StepNumber ,
Intsructions
FROM dbo.RecipeSteps AS rs
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE sbs.RecipeId = 2 ;
StepNumber Intsructions
---------- --------------------------------------------------
1 Put eggs INTO pan, ADD water
2 Remove eggs AND cool them down IN cold water
3 Bring water TO boiling, boil 4 minutes
Right now nothing prevents us from adding a second copy of
the first recipe. You can run the following script and see for yourself that it
succeeds:
BEGIN TRANSACTION ;
INSERT INTO dbo.RecipesStepByStep
( RecipeId, StepNumber, RecipeStepId )
VALUES ( 3, 1, 1 ),
( 3, 2, 2 ),
( 3, 3, 3 ) ;
SELECT StepNumber ,
Intsructions
FROM dbo.RecipeSteps AS rs
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE sbs.RecipeId = 3 ;
ROLLBACK ;
To enforce uniqueness of recipes, we shall use an indexed
view and a unique index on top of it.
To keeps the scripts simple, let us assume that recipies
cannot consist of more than five steps:
ALTER TABLE dbo.RecipesStepByStep
ADD CONSTRAINT RecipesStepByStepMax5 CHECK (StepNumber BETWEEN 1 AND 5) ;
Let us also create an indexed view that stores all the steps
of a recipe in one row.
Once we have all the steps in one row, we can use a unique
index to make sure that a sequence of steps is unique. We want the contents of
the indexed view to look as follows, with the first line representing the first recipe:
RecipeId Step1 Step2 Step3 Step4 Step5
----------- ----------- ----------- ----------- ----------- -----------
1 1 2 3 0 0
2 1 3 2 0 0
Note: To get around the limitations of indexed views, we
need to make sure that all the pivoted columns are not nullable. This is why
columns Step4 and Step5 are zeroes instead of nulls – otherwise we would not be
able to create the unique index RecipesStepByStepsPivoted_UniqueRecipes in the
following script:
CREATE VIEW dbo.RecipesStepByStepsPivoted
WITH SCHEMABINDING
AS
SELECT RecipeId,
count_big(*) AS Cnt,
SUM(CASE WHEN StepNumber = 1 THEN RecipeStepId ELSE 0 END) AS Step1,
SUM(CASE WHEN StepNumber = 2 THEN RecipeStepId ELSE 0 END) AS Step2,
SUM(CASE WHEN StepNumber = 3 THEN RecipeStepId ELSE 0 END) AS Step3,
SUM(CASE WHEN StepNumber = 4 THEN RecipeStepId ELSE 0 END) AS Step4,
SUM(CASE WHEN StepNumber = 5 THEN RecipeStepId ELSE 0 END) AS Step5
FROM dbo.RecipesStepByStep
GROUP BY RecipeId ;
GO
CREATE UNIQUE CLUSTERED INDEX RecipesStepByStepsPivoted_CI
ON dbo.RecipesStepByStepsPivoted(RecipeId) ;
GO
CREATE UNIQUE INDEX RecipesStepByStepsPivoted_UniqueRecipes
ON dbo.RecipesStepByStepsPivoted(Step1, Step2, Step3, Step4, Step5) ;
GO
SELECT RecipeId ,
Step1 ,
Step2 ,
Step3 ,
Step4 ,
Step5
FROM dbo.RecipesStepByStepsPivoted ;
RecipeId Step1 Step2 Step3 Step4 Step5
----------- ----------- ----------- ----------- ----------- -----------
1 1 2 3 0 0
2 1 3 2 0 0
It is easy to verify that the unique index built on top of
the indexed view does enforce the uniqueness of recipes:
BEGIN TRANSACTION ;
INSERT INTO dbo.RecipesStepByStep
( RecipeId, StepNumber, RecipeStepId )
VALUES ( 3, 1, 1 ),
( 3, 2, 2 ),
( 3, 3, 3 ) ;
SELECT StepNumber ,
Intsructions
FROM dbo.RecipeSteps AS rs
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE sbs.RecipeId = 3 ;
ROLLBACK ;
Cannot INSERT duplicate KEY row IN object 'dbo.RecipesStepByStepsPivoted' WITH UNIQUE INDEX 'RecipesStepByStepsPivoted_UniqueRecipes'.
The statement has been terminated.
RecipeId Step1 Step2 Step3 Step4 Step5
----------- ----------- ----------- ----------- ----------- -----------
1 1 2 3 0 0
2 1 3 2 0 0
Note: because we are using an undex to enforce uniqueness,
we must set a limit on the number of elements. It does not have to be as low as
five – we set it that low just to keep the examples short.
Let us close a couple of loopholes in this limited solution
and move on to another problem.
First, we used zeroes in the indexed view to indicate that
there is no such step in the recipe. In our example both recipies had three
steps, so the indexed view had zeroes in Step4 and Step5 columns in both rows.
To distinguish between missing steps and valid step IDs, we need to make sure
that zero is not a valid ID:
ALTER TABLE dbo.RecipeSteps
ADD CONSTRAINT CHK_RecipeSteps_PositiveId CHECK ( RecipeStepId > 0 );
Also we need to make sure that step numbers in the recipe
have no gaps, otrherwise we still can store the same recipe twice. The
following script demonstrates the loophole:
BEGIN TRANSACTION ;
INSERT INTO dbo.RecipesStepByStep
( RecipeId, StepNumber, RecipeStepId )
VALUES ( 3, 1, 1 ),
( 3, 3, 2 ),
( 3, 5, 3 ) ;
SELECT StepNumber ,
Intsructions
FROM dbo.RecipeSteps AS rs
JOIN dbo.RecipesStepByStep AS sbs ON rs.RecipeStepId = sbs.RecipeStepId
WHERE sbs.RecipeId = 3 ;
ROLLBACK ;
StepNumber Intsructions
---------- --------------------------------------------------
1 Put eggs INTO pan, ADD water
3 Bring water TO boiling, boil 4 minutes
5 Remove eggs AND cool them down IN cold water
To close the loophole, we need to make sure that step
numbers have no gaps. The following script enforces the rule:
ALTER TABLE dbo.RecipesStepByStep
ADD PreviousStepNumber AS CAST(CASE WHEN StepNumber > 1 THEN StepNumber-1 END AS SMALLINT) PERSISTED ;
GO
ALTER TABLE dbo.RecipesStepByStep
ADD CONSTRAINT FK_RecipesStepByStep_PreviousStep
FOREIGN KEY(RecipeId, PreviousStepNumber)
REFERENCES dbo.RecipesStepByStep(RecipeId, StepNumber);
Conclusion
As we have seen, enforcing the uniqueness of ordered groups
of rows with constraints is doable, although somewhat complex. I am not sure if I would use this approach with a large production table. I think that RDBMS is not designed to natively solve these kinds of problems - this is why solving this apparently simple problem is so complicated.
If you are using some platform other than RDBMS to solve such problems, please comment this post and share your experiences.