Currently at work I’m on a data warehousing project and one of the things we typically need to do during development is clear down our data warehouse in preparation for simulating a first-time load. I needed to write a script that would loop over all the tables and empty them one-by-one (that bit is not particularly hard) but which would also respect referential integrity constraints (i.e. foreign keys) – or what I would call the dependency hierarchy. This blog post documents what I came up with.
N.B. No doubt many reading this will already have their own method of doing this which may also be very different (and easier) from mine but nonetheless I thought I’d throw this out there anyway because (a) it demonstrates some nice features of T-SQL such as common table expressions, (b) it was a fun little challenge to try and solve, (c) I’ll probably need this again some day and (d) why the hell not!
I’ll demonstrate using the [AdventureWorks] sample database. First thing I needed to do was to get a list of all the tables in all schemas that had at least one foreign key attached to it; for this I need to use the sys.foreign_keys view:
select s1.name as from_schema
, o1.Name as from_table
, s2.name as to_schema
, o2.Name as to_table
from sys.foreign_keys fk
inner join sys.objects o1
on fk.parent_object_id = o1.object_id
inner join sys.schemas s1
on o1.schema_id = s1.schema_id
inner join sys.objects o2
on fk.referenced_object_id = o2.object_id
inner join sys.schemas s2
on o2.schema_id = s2.schema_id
/*For the purposes of finding dependency hierarchy
we're not worried about self-referencing tables*/
where not ( s1.name = s2.name
and o1.name = o2.name)
Notice that I am disregarding self-referencing tables which are not significant for determining the dependency hierarchy.
That SQL provides the following result:
I thought about using the OBJECT_NAME() and SCHEMA_NAME() functions but decided against it because they won’t work across databases which I eventually intend this to do. Update: Thanks to Ben in the comments who pointed out that OBJECT_NAME() and SCHEMA_NAME() can take a database_id argument so I could have used them after all!
All good so far. Now we need to work out which of those tables do not have any dependencies (i.e. do not have any foreign keys pointing to them). We do a LEFT OUTER JOIN between the list of all tables and our previous list of tables with foreign keys which enables to discover which are not in that previous list:
with fk_tables as (
select s1.name as from_schema
, o1.Name as from_table
, s2.name as to_schema
, o2.Name as to_table
from sys.foreign_keys fk
inner join sys.objects o1
on fk.parent_object_id = o1.object_id
inner join sys.schemas s1
on o1.schema_id = s1.schema_id
inner join sys.objects o2
on fk.referenced_object_id = o2.object_id
inner join sys.schemas s2
on o2.schema_id = s2.schema_id
/*For the purposes of finding dependency hierarchy
we're not worried about self-referencing tables*/
where not ( s1.name = s2.name
and o1.name = o2.name)
)
SELECT s.name as schemaName
, t.name as tableName
, 0 AS Level
FROM ( select *
from sys.tables
where name <> 'sysdiagrams') t
INNER JOIN sys.schemas s
on t.schema_id = s.schema_id
LEFT OUTER JOIN fk_tables fk
ON s.name = fk.from_schema
AND t.name = fk.from_table
WHERE fk.from_schema IS NULL;
In AdventureWorks that gives us 25 such tables out of 71 in total. Next is to find a list of tables that have a dependency on one of those 25 tables and then in turn which tables have a dependency on the tables in that second list and so on. Hence its a recursive operation and hence we need a recursive common table expression (CTE). We use our previous result set as the anchor and then it join back to our original list of tables that contain foreign keys. That gives us many tables appearing twice and hence we need to join the eventual result set back onto itself to discover how many max levels “deep” a table lies in the dependency hierarchy:
with fk_tables as (
select s1.name as from_schema
, o1.Name as from_table
, s2.name as to_schema
, o2.Name as to_table
from sys.foreign_keys fk
inner join sys.objects o1
on fk.parent_object_id = o1.object_id
inner join sys.schemas s1
on o1.schema_id = s1.schema_id
inner join sys.objects o2
on fk.referenced_object_id = o2.object_id
inner join sys.schemas s2
on o2.schema_id = s2.schema_id
/*For the purposes of finding dependency hierarchy
we're not worried about self-referencing tables*/
where not ( s1.name = s2.name
and o1.name = o2.name)
)
,ordered_tables AS
( SELECT s.name as schemaName
, t.name as tableName
, 0 AS Level
FROM ( select *
from sys.tables
where name <> 'sysdiagrams') t
INNER JOIN sys.schemas s
on t.schema_id = s.schema_id
LEFT OUTER JOIN fk_tables fk
ON s.name = fk.from_schema
AND t.name = fk.from_table
WHERE fk.from_schema IS NULL
UNION ALL
SELECT fk.from_schema
, fk.from_table
, ot.Level + 1
FROM fk_tables fk
INNER JOIN ordered_tables ot
ON fk.to_schema = ot.schemaName
AND fk.to_table = ot.tableName
)select distinct ot.schemaName
, ot.tableName
, ot.Level
from ordered_tables ot
inner join (
select schemaName
, tableName
, MAX(Level) maxLevel
from ordered_tables
group by schemaName,tableName
) mx
on ot.schemaName = mx.schemaName
and ot.tableName = mx.tableName
and mx.maxLevel = ot.Level;
That gives us the full list of tables along with their level in the dependency hierarchy:
Essentially then the hard work is done and we simply need to plug that result set into a cursor (in descending order of [Level]), emptying each one as we go. I won’t include that code here but you can download the full script from my Skydrive:
That script has been modified slightly in order to take a database name as a parameter and hence must be run in using SQLCMD (or use SQLCMD mode in SSMS).
Comments are welcome!
@Jamiet
UPDATE: Based on the code herein I have built sp_CascadingDataViewer. From that blog post:
sp_CascadingDataViewer has one basic premise: it tells you how many rows in your database are dependant on a row (or set of rows) in a table. Or to put it another way, how many rows would you have to delete from your database in order to be able to delete a given row (or set of rows). All you need to do is specify.
- A database
- A schema in that database
- A table in that schema
- A set of columns in that table
- A set of values, one for each specified column respectively
and sp_CascadingDataViewer will tell you all the rows dependant on the specified rows and optionally show you all of those dependant rows as well.
P.S. Code formatting was done using Paste As Visual Studio Code plugin for Windows Live Writer