If you have two columns in a table that "remember" their old values in two different historical tables, as it is the case in AdventureWorks with EmployeePayHistory and EmployeeDepartmentHistory, you may need (or just wonder how) to merge these two tables into only one historical table that will keep track of both columns.
This has been my "problem of the day" and, after some thoughts, I came up with a pattern of solution that I'd like to share with you both to see if you have a better solution and to have the code at hand when I'll need later. :)
Before using the real table, I made up a simple test case with a Product table with only two fields a couple of variation tables (VariationA and VariationB):
--
-- Sample table Products, only two columns A and B. The current value of the row is (D, 4)
--
CREATE TABLE Products (
A CHAR (1),
B INT
)
INSERT INTO products (A, B) VALUES ('D', 4)
--
-- Variations for column A
--
-- ------------------
-- Year Old Value
-- ------------------
-- 2001 A
-- 2002 B
-- 2003 C
-- 2005 D
--
CREATE TABLE VariationA (
Position INT,
OldValue VARCHAR (5),
ChangeDate DATETIME )
INSERT VariationA (Position, OldValue, ChangeDate) VALUES (1, 'A', '01/01/2001')
INSERT VariationA (Position, OldValue, ChangeDate) VALUES (2, 'B', '01/01/2002')
INSERT VariationA (Position, OldValue, ChangeDate) VALUES (3, 'C', '01/01/2003')
INSERT VariationA (Position, OldValue, ChangeDate) VALUES (4, 'D', '01/01/2005')
--
-- Variations for column B
--
-- ------------------
-- Year Old Value
-- ------------------
-- 2000 1
-- 2003 2
-- 2004 3
--
CREATE TABLE VariationB (
Position INT,
OldValue INT,
ChangeDate DATETIME )
INSERT VariationB (Position, OldValue, ChangeDate) VALUES (1, 1, '01/01/2000')
INSERT VariationB (Position, OldValue, ChangeDate) VALUES (2, 2, '01/01/2003')
INSERT VariationB (Position, OldValue, ChangeDate) VALUES (3, 3, '01/01/2004')
The query that mixes all these variations into a single table is pretty easy but it took me some time to discover. My solution is to detect what was the value of "B" when a variation in "A" happened and vice versa. As the values stored are "old" values, we know that the value of A at a certain date is the value stored in the first variation for A AFTER that date. The only special case is the current time: if no variation record is found then we know that the value to use is the current value of the record in the product table. The same is obviously true for B.
I ended up with this pattern query:
--
-- AllEvents contains all the events from both VariationA and VariationB,
-- and will detect, for each variation that happened, what was
-- the value of the other column at that time, building in this way the
-- merged variation list.
-- The final SELECT will return the sorted and DISTINCTed result.
-- Then final COALESCE is needed because if a value is NULL it
-- means that it should contain the "current" value of the column
--
WITH AllEvents AS (
SELECT
OldValueOfA = Events.OldValueOfA, -- Old value of A
OldValueOfB = Events.OldValueOfB, -- Old value of B
ChangeDate = Events.ChangeDate -- Date of change
FROM
(SELECT
OldValueOfA = OldValue,
OldValueOfB = (SELECT TOP 1 OldValue
FROM VariationB V
WHERE V.ChangeDate >= VariationA.ChangeDate
ORDER BY ChangeDate),
ChangeDate
FROM VariationA
UNION ALL
SELECT
OldValueOfA = (SELECT TOP 1 OldValue
FROM VariationA V
WHERE V.ChangeDate >= VariationB.ChangeDate
ORDER BY ChangeDate),
OldValueOfB = OldValue,
ChangeDate
FROM VariationB
) Events
)
SELECT DISTINCT
ChangeDate = ChangeDate,
OldValueOfA = COALESCE (OldValueOfA, (SELECT A FROM Products)),
OldValueOfB = COALESCE (OldValueOfB, (SELECT B FROM Products))
FROM
AllEvents
ORDER By ChangeDate
Clearly, i could have written some VB code in SSIS to solve the same problem. Nevertheless, having a pattern query at hand is useful because the pattern can be easily adapted to any real world situation with only SQL Management Studio at hand. The following code is the implementation of the pattern for AdventureWorks to merge Department and Payment history for Employees:
WITH AllEvents AS (
SELECT
EmployeeID = Events.EmployeeID,
OldValueOfRate = Events.OldValueOfRate,
OldValueOfDep = Events.OldValueOfDep,
ChangeDate = Events.ChangeDate
FROM
(SELECT
EmployeeID = EmployeeID,
OldValueOfRate = Rate,
OldValueOfDep = (SELECT TOP 1 DepartmentID
FROM HumanResources.EmployeeDepartmentHistory V
WHERE V.StartDate >= PayHistory.RateChangeDate
AND V.EmployeeID = PayHistory.EmployeeID
ORDER BY V.StartDate),
ChangeDate = RateChangeDate
FROM HumanResources.EmployeePayHistory PayHistory
UNION ALL
SELECT
EmployeeID = EmployeeID,
OldValueOfRate = (SELECT TOP 1 Rate
FROM HumanResources.EmployeePayHistory V
WHERE V.RateChangeDate >= DepHistory.StartDate
AND V.EmployeeID = DepHistory.EmployeeID
ORDER BY V.RateChangeDate),
OldValueOfDep = DepHistory.DepartmentID,
ChangeDate = StartDate
FROM HumanResources.EmployeeDepartmentHistory DepHistory
) Events
)
SELECT DISTINCT
EmployeeID = EmployeeID,
ChangeDate = ChangeDate,
OldValueOfDep = COALESCE (
OldValueOfDep,
(SELECT DepartmentID
FROM HumanResources.EmployeeDepartmentHistory DepHistory
WHERE EndDate IS NULL
AND DepHistory.EmployeeID = AllEvents.EmployeeID)),
OldValueOfRate = COALESCE (
OldValueOfRate,
(SELECT TOP 1 Rate
FROM HumanResources.EmployeePayHistory PayHistory
ORDER BY RateChangeDate))
FROM
AllEvents
ORDER By EmployeeID, ChangeDate
The code can be easily checked filtering for a single EmployeeID (4 is a good candidate for this check). Please note that - in this case - the "current value" is kept in the history table and so the final COALESCE is a bit more intricated but still pretty easy both to write and to understand.
If you have a better pattern for this kind of situation or any comments on it... I'll be glad to read your comments.