THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Diversion: Sub-Second SQL Sudoku Solver

I enjoy Sudoku as a way to relax on occasion but, being an IT guy, I suppose I am predictably meta: solving them is fun, but making a machine to solve them is even more fun. I created a clunky solver once before based on the idea of using as few T-SQL statements as possible. This year I decided to try to make a fast solver instead. It’s working fairly well and solves most puzzles in under a second on my laptop. It helps I’ve been sick with a cold for the past two weeks, giving me the sort of “enforced relaxation” that drives me to puzzles.

Here’s the basic algorithm:

  1. Create a table for the puzzle solution, and insert the given values from the puzzle definition.
  2. Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This is a bit like the “notes” feature of an electronic Sudoku app, where you can indicate possible values for unsolved cells.
  3. Loop while there is no solution
    1. Using some list of solving rules, “push” solved values from the cells in the candidates table to the solution table.
    2. Use a process of elimination to remove candidate values that the Sudoku rules make impossible.
  4. End Loop

I started with that fundamental idea, but I found that the solver rules I could encode in T-SQL (at least with my level of math skill) covered most but not all puzzles. Some of the advanced ones would stall or reach an impasse where the available rules couldn’t make progress eliminating values. For those cases I added some branching logic to do a minimal level of trial and error. The adjusted algorithm became this more complicated version:

  1. Create a table for 1-n puzzle solutions with a branching identifier. Insert the given values from the puzzle as branch 1.
  2. Create a second table that contains all possible “candidate” values for all empty cells in the puzzle. This table also has a branch identifier, and the first set of candidates is also labeled as branch 1.
  3. Loop while there is no solution
    1. Using some list of solving rules, “push” solved values from any branches in the candidates table into the corresponding branch in the solution table
    2. If no cells are solved, then “branch” the solution by duplicating the solved values and the candidates into multiple versions/branches on some cell that has a minimal number of candidate values. Example: if we see no progress, and cell 3B might be a 2 or might be a 3, then make two versions, one using 2 and one using 3, and proceed with both.
    3. If a branch contains a “contradiction” – meaning an impossible combination of values – then it cannot be the solution to the puzzle, so remove it.
    4. Use a process of elimination to remove candidate values that the rules make impossible, from all branches.
  4. End Loop

I found that getting started with this felt complicated, but I was able to distill the code down gradually to a few statements. Admittedly, they are dense, and take some time to understand, but in all it didn’t end up being very much code at all.

Modeling the Puzzle

First, how to represent the puzzle space in SQL Server? A Sudoku puzzle looks like a table, but in fact it isn’t like a database table at all, because position is so important to the logic of the puzzle – in SQL the order of rows and columns is, by definition, undefined. For example, there’s no “last row” in a table. Further, in creating the solver I tried to stay in the spirit of relations and sets, and not resort to techniques like string manipulation or stuffing multiple values into the same column.

For me, it was simplest to think of the puzzle as several intersecting sets (“rows”, “columns,” “squares,” available digits) and make a table representing the sets instead of the physical layout of the Sudoku square.

The schema looks like this: a Solution table has columns for Value, Row, Column, and Square. Value is the digit 1-9 in the cells of the puzzle, and Row, Column and Square represent the position of the cell in the larger Sudoku square. Rows are numbered, columns assigned letters A-I and the squares are labeled S1-S9, reading left to right and top to bottom in the physical puzzle. Unfortunately, Row, Column and Square are all keywords in T-SQL, so I abbreviated them as Rw, Cl, Sq to keep the square brackets at bay in the code. Lastly, a Branch column is appended to allow the solver to make versions of the puzzle:

USE Sudoku
GO

CREATE TABLE dbo.Solution (
    Value tinyint NOT NULL,
    Rw tinyint NOT NULL,
    Cl char(1) NOT NULL,
    Sq  AS ( dbo.SquareFor(Rw,Cl) ),
    Branch int NOT NULL
) 

Because the square is implied by the row and column location in the puzzle, the Sq value is computed using a UDF, as:

CREATE FUNCTION dbo.SquareFor 
(
    @Rw tinyint, @Cl char(1)
)
RETURNS char(2)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @Result char(2)

    SELECT @Result = 
        CASE 
        WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'A' AND 'C' THEN 'S1'
        WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'D' AND 'F' THEN 'S2'
        WHEN @Rw BETWEEN 1 AND 3 AND @Cl BETWEEN 'G' AND 'I' THEN 'S3'
        WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'A' AND 'C' THEN 'S4'
        WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'D' AND 'F' THEN 'S5'
        WHEN @Rw BETWEEN 4 AND 6 AND @Cl BETWEEN 'G' AND 'I' THEN 'S6'
        WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'A' AND 'C' THEN 'S7'
        WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'D' AND 'F' THEN 'S8'
        WHEN @Rw BETWEEN 7 AND 9 AND @Cl BETWEEN 'G' AND 'I' THEN 'S9'
        END

    RETURN @Result

END

[These code samples are just to explain the solution and won’t run in the order presented, but all the code in a runnable script is attached to this post.]

The domains for allowed values, rows and columns are very useful to explicitly define, as we’ll see later, so I have four reference tables containing the possible digits, row, column and square labels, all following this pattern:

CREATE TABLE dbo.Cls(
    Label char(1) NOT NULL,
    CONSTRAINT PK_Cls PRIMARY KEY CLUSTERED ( Label ASC )
) ;

INSERT dbo.Cls ( Label ) VALUES ( 'A' );
INSERT dbo.Cls ( Label ) VALUES ( 'B' );
INSERT dbo.Cls ( Label ) VALUES ( 'C' );
INSERT dbo.Cls ( Label ) VALUES ( 'D' );
INSERT dbo.Cls ( Label ) VALUES ( 'E' );
INSERT dbo.Cls ( Label ) VALUES ( 'F' );
INSERT dbo.Cls ( Label ) VALUES ( 'G' );
INSERT dbo.Cls ( Label ) VALUES ( 'H' );
INSERT dbo.Cls ( Label ) VALUES ( 'I' );

Given those objects, the rules of the game can be encoded with some foreign keys and unique constraints. Only one of each digit allowed per row, column and square, and only those values defined in the domain tables:

ALTER TABLE dbo.Solution WITH CHECK ADD CONSTRAINT FK_Solution_Vals FOREIGN KEY ( Value )
REFERENCES dbo.Digits ( Value );

ALTER TABLE dbo.Solution  WITH CHECK ADD CONSTRAINT FK_Solution_Cols FOREIGN KEY( Cl )
REFERENCES dbo.Cls ( Label );

ALTER TABLE dbo.Solution  WITH CHECK ADD CONSTRAINT FK_Solution_Rows FOREIGN KEY( Rw )
REFERENCES dbo.Rws ( Label );
CREATE UNIQUE NONCLUSTERED INDEX UniqueClValue ON dbo.Solution 
(
    Value ASC,
    Cl ASC,
    Branch ASC
);

CREATE UNIQUE NONCLUSTERED INDEX UniqueRowValue ON dbo.Solution 
(
    Value ASC,
    Rw ASC,
    Branch ASC
);

CREATE UNIQUE NONCLUSTERED INDEX UniqueSqValue ON dbo.Solution 
(
    Value ASC,
    Sq ASC,
    Branch ASC
);

Note the addition of Branch in the indexes, so that we can store multiple versions of a puzzle, while still being constrained to the puzzle rules.

Now, how to populate the Solution table with a puzzle? In the code sample attached to this post, there are several example puzzles prebuilt, but this is the method. There’s a default on the Solutions table to make new inserts as Branch 1:

ALTER TABLE dbo.Solution ADD CONSTRAINT DF_Solution_Branch  DEFAULT (1) FOR Branch

That allows setting up a puzzle by entering just the values and row and column position:

INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 1, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 1, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 2, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 2, 'E' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 2, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 2, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 3, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 3, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 5, 3, 'I' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 4, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 4, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 4, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 5, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 5, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 5, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 5, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 6, 'C' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 6, 'D' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 6, 'G' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 3, 7, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 1, 7, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 7, 'H' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 6, 8, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 8, 8, 'B' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 9, 8, 'E' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 2, 8, 'F' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 4, 9, 'A' );
INSERT dbo.Solution ( Value, Rw, Cl ) VALUES ( 7, 9, 'I' );

It’s even faster to enter the values and positions using SSMS > Edit Top 200 (gasp!) on the Solutions table. I also found it helpful to back the puzzles up with SELECT INTO as I was working on the solver, because entering them is definitely tedious. In the code sample there are several example puzzles that can be loaded like this:

TRUNCATE TABLE dbo.Solution;

INSERT dbo.Solution ( Value, Rw, Cl ) 
SELECT Value, Rw, Cl 
FROM dbo.Sample5;

One last bit of setup: even though a relational table doesn’t directly model the Sudoku square, requiring this other method to model the puzzle, it’s incredibly helpful to be able to view the puzzle like a Sudoku square. In fact, I found working on this was impossible without rendering the puzzle in the paper layout. The paper display can be presented using a pivot:

CREATE VIEW dbo.DisplaySolution AS 
SELECT Branch, Rw,
    MIN( CASE  Cl WHEN 'A' THEN Value ELSE NULL END ) AS A,
    MIN( CASE  Cl WHEN 'B' THEN Value ELSE NULL END ) AS B,
    MIN( CASE  Cl WHEN 'C' THEN Value ELSE NULL END ) AS C,
    MIN( CASE  Cl WHEN 'D' THEN Value ELSE NULL END ) AS D,
    MIN( CASE  Cl WHEN 'E' THEN Value ELSE NULL END ) AS E,
    MIN( CASE  Cl WHEN 'F' THEN Value ELSE NULL END ) AS F,
    MIN( CASE  Cl WHEN 'G' THEN Value ELSE NULL END ) AS G,
    MIN( CASE  Cl WHEN 'H' THEN Value ELSE NULL END ) AS H,
    MIN( CASE  Cl WHEN 'I' THEN Value ELSE NULL END ) AS I
FROM dbo.Solution 
GROUP BY Branch, Rw
SELECT 
    A, B, C, D, E, F, G, H, I 
FROM dbo.DisplaySolution 
ORDER BY Rw;

Now we have all the setup required to make the solver work. Returning to the original algorithm above, we have a few discrete tasks that we can break out into procedures:

  1. Deriving a table of all the possible candidate values for empty cells in the solution
  2. Searching that candidates table for solved cells and pushing those into the solution
  3. Eliminating candidates
  4. Branching the puzzle into versions if we get stuck in a situation where the rules we have in #2 and #3 don’t work

Candidates

So, deriving all possible candidate values is fairly straightforward: if we cross join the rows, the columns and the digits 1-9, we get a derived table of every possibility for any puzzle. From that we can eliminate the candidate values from the cells that are already in our specific puzzle. We can also eliminate all the candidates that conflict with the values in the given puzzle, in the same row, column or square.

In order to make this manageable, I used a “bag” analogy. Imagine a bag full of tiles, like scrabble tiles, where there are 9 ones, 9 twos, 9 threes, etc. From that bag we know that 1 one will go in the first Sudoku square, 1 one into the second, one into the third, and so on. It’s possible to “label” the tiles ahead of time with the 9x9 square to which they will belong even though we don’t know which specific cell they will ultimately occupy.

So this code starts with a CTE called “bag” that has just this structure, made using a cross join of all digits and all squares. From the bag, we discard a bunch of the tiles for the already-occupied cells in the solution. Next, we take the remaining contents of the bag and use it to make a Candidates table, further discarding tiles that conflict with the already-solved cells in the solution by occupying the same row, column or square. It’s a great CTE exercise:

CREATE PROCEDURE dbo.MakeCandidates
AS
BEGIN

    -- Make a table containing all possible "candidate" values that could go in 
    -- empty cells in the puzzle
    
    IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Candidates]') AND type in (N'U'))
    DROP TABLE dbo.Candidates;

    WITH Bag AS (
            SELECT Digits.Value, Squares.Label AS Sq, 1 AS Branch
            FROM dbo.Digits cross join dbo.Squares 
            EXCEPT SELECT Value, Sq, Branch FROM dbo.Solution
        ),
        
        AllCells AS (
            SELECT Rws.Label AS Rw, 
                Cls.Label AS Cl,
                ( SELECT dbo.SquareFor( Rws.Label, Cls.Label ) ) AS Sq
            FROM Rws CROSS JOIN Cls
        ),
        
        Placements AS ( 
            SELECT Bag.Value, allCells.Rw, allCells.Cl, allCells.Sq 
            FROM AllCells JOIN Bag ON allCells.Sq = bag.Sq
        ) 
        
        SELECT p.Value, p.Rw, p.Cl, p.Sq, 1 AS Branch 
        INTO dbo.Candidates
        FROM placements p
        WHERE NOT EXISTS( SELECT 1 FROM dbo.Solution 
            WHERE Solution.Rw = p.Rw AND Solution.Cl = p.Cl
            )
        AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
            WHERE Solution.Value = p.Value AND Solution.Rw = p.Rw 
            )
        AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
            WHERE Solution.Value = p.Value AND Solution.Cl = p.Cl 
            )
        AND NOT EXISTS ( SELECT 1 FROM dbo.Solution
            WHERE Solution.Value = p.Value AND Solution.Sq = p.Sq 
            );
END

The result is that we have a Solution table with one version of the puzzle at its starting point, and a Candidates table that contains all the plausible values that might go into all empty cells in the puzzle. All the rows in both tables are identified as “Branch 1.” Going forward we’ll work back and forth between the two tables to solve the puzzle by process of elimination.

Solved Cells

The next chunk of the algorithm is to try to search the candidates for solved cells and push those into the solution. This code uses about four of the common Sudoku solving methods (those that I could sanely encode in T-SQL). Many sites describe them, such as http://www.sudoku129.com/puzzles/tips_intro.php or http://www.sudokuoftheday.com/pages/techniques-overview.php or your favorite search engine’s suggestions.

CREATE PROCEDURE dbo.PushSolvedCells ( @branch int, @count int OUTPUT )
AS

BEGIN

    -- Cells having exactly one possible value must be that value
    WITH solvedCells AS (
            SELECT Value, p.Rw, p.Cl, p.Branch
            FROM dbo.Candidates p
            INNER JOIN (
                SELECT Rw, Cl, Branch
                FROM dbo.Candidates 
                GROUP BY Rw, Cl, Branch
                HAVING COUNT(*) = 1
            ) AS onePossibleValue 
                ON p.Rw = onePossibleValue.Rw 
                AND p.Cl = onePossibleValue.Cl 
                AND p.Branch = onePossibleValue.Branch
                AND p.Branch = @branch
        ) ,
            
        -- Numbers that are possible in only one cell of a row must occupy that cell
        unqInRow AS (
            SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
            FROM dbo.Candidates p
            INNER JOIN (
                SELECT Value, Rw, Branch
                FROM dbo.Candidates 
                GROUP BY Value, Rw, Branch
                HAVING COUNT(*) = 1
            ) AS uniques 
                ON p.Rw = uniques.Rw 
                AND p.Value = uniques.Value
                AND p.Branch = uniques.Branch
                AND p.Branch = @branch
        ),

        -- Numbers that are possible in only one cell of a column must occupy that cell
        unqInCol AS (
            SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
            FROM dbo.Candidates p
            INNER JOIN (
                SELECT Value, Cl, Branch
                FROM dbo.Candidates 
                GROUP BY Value, Cl, Branch
                HAVING COUNT(*) = 1
            ) AS uniques
                ON p.Cl = uniques.Cl 
                AND p.Value = uniques.Value
                AND p.Branch = uniques.Branch
                AND p.Branch = @branch
        ),

        -- Numbers that are possible in only one cell of a square must occupy that cell
        unqInSquare AS (
            SELECT p.Value, p.Rw, p.Cl, p.Sq, p.Branch
            FROM dbo.Candidates p
            INNER JOIN (
                SELECT Value, Sq, Branch
                FROM dbo.Candidates 
                GROUP BY Value, Sq, Branch
                HAVING COUNT(*) = 1
            ) AS uniques 
                ON p.Sq = uniques.Sq 
                AND p.Value = uniques.Value
                AND p.Branch = uniques.Branch
                AND p.Branch = @branch
        ) 
        
        INSERT dbo.Solution( Value, Rw, Cl, Branch )
            SELECT Value, Rw, Cl, Branch FROM solvedCells
            UNION 
            SELECT Value, Rw, Cl, Branch FROM unqInRow
            UNION 
            SELECT Value, Rw, Cl, Branch FROM unqInCol
            UNION 
            SELECT Value, Rw, Cl, Branch FROM unqInSquare;
        
    SET @count = @@ROWCOUNT;
    
END

The procedure processes one branch at a time given by @branch, and outputs the number of cells that were solved with the rules in the code. The count will become important when we need to track “stalled” branches of the solution.

Elimination

The next chunk of the algorithm says, “OK, now that those cells’ values are known, what other candidates are eliminated.” This is sort of the other half of the solving logic – take the remaining candidates and try to eliminate as many as possible, with other solving techniques:

CREATE PROCEDURE dbo.EliminateCandidates as
BEGIN

    DELETE dbo.Candidates 
    WHERE EXISTS ( SELECT 1 FROM dbo.Solution s
        WHERE s.Cl = Candidates.Cl 
            AND s.Rw = Candidates.Rw 
            AND s.Branch = Candidates.Branch
        )
    OR EXISTS ( SELECT 1 FROM dbo.Solution s1
        WHERE s1.Value = Candidates.Value 
            AND s1.Rw = Candidates.Rw  
            AND s1.Branch = Candidates.Branch
        )
    OR EXISTS ( SELECT 1 FROM dbo.Solution s2
        WHERE s2.Value = Candidates.Value 
            AND s2.Cl = Candidates.Cl 
            AND s2.Branch = Candidates.Branch
        )
    OR EXISTS ( SELECT 1 FROM dbo.Solution s3
        WHERE s3.Value = Candidates.Value 
            AND s3.Sq = Candidates.Sq 
            AND s3.Branch = Candidates.Branch
        ); 

    -- Narrow rows by square: for each row, find values that can only be in one square
    -- and delete other candidates for those values in the same square

    WITH irs AS ( 
        SELECT DISTINCT Value, Rw, Sq, Branch
        FROM dbo.Candidates p
        WHERE ( SELECT COUNT ( DISTINCT Sq ) 
                FROM dbo.Candidates p2 
                WHERE p2.Rw = p.Rw 
                    AND p2.Value = p.Value 
                    AND p2.Branch = p.Branch ) = 1
        )
        DELETE dbo.Candidates 
        FROM dbo.Candidates p
        JOIN irs ON p.Sq = irs.Sq
            AND p.Value = irs.Value
            AND p.Branch = irs.Branch
            AND p.Rw != irs.Rw;
            
    -- Narrow columns by square: for each column, find values that can only be in one square
    -- and delete other candidates for those values in the same square

    WITH irs AS (        
        SELECT DISTINCT Value, Cl, Sq, Branch
        FROM dbo.Candidates p
        WHERE ( SELECT COUNT ( DISTINCT Sq ) 
            FROM dbo.Candidates p2 
            WHERE p2.Cl = p.Cl AND p2.Value = p.Value AND p2.Branch = p.Branch ) = 1
        )
        DELETE dbo.Candidates 
        FROM dbo.Candidates p
        JOIN irs ON p.Sq = irs.Sq
            AND p.Value = irs.Value
            AND p.Branch = irs.Branch
            AND p.Cl != irs.Cl;
END

Alternating these two procedures – locating solved cells and pushing them into the solution, and then eliminating other candidates – will solve most of the “easy” and some “intermediate” level puzzles in books or Sudoku apps. The next challenge is what to do when these rules are exhausted and can’t narrow the field of candidate values. A mathematician might be able to add more solving methods to the handful I have here and make the algorithm solve ANY puzzle. I’m not quite that smart, so for some advanced puzzles I had to introduce branching, or live with the fact that some would not be solved.

(Technically I did a lot of work implementing one more advanced method, which was pages and pages of code – but I found that it was never invoked in any sample puzzles I tried, perhaps because the logic was already implied by one of the other statements above.)

Branch

Branching presents a few challenges: first, how to know when to branch. Second, some branches are by definition invalid, so how to we identify and cull those? Third, simply, what are the logistics to make branches? How?

The answer I came up with to the first is just to count how many cells are solved in each pass through the solver. If we execute a pass and we don’t see any new values being inserted, then the branch(es) we have in play are obviously stalled. When they stall, make a new branch.

Invalid branches are interesting. They fail in one of two ways: either there is a cell that has NO candidates, which ultimately is caught by the check above, or there is a contradiction that surfaces where, for example, a digit appears to belong in two places. The contradictions violate the rules of the puzzle, and because we encoded the rules of the puzzle in a series of unique constraints they become constraint violations in SQL Server. They actually cause the script to throw an error where the implied solved cells can’t be inserted into the solution because they violate a unique index. I decided just to roll with that idea: branches work until they throw an error, at which point they can be discarded from the process.

So, when we run the PushSolvedCells procedure, and @count shows we didn’t solve any cells, then we branch. Branching works by identifying a cell in Candidates that has a minimal number of possible values (practically always 2). We then rank the possible values for that cell, insert the #1 possibility into the existing branch and use the #2 and higher in new copy(ies) of the solution and all the candidates:

CREATE PROCEDURE dbo.Branch AS
BEGIN

    IF OBJECT_ID( 'tempdb..#branchCells' ) IS NOT NULL DROP TABLE #branchCells;

    CREATE TABLE #branchCells ( Branch int, Value int, Rw int, Cl char(1) );

    WITH 
        BranchCell AS (
            -- Locate a cell with a small number of candidates 
            SELECT TOP ( 1 ) Rw, Cl, Branch 
            FROM dbo.Candidates
            GROUP BY Rw, Cl, Branch
            ORDER BY COUNT(*), Rw, Cl
        ) ,    
        LastBranch AS (
            SELECT MAX( branch ) AS id FROM dbo.Solution
        ) ,
        BranchVals as (
            -- Find the candidate values for that cell
            SELECT ( ROW_NUMBER() OVER ( order by c.value ) - 1 ) as branchOffset, 
                c.Value, c.Rw, c.Cl, c.Branch 
            FROM dbo.Candidates c
            INNER JOIN BranchCell ON c.Rw = BranchCell.Rw
                AND c.Cl = BranchCell.Cl AND c.Branch = BranchCell.Branch
        ) INSERT #branchCells ( Branch, Value, Rw, Cl )
            SELECT 
                -- Number the candidates by branch 
                CASE WHEN branchOffset = 0 THEN BranchVals.Branch 
                ELSE branchOffset + LastBranch.id END AS Branch,
                BranchVals.Value, 
                BranchVals.Rw, 
                BranchVals.Cl
            FROM BranchVals 
            CROSS APPLY LastBranch;

    -- For branch cells other than the first one, make a copy of all candidates
    WITH RankedBranches AS (
        SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
        FROM #branchCells
        ) INSERT INTO dbo.Candidates ( Branch, Value, Rw, Cl, Sq )
            SELECT r.Branch, 
                c.Value, 
                c.Rw, 
                c.Cl, 
                c.Sq 
            FROM RankedBranches r
            JOIN dbo.Candidates c ON c.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
                AND r.rnk > 1;
            
    -- For branch cells other than the first one, make a copy of the solution
    WITH RankedBranches AS (
        SELECT ROW_NUMBER() OVER ( ORDER BY Branch ) rnk, Branch, Value, Rw, Cl
        FROM #branchCells
        ) INSERT INTO dbo.Solution ( Branch, Value, Rw, Cl )
            SELECT r.Branch, 
                s.Value, 
                s.Rw, 
                s.Cl
            FROM RankedBranches r
            JOIN dbo.Solution s ON s.Branch = ( SELECT TOP(1) Branch FROM rankedBranches )
                AND r.rnk > 1;

    -- Delete the specific candidates from the copies of all candidates
    -- corresponding to the branching values
    
    DELETE dbo.Candidates
    FROM dbo.Candidates c
    JOIN #branchCells b 
    ON c.Branch = b.Branch
        AND c.Cl = b.Cl
        AND c.Rw = b.Rw
        AND c.Value != b.Value;

END

If you are still with me, wonderful! This is a bit of a marathon post, but we are near the end.

Solver

The last piece we need is just a “driver” script to loop through these procedures in the right order, with some flow control. The idea here follows the algorithm at the top of this post: run a loop, alternate inserting solved cells and removing candidates, until a solution surfaces. If the loop stalls, then branch, and if a branch fails, then delete it. The script is also decorated with statements to load and display the puzzle, to track progress, and a timer to see how long the solution took to produce:

USE Sudoku

-- Setup for a puzzle

TRUNCATE TABLE dbo.Solution;

-- Insert the values for a new puzzle here OR:

INSERT dbo.Solution ( Value, Rw, Cl ) 
SELECT Value, Rw, Cl 
FROM dbo.Sample5; -- < Change this sample table to try other puzzles

-- Display the unsolved puzzle

SELECT 
    A, B, C, D, E, F, G, H, I 
FROM dbo.DisplaySolution 
ORDER BY Rw;

-- End Setup

-- Solve the puzzle

SET NOCOUNT ON;

DECLARE @starttime datetime2, 
        @endtime datetime2

SET @starttime = SYSDATETIME();

PRINT 'Compute all candidate values'
EXEC dbo.MakeCandidates;

DECLARE @currBranch int = 0,
        @branchSolvedCells int = 0,
        @passSolvedCells int = 0,
        @puzzleSolved bit = 0;
        
WHILE 1 = 1 
BEGIN 

    -- Attempt to move solved cells' values from Candidates to Solution
    -- one branch at a time
    
    -- This variable tracks whether the solutions are making any progress
    -- and controls branching solutions that have "stalled"
    SET @passSolvedCells = 0;
    
    DECLARE branches CURSOR FAST_FORWARD
    FOR 
    SELECT DISTINCT Branch FROM dbo.Solution;
        
    OPEN branches;
    FETCH NEXT FROM branches INTO @currBranch;
    
    WHILE @@FETCH_STATUS = 0 
    BEGIN
        BEGIN TRY
        
            PRINT 'Insert solved cells branch ' + cast( @currBranch as varchar(5) );
            
            EXEC dbo.PushSolvedCells @branch = @currBranch, @count = @branchSolvedCells OUTPUT;
            
            IF ( ( SELECT COUNT(*) FROM dbo.Solution where Branch = @currBranch ) = 81 ) 
            BEGIN 
                SET @puzzleSolved = 1; 
                BREAK; -- Solved!
            END; 
            
        END TRY
        BEGIN CATCH
        
            IF ERROR_NUMBER() = 2601 BEGIN 
                
                -- Constraint violation means branch has a contradiction and cannot be the solution
                -- Remove invalid branch
                
                PRINT ERROR_MESSAGE()
                PRINT 'Constraint Violation, Purge Branch ' + cast( @currBranch as varchar(5) );            
                
                DELETE dbo.Solution WHERE Branch = @currBranch
                DELETE dbo.Candidates WHERE Branch = @currBranch
                
            END
            ELSE BEGIN 
                -- If the code is correct we should never hit this block
                RAISERROR( 'Unhandled error', 11, 1 );
                BREAK;
            END
        END CATCH
        
        SET @passSolvedCells += @branchSolvedCells;
        FETCH NEXT FROM branches INTO @currBranch;
        
    END
    
    CLOSE branches;
    DEALLOCATE branches;    
    
    IF ( @puzzleSolved = 1 ) BREAK; -- Solved in the last pass!
        
    IF @passSolvedCells = 0 
    BEGIN 
        -- Our rules of elimination didn't isolate any new values
        -- branch one of the solutions
        PRINT 'Stuck! Branch!' ;
        EXEC dbo.Branch ;
    END ;
    
    -- Remove candidate values from all branches using process of elimination
    PRINT 'Eliminate Candidate Values'
    EXEC dbo.EliminateCandidates ;
    
END ;

SET @endtime = SYSDATETIME();

-- Display the solution
SELECT 
    A, B, C, D, E, F, G, H, I 
FROM dbo.DisplaySolution 
WHERE Branch = ( SELECT Branch 
                    FROM dbo.Solution 
                    GROUP BY Branch 
                    HAVING COUNT(*) = 81 )
ORDER BY Rw ;

SELECT DATEDIFF( millisecond, @starttime, @endtime ) AS Duration;

Working sample code for the database and the solver script will be attached to this post if you want to try this out. It was authored on SQL Server 2008 R2. I’d welcome any and all observations, suggestions, improvements! Just leave notes below. Enjoy!

Published Sunday, January 06, 2013 12:46 PM by merrillaldrich

Attachment(s): SubSecondSudoku.zip

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement