A 9x9 sudoku solver and generator


Introduction

sudoku(1) solves and generates 9x9 sudoku puzzles. It has a command line interface with text and static postscript output. Look elsewhere for GUI solvers.


Constraints

The solver uses depth first and/or breadth first tree search with constraint propagation to prune the search for the next best move (forms of forward checking.) There are space/time tradeoffs between depth/breadth first search and the constraints used; sudoku(1) has options to control the combinations. The common characteristic for all constraints, here and elsewhere, is that they avoid trial and error. Its fine for a computer to guess and backtrack but a definite breach of puzzle manners to require a human to do so.

The constraints are:

F
Forced cell: only one value possible.
N
Only cell: only one value in row/col/box.
B
Box claim: only value in row/col within a box.
T
Tuple: N exact or hidden N-tuples in row/col/box.
X
Singleton cycle: one or more weak edges (x-wing/swordfish.) Requires B.
Y
Pair cycle: one optional non-pair vertex (xy-wing/coloring.)
W
Row/col claim: classic N-row/col x-wing/swordfish.

The constraint set is the set of constraints applied to solve a puzzle. All constraint sets contain at least F. The FN constraint set is common to most machine and human solvers.

Clue cells are the givens or initial known cell values. Solution cells are the non-clue cells that must be filled in to produce a puzzle solution. A candidate value is one of many possible values for a solution cell. Candidate values are eliminated as solution cells become filled in. A solution cell is pinned or forced when all but one of its candidate values is eliminated. A puzzle is solved when all solution cells are pinned.

Constraint propagation is the process of applying constraints to either pin solution cell values or eliminate solution cell candidate values. The solver propagates constraints in the left to right order FNBTXYW. When a constraint makes progress (by pinning or elimination) the constraints are reapplied starting with F. This means that a constraint is not attempted until all constraints before it fail to make progress.

A valid puzzle has exactly one solution. A minimal puzzle has a minimal number of clues. Removing any clue in a minimal puzzle results in an invalid puzzle with multiple solutions. A constrained puzzle is solved by constraint propagation. An unconstrained puzzle requires trial and error (guessing) to produce a solution. A proper puzzle is valid and constrained. Almost all newspaper puzzles (save typos) are proper.

The X and Y constraints are cycle based generalizations of x-wing, swordfish, xy-wing and coloring techniques. The XYW constraints are detailed in the next sections. See Angus Johnson and Paul Stephens for details on the FNBTW constraints.


X Constraints

X constraints are cycles composed of sequences of strong and weak edges between cells with the same candidate value within a single row/col/box. The vertices of a strong edge are the only cells with the candidate value within the row/col/box containing the edge. If a strong or weak edge vertex is assigned the candidate value then the other vertex in the edge is not assigned the candidate value. The converse holds for strong edges only: if a strong edge vertex is not assigned the candidate value then the other vertex is assigned the candidate value.

There are three X cycle forms based on strong/weak sequence combinations within the cycle:

  1. Odd strong edge sequences connected by one weak edge. Weak edge cells not participating in the cycle may be eliminated.
  2. One even strong edge sequence and zero or more odd strong edge sequences connected by one weak edge. The end cells of the even strong sequence may be eliminated.
  3. One odd strong edge sequence connected by two weak edges. The middle cell of the weak sequence may be eliminated.

X cycles handle x-wings and many, but not all, N-row/col swordfish configurations. The W constraint covers the swordfish configurations not handled by the X constraint.

In the examples that follow, the initial puzzle is listed as an 81 character string in row order. Puzzle image strong X edges are red, strong Y edges are purple, weak edges are green, and eliminated cell values are blue. The PNG image files were generated by first determining the cycles of interest:

     sudoku -v2 example.dat | grep x-
then generating the PostScript for cycle N:
     sudoku -ehp -kCN example.dat > example.ps
and finally converting the PostScript to PNG using the ImageMagick convert(1) command:
     convert -crop 490x490+60+44 -scale 55% example.ps example.png

This puzzle contains an X cycle type 1 on candidate 2, type 2 on candidate 2, and type 2 on candidate 7:

     "puzzle x-1"
     
1.34..78.4...8.1.3789123456.143..8...75.18234.38.4.61.3.18.456.54..319.88.7..5341
x-1-1 x-1-2 x-1-3

This puzzle contains an X cycle type 1 on candidate 2, type 2 on candidate 3, and type 3 on candidate 3:

     "puzzle x-2"
     
...4.768.456.8927.78..6.....476915.8.98...7.66..7.8.9..6.9738..8.45169.797.824.6.
x-2-5 x-2-7 x-2-6


Y Constraints

Y constraints are cycles composed of edges between cells with exactly two candidate values within a single row/col/box, where at least one of the candidate values is the same in both cells (strong edges.) Two of the Y cycle edges may meet at a cell that contains more than two candidate values, but has the same candidate as the other endpoints of the edges (two weak edges.) The candidate value between the weak edges may be eliminated. Check here for a description of the mathematics behind X and Y cycles. This puzzle contains Y cycles on candidates 7 and 1.

     "puzzle y-1"
     
123..67.9456..923.789..2.6..1...7.9..679...1.93..1..7.341..59..578.9...66928..3..
y-1-1 y-1-2

This puzzle contains Y cycles on candidates 3 and 5. The second one has 21 edges, arguably beyond human perseverence.

     "puzzle y-2"
     
4..21..6.....7..82.52.6....5.16.97.467...2.953...578.6....4.253.3..2.94824..93671
y-2-3 y-2-14


W Constraints

The W constraints handle non-cyclic swordfish configurations not covered by the cyclic X constraints.

This puzzle contains a 3 row swordfish on candidate 7:

     "puzzle w-1"
     
.2.45..8..5.....23.8912.564215894376..82...5.9...15842..25.169859....217861972435
w-1-1


Constrained Puzzles and Magic Cells

A magic cell set is the smallest set of solution cells with the smallest number of candidate values that produces a constrained puzzle. (Magic cell was coined by Vegard Hanssen in a private communication. The smallest number of candidate values qualification makes magic cells more human friendly, e.g., it limits the number of guesses when magic cell sets are provided as hints for solving unconstrained puzzles.) A puzzle with M magic cells for constraint set C is C-M-constrained. A C-0-constrained puzzle is C-constrained. If C is FN then it may be dropped from the notation. For example, constrained, 0-constrained, and FN-0-constrained are equivalent, and an FNW-constrained puzzle requires an x-wing or swordfish to solve without trial and error.

The canonical solution is a puzzle solution transformed to have the smallest lexicographic catenation of row column order cell values with the top left box labeled 123/456/789. The canonical magic cell set is the smallest set of solution cells with the smallest number of candidate values and the smallest row column order position within the canonical solution, that produces a constrained puzzle. Each puzzle has exactly one canonical magic cell set for a given constraint set. Magic cells listed by sudoku(1) are canonical.

This FNBTXYW-1-constrained puzzle has canonical magic cell [1,8] and a total of 2 magic cells:

     "puzzle m-2"
     
1....6..9....89.3...9..25...7....49.6..89.2..9.1.....3.9..4.....1..689748.497.3..
m-2

This FNBTXYW-1-constrained puzzle has canonical magic cell [1,1] and all solution cells are magic, for a total of 54 magic cells:

     "puzzle m-54"
     
.23.567.....7.....7.9.3...4..8...3..6....2.....48....1.....78..5.19....78...456..
m-54

This is one of two known FNBTXYW-2-constrained puzzle (from the ~225M survey in the next section.) It has canonical magic cells [1,6][9,7]:

     "puzzle m-2-2"
     
.2....6.....1.....7...3..5...8.4.9.33.......8....2.4.663.5.......5..9..2.1..84...
m-2-2

The other was scraped from the sudoku forums:

     "puzzle m-2-3"
     
7.....4...2..7..8...3..8..9...5..3...6..2..9...1..7..6...3..9...3..4..6...9..1..5
m-2-3


Observations

Puzzles can be classified by the constrained property (the magic cell count.) Magic cells were computed for ~225M <= 30 clue minimal puzzles generated over 2 days by sudoku(1), ~300K puzzles provided by Vegard Hanssen, ~28K puzzles scraped from the sudoku forum, and 17 clue (the conjectured minimum) puzzles posted by Gordon Royle.

The puzzles were collated by equivalence class and only one member of each class was included in the survey, resulting in 225,116,397 puzzles. The clue distribution is:

24622    17 clues
1    18 clues
30    19 clues
2150    20 clues
78956    21 clues
1422324    22 clues
12041179    23 clues
43603113    24 clues
73718643    25 clues
61850853    26 clues
26361499    27 clues
5529490    28 clues
460016    29 clues
23080    30 clues
201    31 clues
73    32 clues
32    33 clues
20    34 clues
14    35 clues
69    36 clues
6    37 clues
1    38 clues
12    39 clues
3    40 clues
1    42 clues
1    43 clues
1    45 clues
1    47 clues
1    48 clues
1    50 clues
1    51 clues
1    52 clues
1    57 clues
1    70 clues
The 17 clue counts are skewed because of Gordon Royle's search for minimal clue puzzles. The bulk of the surveyed puzzles were machine generated via pseudo random seeding and this might contribute to further skewing. A puzzle generator has to work hard to produce puzzles with less than 20 clues.

The constrained (magic cell) counts are:

  FN constrained counts
153,495,385    constrained
71,554,145    1-constrained
66,865    2-constrained

  FNBTXYW constrained counts
211,377,110    constrained
13,739,284    1-constrained
1    2-constrained

There are a few surprising results. Even with the basic FN constraints, no 3-constrained or greater puzzle was found. One might expect the 17 clue puzzles to be more likely 2-constrained, but they have a distribution similar to the random puzzles:

  17 clue FN constrained counts
11582    constrained
12728    1-constrained
310    2-constrained

  17 clue FNBTXYW constrained counts
23903    constrained
717    1-constrained
0    2-constrained

Open question: Are there any 3-constrained 9x9 sudoku puzzles? Answer: yes. On 2007-04-07 JPF posted this puzzle on the Player's Forum with singles backdoor size 3:

     100000089000009002000000450007600000030040000900002005004070000500008010060300000
There are 3 other known singles backdoor size 3 puzzles as of 2007-04-12.

The apparent rarity of 2-constrained puzzles may have implications on tree search strategies. If a puzzle is not constrained then it is is most likely 1-constrained, and a breadth first search of all candidate values on the solution cells should produce a magic cell. Any depth-first tree search that visits more than number of solution cells X number of candidate values nodes might have been better off using breadth first search for a magic cell. If there are no 3-constrained puzzles then a 2 level breadth first search would be sufficient to solve all puzzles. In general breadth first loses to depth first on 2-constrained puzzles.

Magic cells may provide hints to discovering the next "logical" constraint. If a solver were able to determine the magic cells then it could avoid backtracking altogether. Magic cells may also be used as hints for unconstrained puzzles, e.g., determine the value for cell [r,c] and the puzzle will be constrained.


Examples and timings

The top95 and contest puzzle sets are frequently referenced on the sudoku forum. The counts for these puzzle sets are:

  top95 FN
0    constrained
44    1-constrained
51    2-constrained

  top95 FNBTXYW
42    constrained
53    1-constrained
0    2-constrained

  contest FN
222    constrained
777    1-constrained
12    2-constrained

  contest FNBTXYW
543    constrained
468    1-constrained
0    2-constrained

Command lines and timings for sudoku(1) compiled with gcc -O3 -march=pentium4 on linux Intel(R) Pentium(R) 4 CPU 2.80GHz follow. The constraints selected by the -q option performed the best for each particular puzzle set.

     
     sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t contest.dat
     puzzles=1011 guesses=5608 iterations=29045 seconds=0.059990
     
     sudoku -d -qFNB -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t top95.dat
     puzzles=95 guesses=2231 iterations=8626 seconds=0.039993
     
     sudoku -a -qF -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t multiple-1.tst
     puzzles=957263 guesses=2013617 iterations=5283315 seconds=2.732584
     
     sudoku -a -qF -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t multiple-2.tst
     puzzles=50044975 guesses=117985466 iterations=307473894 seconds=161.486451
     
     sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-1-con-sym.dat   
     puzzles=1183 guesses=2991 iterations=19940 seconds=0.033994
     
     sudoku -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-1-con-sym.dat   
     puzzles=1183 guesses=2350 iterations=21699 seconds=0.038994
     
     sudoku -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-2-con.dat
     puzzles=28948 guesses=680294 iterations=2869221 seconds=9.260592
     
     sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-2-con.dat
     puzzles=28948 guesses=389196 iterations=1468420 seconds=3.768427
     
     sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FNBTXYW-1-con.dat
     puzzles=100000 guesses=418449 iterations=2322057 seconds=5.084235
     


Conclusions

There still seems to be more questions than conclusions about 9x9 sudoku puzzles. A quick scan of the timing results shows that the best times occur when the constraint set is restricted to F or FN and the solver is forced to backtrack due to the weaker constraints. If additional constraints slow down the solver then why bother implementing them at all? Well, coding a fast solver is a great pursuit, but it offers little to the human solver. Pencil and paper limit human solving techniques. Most aficionados draw the line at trial and error (basically the definition of backtracking), and that's where the additional constraints come into play. It is strange, though, that constraint techniques intended to aid human solvers (to make it a pastime rather than a chore) sometimes slow down machine solvers. Sometimes its not the best tactic to make a machine act like a human.


Downloads

Unless otherwise noted, source and executables are covered by the Common Public License.

The LGPL Licensed SudokuExplainer.jar, by Nicolas Juillerat, contains an additional command line entry point diuf.sudoku.test.serate. See serate(1) for more information. Note that some browsers may (improperly) change the .jar suffix to .zip on download; change the suffix back to .jar before using.

executables
linux i386 executable
linux pentium4 optimized executable
windows executable
windows executable visual C dll
regression test script
expected regression test results
minlex grid compressor linux i386 executable
minlex grid compressor linux pentium4 optimized executable
minlex grid compressor windows executable
all 2097068 grids in bands 300 through 416
LGPL Licensed Sudoku Explainer with ER and EP ratings
LGPL Licensed Sudoku Explainer serate(1) source changes

puzzles
sudoku forum multiple solution benchmark #1 (1 => 957263)
sudoku forum multiple solution benchmark #2 (1 => 50044975)
sudoku forum top95 (95)
sudoku forum contest (1011)
FN-1-constrained rotational symmetric puzzles (1183)
FN-2-constrained minimal mutually non-isomorphic puzzles (28948)
FNBTXYW-1-constrained rotational symmetric puzzles (141)
FNBTXYW-1-constrained puzzle sample (100000)
FNBTXYW-2-constrained minimal mutually non-isomorphic puzzles (1)
old FNBTXZ-2-constrained, only 1 FNBTXYW-2-constrained (1453)
constraint method taxonomy puzzles (126)
easy sudocoup generated order 25 sudoku (20)
easy sudocoup generated order 25 sudoku (20)


Glenn Fowler
Information and Software Systems Research
AT&T Labs Research
Florham Park NJ
September 17, 2008