Su Doku FAQ

Su Doku or Sudoku?
How does the solver work?
Where is the source code?
How should I run the solver if I don't have a permanent Internet connection?
Who should I contact?
Why don't you use the solver to enter the daily competition in The Times?
What if the Su Doku problem has more than one solution?
How do I compose a Su Doku problem?
Has anyone else written a computer-based solver?
How is the complexity figure calculated?
How few cells could appear on a valid Su Doku puzzle?
Does the extra memory management required by the more complicated strategy types slow down the solver?
Are random strategy types slower than their deterministic equivalents?
Which command-line apps are available?
Would performance be improved if native code were used in place of Java?
Does the Least Candidates Hybrid strategy have an Achilles' Heel?

Su Doku or Sudoku?

The Times prefers Su Doku whereas the rest of the world seems to have settled upon Sudoku. A forum user has kindly provided the following definitive judgement:

The Sudoku puzzle is also called "number place". The name Sudoku is used in the puzzle magazine Nikori, and is said to be an abbreviation for Suji wa dokushinsha ni kagiru ("numeral(s) limited to a single person"). So doku is an abbreviation for dokushinsha (single person or bachelor), probably referring to the fact that only one of each digit can appear in a row, column or box. A better translation might therefore be "Single Number", "Unique Digits" or something similar.

Further info

How does the solver work?

Since Release 1.2, the solver has offered several strategies, each of which has been based upon the 'Ariadne's Thread' technique. (In classical mythology, Theseus used the thread given by Ariadne in order to navigate the maze, constructed by Daedalus for King Minos of Crete, that housed the Minotaur - see Ovid's Metamorphoses, Book VIII). When the solver finds itself stuck after a sequence of moves, it will retract moves (i.e. it will regather the thread) until it finds an alternative untried route.

First Available

The sole strategy available in Releases 1.0 and 1.1, First Available simply places a '1' the next empty square (it proceeds from left-to-right and from top-to-bottom) and checks to see whether any row, column or sub-grid now contains a duplicate entry. When there is no duplication, the solver stores the location of the current square (i.e. it lays down the thread) and moves on to the next vacant square. When a duplicate has been found, the solver replaces the '1' with a '2', and so on, until the duplication has been resolved. Of course, sometimes no number from '1' to '9' will resolve the problem, in which case the solver deduces that a mistake must have been made at an earlier square. Accordingly, it will leave the current square empty and move back to the previous square (i.e. it regathers the thread), where it adds one to the current value and attempts to proceed again. Eventually, the algorithm is certain to complete the grid, so long as a solution exists.

Least Candidates Cell

Introduced for Release 1.2, Least Candidates Cell calculates the number of valid candidate numbers for each cell in the grid and populates the cell for which this number is lowest. When several cells have an equally small number of candidates, the first such cell is chosen. The strategy solves complicated Su Doku puzzles significantly faster than First Available.

Random Least Candidates Cell

Introduced in Release 1.2, Random Least Candidates Cell is similar to Least Candidates Cell except that when several cells have an equally small number of candidates, one is chosen randomly.

Least Candidates Number

Introduced for Release 1.2, Least Candidates Number counts for each figure one-to-nine (assume the classic Su Doku puzzle grid) the number of valid candidate positions in each row, column and subgrid. (A subgrid is one of the nine smaller nine-cell grids that constitute the larger grid). It selects the figure and row, column or subgrid for which the number of candidate positions is lowest. When several figures have an equally small number of candidate positions, the first such figure is chosen.

Random Least Candidates Number

Introduced in Release 1.2, Random Least Candidates Number is similar to Least Candidates Number except that when several figures have an equally small number of candidate positions, one is chosen randomly.

Least Candidates Hybrid

Introduced in Release 1.3, Least Candidates Hybrid calculates the number of candidates generated by the Least Candidates Cell and Least Candidates Number strategy types then proceeds with the smaller candidate set. When the candidate set has more than one member, the first element will be chosen.

Random Least Candidates Hybrid

Introduced in Release 1.3, Random Least Candidates Hybrid is similar to Least Candidates Hybrid except that when the candidate set has more than one member, an element will be chosen randomly.

Least Candidates Hybrid II

Introduced in Release 1.4, Least Candidates Hybrid II takes the candidates generated by Least Candidates Hybrid and filters them according to how few clues are available in that particular part of the grid. The candidate that will contribute most new information about a little-known area will be chosen. As ever, when the candidate set has more than one member, the first element will be chosen.

Random Least Candidates Hybrid II

Introduced in Release 1.4, Random Least Candidates Hybrid II is similar to Least Candidates Hybrid II except that when the candidate set has more than one member, an element will be chosen randomly. Since so many filters are applied by Least Candidates Hybrid II, it is fairly rare for there to be more than one candidate in the set.

Where is the source code?

The Java source code has been released at Sourceforge under the GNU General Public License. The binaries are available from the same address. The javadoc documentation is available here.

How should I run the solver if I don't have a permanent Internet connection?

Download the binary file sudoku_binary_R1_x.jar from Sourceforge to a new directory and, from that directory, type java -cp sudoku_binary_R1_x.jar com.act365.sudoku.SuDoku.

Who should I contact?

E-mail should be addressed to rubylips@act365.com.

Why don't you use the solver to enter the daily competition in The Times?

I don't like champagne. Had they offered a pint of beer as a prize, I might have been tempted.

What if the Su Doku problem has more than one solution?

The solver will display the first solution it finds. The Evaluate button indicates whether a puzzle has multiple solutions.

How do I compose a Su Doku problem?

Use the Compose button in order to generate all but the most challenging puzzles. I will shortly write an article on the topic of how to compose highly complex Su Doku puzzles using the command-line version of Composer. In the meantime, read the very interesting article here.

Has anyone else written a computer-based solver?

Yes - David Ireland at DI Management has written a VBA solver for Excel. Pete Wake has written a JavaScript solver.

How is the complexity figure calculated?

It is the number of times the thread had to be unwound in order to iterate through all possible puzzle solutions. The figure is, of course, strategy-dependent. A measure of a good strategy is that it leads to a low complexity figure. 'Fiendish' puzzles tend to have a complexity figure of 150 according to the Least Candidates Hybrid strategy, whereas the 20 and 21 cell puzzles might have complexity figures of around 500. The FirstAvailable strategy will often assign a complexity figure in excess of 100000 to such puzzles.

I've noticed that the Random Least Candidates Hybrid strategy tends to return (as it's random, it often returns a slightly different complexity figure each time it's evaluated) a lower complexity figure that the Least Candidates Hybrid strategy. I'm not sure exactly why this should be and I intend to investigate the matter. I think it's because, when all things are equal, Least Candidates Hybrid will tend to work at the top-left hand corner of the grid, while Random Least Candidates Hybrid works all over the grid. I suppose that the truest complexity figure would be the mean of all the different Random Least Candidates Hybrid figures, though, as the complexity figure is only a rough guide to how difficult a human is likely to find a puzzle, I'm unlikely ever to perform this calculation.

How few filled cells could appear on a valid Su Doku puzzle?

I aim to produce a definitive answer to this question shortly - the answers are at most 8 for 3x2 Su Doku, 19 for 3x3 Su Doku and 42 for 3x4 Su Doku. (See the Su Doku Brain of Britain page for examples of such puzzles.) I haven't constructed a mathematical proof in order to demonstrate these values - it's just that, as yet, I haven't been able to build puzzles with fewer initially-filled cells.

Does the extra memory management required by the more complicated strategy types slow down the solver?
Are random strategy types slower than their deterministic equivalents?

Colin Horne poses these interesting questions at the Wrox P2P website. The answers are no.

Memory management involves two steps:

The initial memory claim.
The copy of state information to and from the thread at each step.

The Least Candidates strategy types store a lot more state information at each step than the First Available strategy - for instance, the Least Candidates Hybrid strategy stores 3s³ (where s is the size of the grid - 9 for classical Su Doku) integers of state information at each step, whereas First Available doesn't store any at all (other than the current cursor position and the numbers entered on the grid). However, the additional memory requirements hardly affect performance because a single sufficiently large block of memory is claimed as the solver starts. Of course, this takes some time, but no further memory has to be claimed as the solver progresses. It is clear that the time saved by the use of a more selective strategy far outweighs the time spent in memory management. (The complexity figures quoted above suggest that, for the most complicated puzzles, the Least Candidates Hybrid strategy requires roughly 200 times less steps on its tree than First Available in order to solve the most difficult puzzles. On my PC, that translates to a fraction of a second for Least Candidates Hybrid against around ten seconds for First Available).

New memory is claimed only if strictly necessary - so, for instance, if a LeastCandidatesHybrid object is used to solve two puzzles of the same size in succession (as often happens in Composer), the memory claimed for the first puzzle is reused for the second - there is no new memory claim.

The profiler suggests the the costliest action performed by the Least Candidates Hybrid strategy is to copy the state information to and from the thread after each move. It takes relatively little time to select the candidates. Clearly, it takes a little more time to make a random selection (because the random number generator has to be invoked) but this extra time is insignificant. It is critical to notice that the amout of state information that has to be saved for a random or deterministic variant of a Least Candidates strategy is the same. Although, as Colin points out, with a non-random strategy it is possible to determine the previous move without reference to state information, the previous move represents only a tiny portion of the state information saved at each step. The bulk of the state information comprises the grids that store the possible candidate values for each cell. The information is saved because it is quicker to write it to and read it from memory than it is to recalculate it. These grids are saved regardless of whether the strategy is random or deterministic, so there is only a minute performance difference between the two strategy types.

In fact, as I write in answer to an earlier question, the random types consistently return lower complexity figures, possibly because they tend to insert values all over the grid rather than concentrate upon the top-left hand corner. I will write again once I have addressed this matter to my satisfaction. Of course, it is a mathematical/logical issue rather than an implementation issue, but it means that in most cases, a random strategy will solve a complicated puzzle more quickly than its deterministic equivalent.

A new note following Release 1.5:

Release 1.5 enhances performance significantly now that it only writes state to the thread if there is some uncertainty regarding which move should be made, i.e. if it impossible to determine whether the next move is indisputably correct. (In terms of the maze on Crete, this means that instead of a continual thread, we simply leave a marker whenever we reach a junction. Why didn't Ariadne think of that?) Given this change, a random method does incur a significant performance penalty compared to a deterministic method, so Least Candidates Hybrid has been made the solver strategy used within the Composer.

Which command-line apps are available?

The following command-line apps are available:

SuDoku, an app similar to the familiar Su Doku applet.
Solver, which solves puzzles entered from standard input.
Composer, which composes puzzles.
MaskFactory, which cycles through available masks.

In order to invoke a command-line app, download the main project binary sudoku_source_R1_x.jar from Sourceforge to a new directory and, from that directory, type, e.g., java -cp sudoku_binary_R1_x.jar com.act365.sudoku.MaskFactory.

Would performance be improved if native code were used in place of Java?

Surprisingly not, it would appear. The Su Doku Solver is suite is written in Java, which, as an interpreted language, one would expect to run more slowly than compiled code written in, say, C++. However, the HotSpot Virtual Machine, which ships with the more recent versions of Java, is able to perform dynamic optimization and will compile commonly-called routines on-the-fly. In order to discover whether performance improvements could be found, I rewrote the key solver routines in C++. To my great surprise, the native code ran more slowly. My belief is that I incurred a performance penalty because I used the STL vector<> class in place of the primitive multi-dimensional arrays used in Java and I am sure that I could improve the performance of the native code if I were to tweak the critical loops - however, the key point is that the Java code must execute at somewhere near to an optimal speed and that the inclusion of a mandatory native library in the distribution would not bring a sufficient benefit in terms of increased performance to outweight the inconvenience caused by the loss of cross-platform compatibility. (The C++ source code and a Win32 version of the native library appear in the distribution for reference purposes but their its is not recommended).

Does the Least Candidates Hybrid strategy have an Achilles Heel?

Yes, and it's highlighted by the following 4x4 grid, which is underdetermined and therefore has multiple solutions:

 .. .. .. .. * 10 .. ..  6 *  9 ..  3 .. * .. .. ..  1 
 13 ..  2 .. * .. ..  8 .. * .. .. .. 14 * ..  4 .. .. 
 .. .. 11 .. * .. .. ..  5 * .. .. 12 .. * .. .. .. .. 
 .. ..  7 .. * .. .. .. .. * .. .. 16 .. * .. .. 15 .. 
****************************************************** 
 .. .. ..  1 * .. 14 .. .. *  2 .. 11 .. *  8 .. ..  3 
 .. .. .. .. * .. .. ..  4 * .. 13 .. .. * .. .. ..  6 
 .. .. .. .. * .. .. .. .. * .. 15 .. .. *  7 .. 16 .. 
 ..  9 .. 10 * .. .. 12 .. * .. ..  5 .. * .. .. .. .. 
****************************************************** 
  8 .. .. 15 * .. .. .. .. * ..  1 .. .. * 14 ..  7  9 
 .. 12 .. .. * .. .. 11 .. * .. .. .. .. *  2 .. .. .. 
  6 .. .. .. * .. .. 16 .. *  7 .. .. .. * .. .. ..  8 
  4 .. .. .. * ..  3 .. 13 * .. .. 10 .. *  5 .. .. .. 
****************************************************** 
 ..  5 .. .. * ..  7 .. .. * 14 .. .. .. * 12 10 .. .. 
 .. .. ..  3 * ..  1 .. .. * .. .. .. .. * .. 11 .. .. 
 .. .. .. .. * 15 .. .. .. * ..  4 .. .. * .. 13 .. .. 
 14 .. .. 16 * ..  9  2 .. *  6 .. ..  8 * .. .. .. ..

Amazingly, the Least Candidates Hybrid strategy takes nearly 25 seconds to find a single solution to the puzzle on my PC. By way of contrast, Least Candidates Hybrid II takes (a still significant) 0.4 seconds. It's because of these timing problems that 3x5 and 4x4 puzzles are so hard to compose.