A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.
I have a basic algorithm to generate minimal Sudokus:
- Generate a completed puzzle.
- Visit each cell in a random order. For each visited cell:
- Tentatively remove its digit
- Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
- If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.
This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:
- Adding digits instead of removing them. The advantage of this is that since minimal puzzles require at least 17 filled digits, the first 17 digits are guaranteed to not have a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
- If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
- Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.
I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.
How can I optimize my minimal Sudoku generator?