Generating minimal/irreducible Sudokus

2019-02-13 09:17发布

问题:

A Sudoku puzzle is minimal (also called irreducible) if it has a unique solution, but removing any digit would yield a puzzle with multiple solutions. In other words, every digit is necessary to determine the solution.

I have a basic algorithm to generate minimal Sudokus:

  • Generate a completed puzzle.
  • Visit each cell in a random order. For each visited cell:
    • Tentatively remove its digit
    • Solve the puzzle twice using a recursive backtracking algorithm. One solver tries the digits 1-9 in forward order, the other in reverse order. In a sense, the solvers are traversing a search tree containing all possible configurations, but from opposite ends. This means that the two solutions will match iff the puzzle has a unique solution.
    • If the puzzle has a unique solution, remove the digit permanently; otherwise, put it back in.

This method is guaranteed to produce a minimal puzzle, but it's quite slow (100 ms on my computer, several seconds on a smartphone). I would like to reduce the number of solves, but all the obvious ways I can think of are incorrect. For example:

  • Adding digits instead of removing them. The advantage of this is that since minimal puzzles require at least 17 filled digits, the first 17 digits are guaranteed to not have a unique solution, reducing the amount of solving. Unfortunately, because the cells are visited in a random order, many unnecessary digits will be added before the one important digit that "locks down" a unique solution. For instance, if the first 9 cells added are all in the same column, there's a great deal of redundant information there.
  • If no other digit can replace the current one, keep it in and do not solve the puzzle. Because checking if a placement is legal is thousands of times faster than solving the puzzle twice, this could be a huge time-saver. However, just because there's no other legal digit now doesn't mean there won't be later, once we remove other digits.
  • Since we generated the original solution, solve only once for each cell and see if it matches the original. This doesn't work because the original solution could be anywhere within the search tree of possible solutions. For example, if the original solution is near the "left" side of the tree, and we start searching from the left, we will miss solutions on the right side of the tree.

I would also like to optimize the solving algorithm itself. The hard part is determining if a solution is unique. I can make micro-optimizations like creating a bitmask of legal placements for each cell, as described in this wonderful post. However, more advanced algorithms like Dancing Links or simulated annealing are not designed to determine uniqueness, but just to find any solution.

How can I optimize my minimal Sudoku generator?

回答1:

I have an idea on the 2nd option your had suggested will be better for that provided you add 3 extra checks for the 1st 17 numbers

  • find a list of 17 random numbers between 1-9
  • add each item at random location provided

    1. these new number added dont fail the 3 basic criteria of sudoku

      • there is no same number in same row
      • there is no same number in same column
      • there is no same number in same 3x3 matrix
    2. if condition 1 fails move to the next column or row and check for the 3 basic criteria again.

    3. if there is no next row (ie at 9th column or 9th row) add to the 1st column once the 17 characters are filled, run you solver logic on this and look for your unique solution.


回答2:

Here are the main optimizations I implemented with (highly approximate) percentage increases in speed:

  • Using bitmasks to keep track of which constraints (row, column, box) are satisfied in each cell. This makes it much faster to look up whether a placement is legal, but slower to make a placement. A complicating factor in generating puzzles with bitmasks, rather than just solving them, is that digits may have to be removed, which means you need to keep track of the three types of constraints as distinct bits. A small further optimization is to save the masks for each digit and each constraint in arrays. 40%
  • Timing out the generation and restarting if it takes too long. See here. The optimal strategy is to increase the timeout period after each failed generation, to reduce the chance that it goes on indefinitely. 30%, mainly from reducing the worst-case runtimes.
  • mbeckish and user295691's suggestions (see the comments to the original post). 25%