I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this:
I pre-process the Sudoku using OpenCV to extract black-and-white images of the individual digits and then put them through Tesseract. There are a couple of limitations to Tesseract, though:
- Tesseract is large, contains lots of functionality I don't need (I.e. Full text recognition), and requires English-language training data in order to function, which I think has to go onto the device's SD card. At least I can tell it to only look for digits using
tesseract.setVariable("tessedit_char_whitelist", "123456789");
- Tesseract often misinterprets a single digits as a string of digits, often containing newlines. It also sometimes just plain gets it wrong. Here are a few examples from the above Sudoku:
I have three questions:
- Is there any way I can overcome the limitations of Tesseract?
- If not, what is a useful, accurate method to detect individual digits (not k-nearest neighbours) that would be feasible to implement on Android - this could be a free library or a DIY solution.
- How can I improve the pre-processing to target that method? One possibility I've considered is using a thinning algorithm, as suggested by this post, but I'm not going to bother implementing it unless it will make a difference.
I took a class with one of the computer vision superstars who was/is at the top of the digit recognition algorithm rankings. He was really adamant that the best way to do digit recognition is...
OpenCV has a
HOGDescriptor
object, which computes HOG features. Look at this paper for advice on how to tune your HOG feature parameters. Any SVM library should do the job...theCvSVM
stuff that comes with OpenCV should be fine.For training data, I recommend using the MNIST handwritten digit database, which has thousands of pictures of digits with ground-truth data.
A slightly harder problem is to draw a bounding box around digits that appear in nature. Fortunately, it looks like you've already found a strategy for doing bounding boxes. :)
Easiest thing is to use Normalized Central Moments for digit recognition. If you have one font (or very similar fonts it works good).
See this solution: https://github.com/grzesiu/Sudoku-GUI
In core there are things responsible for digit recognition, extraction, moments training. First time application is run operator must provide information what number is seen. Then moments of image (extracted square roi) are assigned to number (operator input). Application base on comparing moments.
Here first youtube movie shows how application works: http://synergia.pwr.wroc.pl/2012/06/22/irb-komunikacja-pc/