Connect close-by dots for OCR (some hints asked, e

Posted 2020-03-08 06:36

Question:

The goal: to make it possible for a software library, such as Tesseract, to read the word TMP HW from the picture below.

I'm trying to find ways to "connect the dots", so to speak, using OpenCV, but I'm not sure it's possible. I have pictures with dotted text in different colors, like the one below, which I convert to grayscale and then run through Canny to find edges. I've tried combinations of blurring, Canny, erosion and dilation, but, being a newbie with this stuff, I haven't found a way to make these letters "whole", with continuous edges.
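To make the dilation-then-erosion idea concrete, here is a minimal sketch of a morphological closing on a tiny binary grid. It is plain Python rather than OpenCV so that it runs on its own, and it is not the code I actually ran: the names `dilate`, `erode`, `close_gaps` and `dotted`, the square window, and the radius of 1 are all illustrative assumptions. In OpenCV the corresponding operation would be `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.

```python
def dilate(img, r=1):
    """Set a pixel to 1 if any pixel in its (2r+1)x(2r+1) window is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ))
    return out


def erode(img, r=1):
    """Keep a pixel only if every in-bounds pixel in its window is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                img[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ))
    return out


def close_gaps(img, r=1):
    """Morphological closing: dilate to bridge gaps, then erode back."""
    return erode(dilate(img, r), r)


# Hypothetical input: one dotted horizontal stroke, with a dot every other pixel.
dotted = [[0] * 11 for _ in range(5)]
for x in (2, 4, 6, 8):
    dotted[2][x] = 1

closed = close_gaps(dotted)
for row in closed:
    print("".join("#" if v else "." for v in row))
```

On this toy input the four dots merge into one solid run from the first dot to the last, and the rows above and below stay empty. The radius has to be at least half the gap between neighbouring dots, and a radius that is too large would start merging adjacent letters, so on real images it would need tuning.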

With OpenCV it does seem possible to get quite recognizable letters without much "noise" (and, if it matters, I think I can find a way to correct the orientation too), but creating edges between the dots so that OCR libraries work better keeps eluding me. Any tips?

For reference, I found "How to connect broken lines in a binary image using Python/Opencv" and "Canny Edge Image - Noise removal", for instance.

Edit: I've chosen a language tag, though examples in any language are welcome. I'll most likely be working in .NET, if it matters.