What is a good method to segment characters that are joined together, as in the following figure, knowing that:
- the characters all use this font, but the font size varies with the image size
- only isolated groups of characters in the image are connected
Also, how can I detect whether a given bounding box contains two or more connected letters?
I tried checking for width > height to detect connected characters, but that doesn't work for the blue groups in the image.
I also tried a segmentation method based on Article section 3.4 for separating the characters, but got poor results.
IDEA: if you already have a good OCR engine, try applying it to each of these connected components (or contours). If the OCR can't recognize a single letter, then the component probably contains two or more letters rather than one.
IDEA: check the convexity defects of these connected components. The defect points that lie closest to each other mark where the bridges are.
IDEA: apply a morphological opening (erosion followed by dilation) with a kernel that is narrow and tall. Thin horizontal bridges are shorter than the kernel, so they get removed.
IDEA: take the y-derivative of the image. The smallest contours (or lines) that remain will be your bridges. Mark them and erase those pixels from the original image.
IDEA: treat it as a search problem. Take two letters from the alphabet (in this font), join them horizontally with some tool, and use OpenCV's matchShapes method (moment matching) to check whether that shape matches your connected component. Alternatively, try implementing autocorrelation.
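To make the narrow, tall kernel idea concrete, here is a minimal sketch in plain Python on a toy binary image. The image, the function names `vertical_opening` and `column_groups`, and the kernel height `k = 3` are all made up for illustration. With OpenCV you would more likely use `cv2.morphologyEx` with `cv2.MORPH_OPEN` and a kernel from `cv2.getStructuringElement(cv2.MORPH_RECT, (1, k))`, and `k` would have to be tuned to the font size.

```python
def vertical_opening(img, k):
    """Morphological opening with a 1-wide, k-tall kernel.

    A foreground pixel survives only if it lies in a vertical run
    of at least k foreground pixels.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= k:  # run is tall enough to hold the kernel
                    for yy in range(start, y):
                        out[yy][x] = 1
            else:
                y += 1
    return out


def column_groups(img):
    """Return (first_col, last_col) ranges separated by empty columns."""
    w = len(img[0])
    occupied = [any(row[x] for row in img) for x in range(w)]
    groups, start = [], None
    for x, occ in enumerate(occupied):
        if occ and start is None:
            start = x
        elif not occ and start is not None:
            groups.append((start, x - 1))
            start = None
    if start is not None:
        groups.append((start, w - 1))
    return groups


# Toy example (hypothetical data): two vertical strokes joined by a
# bridge that is only one pixel tall.
rows = [
    "..##....##..",
    "..##....##..",
    "..########..",
    "..##....##..",
    "..##....##..",
]
img = [[1 if c == "#" else 0 for c in row] for row in rows]

print(column_groups(img))                       # [(2, 9)]
print(column_groups(vertical_opening(img, 3)))  # [(2, 3), (8, 9)]
```

A component that falls apart into two or more column groups after the opening is a candidate for touching characters, which also gives a rough answer to the detection question. Since the opening also removes thin horizontal strokes that belong to the letters themselves, it is probably safer to use the opened image only to locate the cut columns and then split the original image there.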
Good luck.