I am applying OCR to subtitles in TV footage (I am using Tesseract 3.x with C++). As a preprocessing step for OCR, I am trying to separate the text from the background.
Here's the original image:
And here is the preprocessed image:
The OCR result is: Sicemn clone
As the preprocessed image shows, some "fog" remains around the letters, which prevents the OCR module from doing its job properly.
Is there a way to detect this "fog" programmatically so it can be removed, or some image processing that would remove or reduce it in the preprocessed image?
Since the preprocessing logic is heavily tuned to handle many different images, I would rather find a way to "clean" the preprocessed image than modify the preprocessing logic (tuning it for these pictures could degrade the results on other pictures).
Any suggestion is very welcome.
Update
Apparently, sixela's answer is great and works in most cases.
It does not work when the background also contains colors similar to the text.
Example of a case where it does not work:
Resulting image:
The Gaussian filter seems to cause problems with this type of footage.
This suggests that different footage may require different approaches.
I managed to get a clearer (though not perfect) image using morphological operations and thresholding.
Here is how:
- I started by converting the original image to greyscale
- Applied a Gaussian blur (9x9 kernel) to denoise the greyscale image
- Applied a top-hat morphological operation (3x3 kernel) to extract the white text
- Applied Otsu's thresholding method
- Dilated the result
- Applied an inverted binary threshold to turn the white text black
I finally obtained the following image:
Running OCR on it gives this text: "Since vou don'k"
PS: This result can of course be improved by tweaking the parameters (the kernel size, for example), but I hope it points you in the right direction. I used OpenCV in Python to quickly try out these methods.
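```python
import cv2

# Load the input directly as greyscale (0 == cv2.IMREAD_GRAYSCALE)
image = cv2.imread('./inputImg.png', cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError('could not read ./inputImg.png')
# Denoise with a 9x9 Gaussian blur (sigma derived from the kernel size)
imgBlur = cv2.GaussianBlur(image, (9, 9), 0)
# Top-hat with a 3x3 rectangle keeps small bright details, i.e. the white text
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
imgTH = cv2.morphologyEx(imgBlur, cv2.MORPH_TOPHAT, kernel)
# Otsu picks the threshold automatically; the 0 passed here is ignored
_, imgBin = cv2.threshold(imgTH, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Thicken the strokes slightly
imgdil = cv2.dilate(imgBin, kernel)
# Invert: white text on black becomes black text on white
_, imgBin_Inv = cv2.threshold(imgdil, 0, 255, cv2.THRESH_BINARY_INV)

cv2.imshow('original', image)
cv2.imshow('bin', imgBin)
cv2.imshow('dil', imgdil)
cv2.imshow('inv', imgBin_Inv)
cv2.imwrite('./output.png', imgBin_Inv)
cv2.waitKey(0)
cv2.destroyAllWindows()
```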
After this, I ran Tesseract on the output image with this command:

```
tesseract output.png stdout
```