I have these images for which I want to remove the text in the background. Only the captcha characters
should remain (i.e., K6PwKA, YabVzu). The task is to identify these characters later using Tesseract.
This is what I have tried, but it isn't giving very good accuracy.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)
How can I improve this?
Note: I tried all the suggestions given for this question, and none of them worked for me.
EDIT: Following Elias's suggestion, I found the color of the captcha text using Photoshop by converting the image to grayscale; it came out somewhere in the range [100, 105]. I then thresholded the image based on this range, but the result I got did not give a satisfactory result from Tesseract.
gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')
Output:
'KEP wKA'
Result: (image omitted)
EDIT 2 :
def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104)
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data
gives: (image omitted)
and the text as
'GXuMuUZ'
Is there any way to soften it a little?
Didn't try it, but this might work.
Step 1: Use PS to find out what color the captcha characters are. For example, "YabVzu" is (128, 128, 128).
Step 2: Use Pillow's getdata()/getcolor() method; it will return a sequence containing the colour of every pixel.
Then we project every item in the sequence onto the original captcha image,
hence we know the position of every pixel in the image.
Step 3: Find all pixels whose colour is closest to (128, 128, 128). You may set a threshold to control the accuracy. This step returns another sequence; let's annotate it as Seq a.
Step 4: Generate a picture with the very same height and width as the original one. Plot every pixel in [Seq a] at the exact same position in the picture. Here we get a cleaned training item.
Step 5: Use a Keras project to break the code. The precision should be over 72%.
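Steps 2-4 can be sketched with NumPy alone; the function name, the target colour, and the tolerance below are illustrative values, not part of the original answer:

```python
import numpy as np

def extract_color_mask(img, target=(128, 128, 128), tol=10):
    """Keep only pixels whose colour is within `tol` of `target`
    on every channel; paint them black on a white canvas of the
    same height and width as the original image."""
    img = np.asarray(img, dtype=np.int16)
    # True wherever all three channels are close to the target colour
    close = np.all(np.abs(img - np.array(target)) <= tol, axis=-1)
    canvas = np.full(img.shape[:2], 255, dtype=np.uint8)
    canvas[close] = 0  # matching pixels appear as dark text
    return canvas
```

The returned canvas can then be fed to Tesseract or used as a cleaned training item for a Keras model.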
Here are two potential approaches and a method to correct distorted text:
Method #1: Morphological operations + contour filtering

1. Obtain binary image. Load the image, convert to grayscale, then apply Otsu's threshold.
2. Remove text contours. Create a rectangular kernel with cv2.getStructuringElement, then perform morphological operations to remove noise.
3. Filter and remove small noise. Find contours and filter using contour area to remove small particles. We effectively remove the noise by filling in the contour with cv2.drawContours.
4. Perform OCR. We invert the image, then apply a slight Gaussian blur. We then OCR using Pytesseract with the --psm 6 configuration option to treat the image as a single block of text. Look at Tesseract improve quality for other methods to improve detection and Pytesseract configuration options for additional settings.

Input image -> Binary -> Morph opening -> Contour area filtering -> Invert -> Apply blur to get result -> Result from OCR (images omitted)
Code
Method #2: Color segmentation

With the observation that the desired text to extract has a distinguishable contrast from the noise in the image, we can use color thresholding to isolate the text. The idea is to convert to HSV format, then color threshold to obtain a mask using a lower/upper color range. From there we use the same process to OCR with Pytesseract.

Input image -> Mask -> Result (images omitted)

Code
Correcting distorted text

OCR works best when the image is horizontal. To ensure that the text is in an ideal format for OCR, we can perform a perspective transform. After removing all the noise to isolate the text, we can perform a morph close to combine individual text contours into a single contour. From here we can find the rotated bounding box using cv2.minAreaRect and then perform a four point perspective transform using imutils.perspective.four_point_transform. Continuing from the cleaned mask, here are the results:

Mask -> Morph close -> Detected rotated bounding box -> Result (images omitted)

Output with the other image (images omitted)

Updated code to include perspective transform
Note: The color threshold range was determined using this HSV threshold script.
Your code produces better results than this. Here, I set a threshold for the upperb and lowerb values based on the histogram CDF values and a threshold. Press the ESC key to get the next image.

This code is unnecessarily complex and needs to be optimized in various ways. The code can be reordered to skip some steps. I kept it as-is since some parts may help others. Some existing noise can be removed by keeping only contours with area above a certain threshold. Any suggestions on other noise reduction methods are welcome.
Similar, easier code for getting the 4 corner points for the perspective transform can be found here:
Accurate corners detection?
Code Description:
Mark the ROI by drawing a rectangle and corner points in the original image
Straighten the ROI and extract it
Code:
1. Median Filter:
2. OTSU Threshold:
3. Invert:
4. Inverted Image Dilation:
5. Extract by Masking:
6. ROI points for transform:
7. Perspective Corrected Image:
8. Median Blur:
9. OTSU Threshold:
10. Inverted Image:
11. ROI Extraction:
12. Clamping:
13. Dilation:
14. Final ROI:
15. Histogram plot of step 11 image:
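The CDF-based choice of lowerb/upperb described above can be sketched like this; the function name and the cut-off fractions lo_frac/hi_frac are assumptions for illustration, not the answer's actual values:

```python
import numpy as np

def bounds_from_cdf(gray, lo_frac=0.05, hi_frac=0.40):
    """Pick lowerb/upperb for cv2.inRange from the cumulative histogram:
    keep the intensity band covering the given fraction of pixels
    (dark captcha text occupying a small share of the image)."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum() / gray.size
    lowerb = int(np.searchsorted(cdf, lo_frac))
    upperb = int(np.searchsorted(cdf, hi_frac))
    return lowerb, upperb
```

The returned pair would then feed cv2.inRange(gray, lowerb, upperb) as in the pipeline above.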