I need to solve CAPTCHAs automatically in order to scrape public data from websites.
I use Python and OpenCV, and I'm new to image processing. After some searching, I came up with the following approach. Since the text in the CAPTCHA uses a group of related colours, I convert the image to HSV and apply a colour mask, then convert the result to grayscale and apply an adaptive threshold (ADAPTIVE_THRESH_MEAN_C) to remove noise from the image.
However, this does not remove enough noise for automatic text recognition with OCR (Tesseract) to work. See the images below.
Is there something I can improve in my solution, or is there a better approach?
Original images:
Processed images:
Code:
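import cv2
import numpy as np

img = cv2.imread("1.jpeg")
if img is None:
    raise FileNotFoundError("could not read 1.jpeg")

# Keep only pixels whose hue falls in the text colour range (OpenCV H is 0-179).
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (36, 0, 0), (70, 255, 255))  # green
# mask = cv2.inRange(hsv, (0, 0, 0), (10, 255, 255))  # red
# mask = cv2.inRange(hsv, (125, 0, 0), (135, 255, 255))  # blue-violet
img = cv2.bitwise_and(img, img, mask=mask)

# Masked-out pixels are black; turn them white so the text sits on a white background.
img[(img == 0).all(axis=2)] = 255

# Binarize: grayscale, then adaptive mean threshold (block size 15, constant 2).
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, 2)
cv2.imwrite("out.png", img)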
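
A note on the hue bounds used in the masks: for 8-bit images OpenCV stores H as degrees / 2, so the valid range is 0 to 179 rather than 0 to 360. The sketch below shows how the bounds could be derived from hue values in degrees. The helper names `hue_range_to_opencv` and `hsv_bounds`, and the `s_min` / `v_min` parameters, are my own illustrative choices, not part of OpenCV.

```python
def hue_range_to_opencv(deg_low, deg_high):
    """Convert a hue range in degrees (0-360) to OpenCV's 8-bit H scale (0-179)."""
    if not (0 <= deg_low <= deg_high <= 360):
        raise ValueError("expected 0 <= deg_low <= deg_high <= 360")
    # OpenCV stores H as degrees / 2 for 8-bit images, so H never exceeds 179.
    return deg_low // 2, min(deg_high // 2, 179)


def hsv_bounds(deg_low, deg_high, s_min=0, v_min=0):
    """Build (lower, upper) HSV tuples in the form cv2.inRange expects.

    s_min and v_min are hypothetical tuning knobs: the masks in the
    question use 0 for both, which accepts any saturation and brightness.
    """
    h_low, h_high = hue_range_to_opencv(deg_low, deg_high)
    return (h_low, s_min, v_min), (h_high, 255, 255)


# The green mask from the question corresponds to roughly 72-140 degrees.
green_lower, green_upper = hsv_bounds(72, 140)
print(green_lower, green_upper)  # (36, 0, 0) (70, 255, 255)
```

The resulting tuples can be passed straight to `cv2.inRange(hsv, green_lower, green_upper)`. Raising `s_min` or `v_min` above 0 might exclude grey or very dark noise pixels whose hue happens to fall inside the range, but I have not verified that on these images.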