Text Extraction from Notebook

Posted 2019-06-09 02:12

Question:

I am trying to extract handwritten text from images. I use Python with OpenCV functions such as findContours. It was all going pretty well when I used images like this one:

It works fine because I have a plain background. But then I tested it with this image:

Because of the notebook's ruled lines in the background, I am not able to extract only the text. Although the text is red, I convert all images to grayscale, or sometimes threshold them, so the text turns black just like the notebook lines and its colour no longer matters. So my question is: could anyone please give me advice or a possible solution for dealing with this kind of background in order to extract the text? I really don't want to use the sliding window method. Thank you in advance.

Answer 1:

I decided to try again with the HoughLinesP function in OpenCV, which this time gave me a much more promising and satisfying result. Here's a snippet of the code I used to remove most of the lines:

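import cv2
import numpy as np

# thresh.png is the thresholded image: ruled lines and ink are white on black.
img = cv2.imread('thresh.png')
edges = cv2.Canny(img, 50, 150, apertureSize=3)
min_line_length = 0
max_line_gap = 5
# Pass the last two values by keyword: positionally, the fifth argument of
# HoughLinesP is the optional output array `lines`, not minLineLength.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                        minLineLength=min_line_length, maxLineGap=max_line_gap)

# HoughLinesP returns None when it finds no line segments.
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        # Paint each detected segment black to erase it.
        cv2.line(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 0), 2)

cv2.imwrite('houghlines3.jpg', img)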

Additional info: thresh.png is the image in which I store the thresholded version of the original picture. The code finds the line segments in that image and paints over them in black. Because my threshold is inverted (what is close to white becomes black and vice versa), the notebook lines are white on a black background, so painting them black is what clears them.

PS: Hope I helped somebody! Cheers!