Vertical projection and horizontal projection

Posted 2019-07-25 02:53

Question:

I'm trying to implement the OCR algorithm described in this paper:

https://arxiv.org/ftp/arxiv/papers/1707/1707.00800.pdf

I'm confused about that part:

I constructed the vertical profile of an image:

env = np.sum(img, axis=1)

and this is what I get:

I'm looking for a clear explanation of the algorithm, ideally with pseudocode.

Answer 1:

From my understanding, this algorithm is designed to separate individual Arabic letters, which are connected by a horizontal line when written (I have exactly zero knowledge of Arabic script).

So the algorithm assumes that the given image is horizontally aligned (otherwise it won't work), and it looks for areas where the upper bounds of the black pixels are similar.
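To make the "profile" concrete, here is a minimal sketch on a toy binary image. It assumes the image is a 2-D array with rows as the first axis and ink stored as 1 (all names here are made up for illustration). Note that which sum gets called "vertical" varies between sources: `np.sum(img, axis=1)` gives one value per row, while the per-column values (`axis=0`) are the ones you walk along when splitting a word into letters.

```python
import numpy as np

# Toy binary image: 1 = ink, 0 = background (rows x columns).
img = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# Sum down each column: one value per x position along the word.
column_profile = img.sum(axis=0)

# Sum across each row: one value per y position.
row_profile = img.sum(axis=1)

# Height of the top-most ink pixel in each column, measured from the
# bottom of the image; 0 means the column contains no ink at all.
has_ink = img.any(axis=0)
top_row = img.argmax(axis=0)  # index of the first row holding a 1
upper_bound = np.where(has_ink, img.shape[0] - top_row, 0)
```

In this toy image the two tall strokes stand out in both `column_profile` and `upper_bound`, and the low stretch between them plays the role of the connecting line.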

After you have constructed the vertical profile of an image, you just need to find the most common height within the word (the second most common in the whole image, after the background). Then you just split the image into the areas at that specific height and the rest.

Using your image:

The red line is the second most common height that you need to find (can be done with a histogram).

The green lines represent the separations between individual characters (so here you will get 4 characters).

By the way, your image is much noisier and more distorted than the one used in the paper, so you should probably discretize your height values into a few ranges (for example with a histogram).
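As a rough illustration of why the bin count matters, here is a small sketch on a made-up noisy profile (the values and the helper name `line_band` are hypothetical, not taken from the paper). With many narrow bins the noisy connecting line is spread over several bins; with a few wide bins it lands in one.

```python
import numpy as np

# Hypothetical noisy profile: background at 0, connecting line around 5,
# letter bodies well above that.
y = np.array([0, 0, 0, 0, 12, 13, 5, 6, 4, 5,
              14, 15, 5, 4, 6, 11, 0, 0, 0, 0])

def line_band(y, n_bins):
    """Return the value range of the second most populated histogram bin."""
    counts, edges = np.histogram(y, bins=n_bins)
    k = np.argsort(counts)[-2]
    return edges[k], edges[k + 1]

def columns_in_band(y, band):
    """Count how many profile values fall inside the given (low, high) band."""
    low, high = band
    return int(((y >= low) & (y <= high)).sum())

fine = columns_in_band(y, line_band(y, 16))   # narrow bins
coarse = columns_in_band(y, line_band(y, 4))  # wide bins
```

Here the coarse histogram picks up all seven connecting-line columns, while the fine one only catches the columns whose value is exactly 5.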

Pseudo-code (an untested sketch):

import numpy as np

# y is the 1-D vertical projection (one value per image column);
# n_bins is a number you choose.

# Discretize the y values into n_bins (a noisier image means you should use fewer bins):
counts, edges = np.histogram(y, bins=n_bins)

# Find the bin with the second-largest number of values:
bin_idx = np.argsort(counts)[-2]

# Get the limit values of the bin:
y_low, y_high = edges[bin_idx], edges[bin_idx + 1]

# Go over the vertical projection values and separate into characters:

zero = y[0]  # Assuming the first projected value is outside of the word
char_list = []
i = 0
while i < len(y):
    if y[i] != zero:
        start = i  # start of char

        # Find the end of the current char: the first column that is
        # background or lies at the connecting-line height.
        end = len(y)
        for j in range(i, len(y)):
            if y[j] == zero or y_low <= y[j] <= y_high:
                end = j  # end of char (exclusive)
                break
        if end > start:
            char_list.append([start, end])  # add to char list
        i = end

        # Find the start of the next char by skipping the connecting line:
        while i < len(y) and y_low <= y[i] <= y_high:
            i += 1
    else:
        i += 1
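
If you prefer to avoid the explicit loop, the same idea can be sketched with boolean masks. This is a vectorised variant under the same assumptions as the loop above (the first profile value is background, and the second most populated bin is the connecting line); the function name `split_characters` and the sample profile are made up for illustration, and it has not been tried on the images from the paper.

```python
import numpy as np

def split_characters(y, n_bins=4):
    """Return (start, end) column ranges, end exclusive, of letter bodies."""
    y = np.asarray(y)
    counts, edges = np.histogram(y, bins=n_bins)
    k = np.argsort(counts)[-2]  # second most populated bin
    y_low, y_high = edges[k], edges[k + 1]

    background = y == y[0]  # assumes y[0] lies outside the word
    on_line = (y >= y_low) & (y <= y_high)
    in_char = ~background & ~on_line  # columns belonging to a letter body

    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], in_char, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Made-up profile: background 0, connecting line around 5, letters above.
profile = [0, 0, 0, 0, 12, 13, 5, 6, 4, 5,
           14, 15, 5, 4, 6, 11, 0, 0, 0, 0]
segments = split_characters(profile, n_bins=4)
```

Each pair in `segments` can then be used to slice the image columns, e.g. `img[:, start:end]`.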