How can I detect boxes in an image and pull them o

Posted 2019-04-13 06:54

Question:

I need a programmatic way of taking a scanned image (let's assume PNG or any other convenient image format) and breaking it up into many smaller images. The scanned image is a grid, and the boxes of the grid will always be the same size and in the same relative location. Because the image is scanned, they are not necessarily in the same absolute location. Each box contains a character; ideally, I'd like to save each character as its own image file, without any of the box border.

I prefer PHP and ImageMagick, which I think will be the right combination of tools. However, I'm flexible if there's a much better way to do it.

Answer 1:

Here is the start of an algorithmic approach to the problem...

I'm using this image I created for test purposes, called box.jpg, with dimensions of 352x232 pixels:

The goal is to identify the red box and extract the 'Dave' picture.

My algorithmic approach would be as follows:

  1. Scale the picture to one that has the original width, but a height of only 1 pixel; at the same time convert to grayscale and increase the contrast; use the textual description of each pixel's properties that ImageMagick can emit. This way you should be able to find the two spots where the vertical red line pixels accumulated the extreme color value. (The horizontal red line pixels together with the gray letter pixels will have a more 'average' color value.)

  2. Do the same in the other direction: Scale the picture to one that has the original height, but a width of only 1 pixel (convert to grayscale, increase the contrast, use the textual description... yadda-yadda). You'll find the two spots where the horizontal red line pixels accumulated the extreme color value. (Vertical red line combined with gray letter pixels will have a more 'average' color value.)

  3. Identify the location of each of the color value peaks in each of the two results: this will give you the geometry of the sub-image to extract from the original.

  4. Extract the sub-image from the original. Crop each side as needed.

I can't elaborate the complete algorithm in detail, but here are the commands I'd use for steps 1 and 2.

Command for Step 1

convert                   \
    -type grayscale       \
    -depth 8              \
     box.jpg              \
    -scale x1\!           \
    -contrast-stretch 6x6 \
     columns.txt

Result for Step 1

This is the content of columns.txt:

 # ImageMagick pixel enumeration: 352,1,255,gray
 0,0: (  0,  0,  0)  #000000  black                 <-- left outer image border
 1,0: (253,253,253)  #FDFDFD  gray(253,253,253)
 2,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 3,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 [...]
 20,0: (255,255,255)  #FFFFFF  white
 21,0: (255,255,255)  #FFFFFF  white
 [...]
 46,0: (255,255,255)  #FFFFFF  white
 47,0: (255,255,255)  #FFFFFF  white
 48,0: (243,243,243)  #F3F3F3  gray(243,243,243)
 49,0: (  0,  0,  0)  #000000  black                 <-- left box border (ex-red)
 50,0: (  0,  0,  0)  #000000  gray(0,0,0)           <-- left box border (ex-red)
 51,0: (  0,  0,  0)  #000000  black                 <-- left box border (ex-red)
 52,0: (  0,  0,  0)  #000000  black                 <-- left box border (ex-red)
 53,0: (221,221,221)  #DDDDDD  gray(221,221,221)
 54,0: (231,231,231)  #E7E7E7  gray(231,231,231)
 55,0: (236,236,236)  #ECECEC  gray(236,236,236)
 [...]
 247,0: (236,236,236)  #ECECEC  gray(236,236,236)
 248,0: (216,216,216)  #D8D8D8  gray(216,216,216)
 249,0: (  0,  0,  0)  #000000  black                <-- right box border (ex-red)
 250,0: (  1,  1,  1)  #010101  gray(1,1,1)          <-- right box border (ex-red)
 251,0: (  0,  0,  0)  #000000  black                <-- right box border (ex-red)
 252,0: (  1,  1,  1)  #010101  gray(1,1,1)          <-- right box border (ex-red)
 253,0: (226,226,226)  #E2E2E2  gray(226,226,226)
 254,0: (244,244,244)  #F4F4F4  gray(244,244,244)
 255,0: (244,244,244)  #F4F4F4  gray(244,244,244)
 [...]
 303,0: (255,255,255)  #FFFFFF  white
 304,0: (255,255,255)  #FFFFFF  white
 305,0: (255,255,255)  #FFFFFF  white
 [...]
 342,0: (255,255,255)  #FFFFFF  white
 343,0: (255,255,255)  #FFFFFF  white
 344,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 345,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 346,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 347,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 348,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 349,0: (255,255,255)  #FFFFFF  gray(255,255,255)
 350,0: (253,253,253)  #FDFDFD  gray(253,253,253)
 351,0: (  0,  0,  0)  #000000  black                <-- right outer image border

(Note: It is a bit confusing that ImageMagick sometimes calls the color value #FFFFFF white and sometimes gray(255,255,255), and likewise calls #000000 sometimes black and sometimes gray(0,0,0). Maybe a bug? Anyway, it doesn't block us here.)

Command for Step 2

convert                   \
    -type grayscale       \
    -depth 8              \
     box.jpg              \
    -scale 1x\!           \
    -contrast-stretch 6x6 \
     rows.txt

Result for Step 2

This is the content of rows.txt (this time I dropped the confusing color names):

 # ImageMagick pixel enumeration: 1,232,255,gray 
 0,0: (  0,  0,  0)  #000000                 <-- top outer image border
 0,1: (255,255,255)  #FFFFFF  
 0,2: (255,255,255)  #FFFFFF  
 0,3: (255,255,255)  #FFFFFF       
 0,4: (255,255,255)  #FFFFFF       
 0,5: (255,255,255)  #FFFFFF       
 0,6: (255,255,255)  #FFFFFF       
 0,7: (255,255,255)  #FFFFFF       
 0,8: (255,255,255)  #FFFFFF       
 0,9: (255,255,255)  #FFFFFF       
 0,10: (255,255,255)  #FFFFFF       
 [...]
 0,46: (255,255,255)  #FFFFFF       
 0,47: (255,255,255)  #FFFFFF       
 0,48: (240,240,240)  #F0F0F0       
 0,49: (  0,  0,  0)  #000000                <-- top box border (ex-red)
 0,50: (  0,  0,  0)  #000000                <-- top box border (ex-red)
 0,51: (  0,  0,  0)  #000000                <-- top box border (ex-red)
 0,52: (  0,  0,  0)  #000000                <-- top box border (ex-red)
 0,53: (225,225,225)  #E1E1E1       
 0,54: (234,234,234)  #EAEAEA       
 [...]
 0,207: (244,244,244)  #F4F4F4       
 0,208: (230,230,230)  #E6E6E6       
 0,209: (  0,  0,  0)  #000000               <-- bottom box border (ex-red) 
 0,210: (  0,  0,  0)  #000000               <-- bottom box border (ex-red)
 0,211: (  0,  0,  0)  #000000               <-- bottom box border (ex-red)
 0,212: (  0,  0,  0)  #000000               <-- bottom box border (ex-red)
 0,213: (234,234,234)  #EAEAEA       
 0,214: (245,245,245)  #F5F5F5       
 [...]
 0,229: (255,255,255)  #FFFFFF       
 0,230: (255,255,255)  #FFFFFF       
 0,231: (  0,  0,  0)  #000000               <-- bottom outer image border
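Step 3, reading the border positions off these listings, can be automated instead of done by eye. The following is only a minimal sketch: the function name `find_dark_runs` and the default threshold of 10 are assumptions made for illustration, and it relies on the `x,y: (v,v,v)` line format shown above.

```sh
# find_dark_runs FILE AXIS [THRESHOLD]
#   FILE       text output of ImageMagick (columns.txt or rows.txt)
#   AXIS       x to index by column, y to index by row
#   THRESHOLD  hypothetical cutoff: gray values at or below it count as border
# Prints one "first-last" pair per run of consecutive dark pixels.
find_dark_runs() {
    awk -v axis="$2" -v thr="${3:-10}" '
        # Only lines of the form "x,y: (v,v,v) ..." are pixel data.
        /^[ \t]*[0-9]+,[0-9]+:/ {
            split($0, parts, ":")          # parts[1] = "x,y"
            split(parts[1], xy, ",")
            pos = ((axis == "y") ? xy[2] : xy[1]) + 0

            val = parts[2]                 # " (  0,  0,  0)  #000000 ..."
            sub(/^[^(]*\(/, "", val)       # drop everything up to "("
            sub(/,.*/, "", val)            # keep the first channel only
            dark = (val + 0 <= thr + 0)

            # Close the current run on a bright pixel or a gap in positions.
            if (in_run && (!dark || pos != last + 1)) {
                print first "-" last
                in_run = 0
            }
            if (dark) {
                if (!in_run) { first = pos; in_run = 1 }
                last = pos
            }
        }
        END { if (in_run) print first "-" last }
    ' "$1"
}
```

Called as `find_dark_runs columns.txt x` and `find_dark_runs rows.txt y`, it should report the single-pixel outer image borders as well as the 4-pixel box borders, provided none of the elided lines is darker than the threshold.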

From these two results we can reliably conclude:

  1. the left vertical red box border line is at pixel columns 49-52.
  2. the right vertical red box border line is at pixel columns 249-252.
  3. the top horizontal red box border line is at pixel rows 49-52.
  4. the bottom horizontal red box border line is at pixel rows 209-212.
  5. From 1. and 2. you can compute a distance between the borders of 197 (249 minus 52). The interior itself spans columns 53-248, so let's use 196 for the width of the extracted sub-image.
  6. From 3. and 4. you can compute a distance between the borders of 157 (209 minus 52). The interior itself spans rows 53-208, so let's use 156 for the height of the extracted sub-image.
  7. The left border ends at pixel column 52, so the horizontal offset of the crop has to skip past it. We pick 53.
  8. The top border ends at pixel row 52, so the vertical offset of the crop has to skip past it. We pick 53.
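The arithmetic behind items 5 to 8 can be written down as a small helper. This is only a sketch; the function name `crop_geometry` and its argument order are made up for illustration. It takes the last pixel of the left and top borders and the first pixel of the right and bottom borders, and treats everything strictly between them as the interior.

```sh
# crop_geometry LEFT_END RIGHT_START TOP_END BOTTOM_START
#   LEFT_END      last column of the left border    (52 in the listing)
#   RIGHT_START   first column of the right border  (249)
#   TOP_END       last row of the top border        (52)
#   BOTTOM_START  first row of the bottom border    (209)
# Prints a WIDTHxHEIGHT+X+Y geometry covering only the interior of the box.
crop_geometry() {
    x=$(( $1 + 1 ))          # first interior column
    y=$(( $3 + 1 ))          # first interior row
    w=$(( $2 - $1 - 1 ))     # columns strictly between the two borders
    h=$(( $4 - $3 - 1 ))     # rows strictly between the two borders
    printf '%sx%s+%s+%s\n' "$w" "$h" "$x" "$y"
}

crop_geometry 52 249 52 209   # prints 196x156+53+53
```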

Hence, our command to cut the sub-image from the original one could be:

convert  box.jpg  -crop 196x156+53+53  sub-box.jpg

or, to make the image dimensions better distinguishable from the white background of this web page:

convert  box.jpg  -crop 196x156+53+53  -colorize 20,0,20  sub-box.jpg

Resulting image:

You can now apply OCR on the image:

tesseract sub-box.jpg OCR-subbox 1>/dev/null && cat OCR-subbox.txt

  Dave
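The question asks about a whole grid of equally sized boxes rather than a single one. Once the first box has been located as above, the remaining crop geometries follow by arithmetic. The sketch below is an assumption-laden illustration, not part of the original answer: `grid_geometries` is a hypothetical helper, and it assumes that neighbouring cells share one border of known thickness and that the scan is not rotated.

```sh
# grid_geometries W H X0 Y0 BORDER ROWS COLS
#   W, H      interior size of one cell (e.g. 196 and 156)
#   X0, Y0    offset of the first cell's interior (e.g. 53 and 53)
#   BORDER    thickness of the line shared by neighbouring cells (e.g. 4)
#   ROWS/COLS number of cells in each direction
# Prints one WxH+X+Y crop geometry per cell, row by row.
grid_geometries() {
    w=$1 h=$2 x0=$3 y0=$4 border=$5 rows=$6 cols=$7
    r=0
    while [ "$r" -lt "$rows" ]; do
        c=0
        while [ "$c" -lt "$cols" ]; do
            x=$(( x0 + c * (w + border) ))   # step one cell plus one border
            y=$(( y0 + r * (h + border) ))
            printf '%sx%s+%s+%s\n' "$w" "$h" "$x" "$y"
            c=$(( c + 1 ))
        done
        r=$(( r + 1 ))
    done
}
```

Each printed geometry can then be handed to `-crop` in the same way as the single-box command above. A skewed scan would break the fixed-pitch assumption, in which case each box would need to be located individually.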