Here is the start of an algorithmic approach to the problem...
I'm using this image I created for test purposes, called box.jpg, with dimensions of 352x232 pixels:
The goal is to identify the red box and extract the 'Dave' picture.
My algorithmic approach would be as follows:
1. Scale the picture to one that has the original width, but a height of only 1 pixel; at the same time convert to grayscale and increase the contrast; use the textual description of each pixel's properties that ImageMagick can emit. This way you should be able to find the two spots where the vertical red line pixels accumulated the extreme color value. (The horizontal red line pixels, averaged together with the gray letter pixels, will have a more 'average' color value.)
2. Do the same in the other direction: Scale the picture to one that has the original height, but a width of only 1 pixel (convert to grayscale, increase the contrast, use the textual description... yadda-yadda). You'll find the two spots where the horizontal red line pixels accumulated the extreme color value. (The vertical red line pixels, averaged together with the gray letter pixels, will have a more 'average' color value.)
3. Identify the location of each of the color value peaks in each of the two results: this will give you the geometry of the sub-image to extract from the original.
4. Extract the sub-image from the original. Crop each side as needed.
I can't elaborate the complete algorithm in detail, but here are the commands I'd use for steps 1 and 2.
Command for Step 1
convert \
-type grayscale \
-depth 8 \
box.jpg \
-scale x1\! \
-contrast-stretch 6x6 \
columns.txt
Result for Step 1
This is the content of columns.txt:
# ImageMagick pixel enumeration: 352,1,255,gray
0,0: ( 0, 0, 0) #000000 black <-- left outer image border
1,0: (253,253,253) #FDFDFD gray(253,253,253)
2,0: (255,255,255) #FFFFFF gray(255,255,255)
3,0: (255,255,255) #FFFFFF gray(255,255,255)
[...]
20,0: (255,255,255) #FFFFFF white
21,0: (255,255,255) #FFFFFF white
[...]
46,0: (255,255,255) #FFFFFF white
47,0: (255,255,255) #FFFFFF white
48,0: (243,243,243) #F3F3F3 gray(243,243,243)
49,0: ( 0, 0, 0) #000000 black <-- left box border (ex-red)
50,0: ( 0, 0, 0) #000000 gray(0,0,0) <-- left box border (ex-red)
51,0: ( 0, 0, 0) #000000 black <-- left box border (ex-red)
52,0: ( 0, 0, 0) #000000 black <-- left box border (ex-red)
53,0: (221,221,221) #DDDDDD gray(221,221,221)
54,0: (231,231,231) #E7E7E7 gray(231,231,231)
55,0: (236,236,236) #ECECEC gray(236,236,236)
[...]
247,0: (236,236,236) #ECECEC gray(236,236,236)
248,0: (216,216,216) #D8D8D8 gray(216,216,216)
249,0: ( 0, 0, 0) #000000 black <-- right box border (ex-red)
250,0: ( 1, 1, 1) #010101 gray(1,1,1) <-- right box border (ex-red)
251,0: ( 0, 0, 0) #000000 black <-- right box border (ex-red)
252,0: ( 1, 1, 1) #010101 gray(1,1,1) <-- right box border (ex-red)
253,0: (226,226,226) #E2E2E2 gray(226,226,226)
254,0: (244,244,244) #F4F4F4 gray(244,244,244)
255,0: (244,244,244) #F4F4F4 gray(244,244,244)
[...]
303,0: (255,255,255) #FFFFFF white
304,0: (255,255,255) #FFFFFF white
305,0: (255,255,255) #FFFFFF white
[...]
342,0: (255,255,255) #FFFFFF white
343,0: (255,255,255) #FFFFFF white
344,0: (255,255,255) #FFFFFF gray(255,255,255)
345,0: (255,255,255) #FFFFFF gray(255,255,255)
346,0: (255,255,255) #FFFFFF gray(255,255,255)
347,0: (255,255,255) #FFFFFF gray(255,255,255)
348,0: (255,255,255) #FFFFFF gray(255,255,255)
349,0: (255,255,255) #FFFFFF gray(255,255,255)
350,0: (253,253,253) #FDFDFD gray(253,253,253)
351,0: ( 0, 0, 0) #000000 black <-- right outer image border
(Note: It is a bit confusing that ImageMagick sometimes calls the color value #FFFFFF white and sometimes gray(255,255,255), and likewise calls #000000 sometimes black and sometimes gray(0,0,0). Maybe a bug? Anyway, it doesn't block us here.)
Command for Step 2
convert \
-type grayscale \
-depth 8 \
box.jpg \
-scale 1x\! \
-contrast-stretch 6x6 \
rows.txt
Result for Step 2
This is the content of rows.txt (this time I dropped the confusing color names):
# ImageMagick pixel enumeration: 1,232,255,gray
0,0: ( 0, 0, 0) #000000 <-- top outer image border
0,1: (255,255,255) #FFFFFF
0,2: (255,255,255) #FFFFFF
0,3: (255,255,255) #FFFFFF
0,4: (255,255,255) #FFFFFF
0,5: (255,255,255) #FFFFFF
0,6: (255,255,255) #FFFFFF
0,7: (255,255,255) #FFFFFF
0,8: (255,255,255) #FFFFFF
0,9: (255,255,255) #FFFFFF
0,10: (255,255,255) #FFFFFF
[...]
0,46: (255,255,255) #FFFFFF
0,47: (255,255,255) #FFFFFF
0,48: (240,240,240) #F0F0F0
0,49: ( 0, 0, 0) #000000 <-- top box border (ex-red)
0,50: ( 0, 0, 0) #000000 <-- top box border (ex-red)
0,51: ( 0, 0, 0) #000000 <-- top box border (ex-red)
0,52: ( 0, 0, 0) #000000 <-- top box border (ex-red)
0,53: (225,225,225) #E1E1E1
0,54: (234,234,234) #EAEAEA
[...]
0,207: (244,244,244) #F4F4F4
0,208: (230,230,230) #E6E6E6
0,209: ( 0, 0, 0) #000000 <-- bottom box border (ex-red)
0,210: ( 0, 0, 0) #000000 <-- bottom box border (ex-red)
0,211: ( 0, 0, 0) #000000 <-- bottom box border (ex-red)
0,212: ( 0, 0, 0) #000000 <-- bottom box border (ex-red)
0,213: (234,234,234) #EAEAEA
0,214: (245,245,245) #F5F5F5
[...]
0,229: (255,255,255) #FFFFFF
0,230: (255,255,255) #FFFFFF
0,231: ( 0, 0, 0) #000000 <-- bottom outer image border
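The border positions can also be picked out of columns.txt and rows.txt mechanically rather than by eye. What follows is only a rough sketch of step 3, not a finished implementation: the function name dark_runs, the darkness threshold of 32, and the minimum run length of 2 (used to skip the 1-pixel outer image border) are my own assumptions, chosen to fit the listings above.

```shell
#!/bin/sh
# dark_runs FILE AXIS [THRESHOLD] [MINLEN]   (hypothetical helper, not part of ImageMagick)
#   FILE      - an ImageMagick pixel enumeration such as columns.txt or rows.txt
#   AXIS      - "x" to report column numbers, "y" to report row numbers
#   THRESHOLD - gray values <= THRESHOLD count as "dark" (assumed default: 32)
#   MINLEN    - shortest run worth reporting (assumed default: 2, which
#               drops the 1-pixel black outer border of the test image)
dark_runs() {
  awk -v axis="$2" -v thr="${3:-32}" -v minlen="${4:-2}" '
    # Print the run collected so far if it is long enough, then reset.
    function emit() {
      if (inrun && prev - start + 1 >= minlen) print start "-" prev
      inrun = 0
    }
    # Skip the header and anything that is not an "x,y: (...)" pixel line.
    !/^[0-9]+,[0-9]+:/ { next }
    {
      split($0, parts, ":")            # parts[1] = "x,y", parts[2] = the rest
      split(parts[1], xy, ",")
      pos = (axis == "x") ? xy[1] + 0 : xy[2] + 0
      val = parts[2]
      sub(/^[^(]*\(/, "", val)         # drop everything up to the first "("
      val += 0                         # numeric prefix = first channel value
      if (val <= thr) {
        if (!inrun || pos != prev + 1) { emit(); start = pos; inrun = 1 }
        prev = pos
      } else {
        emit()
      }
    }
    END { emit() }
  ' "$1"
}
```

Called as `dark_runs columns.txt x` and `dark_runs rows.txt y`, it prints one `start-end` line per dark run. For the excerpts shown above that would be the two box border runs in each direction, provided none of the elided lines fall under the threshold.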
From these two results we can reliably conclude:
1. The left vertical red box border line is at pixel columns 49-52.
2. The right vertical red box border line is at pixel columns 249-252.
3. The top horizontal red box border line is at pixel rows 49-52.
4. The bottom horizontal red box border line is at pixel rows 209-212.
5. From 1. and 2. you can compute an "inner width" of the red box of 197 (249 minus 52). Let's use 196 for the width of the extracted sub-image then.
6. From 3. and 4. you can compute an "inner height" of the red box of 157 (209 minus 52). Let's use 156 for the height of the extracted sub-image then.
7. The horizontal offset of the crop needs to be 52 pixels. We pick 53.
8. The vertical offset of the crop needs to be 52 pixels. We pick 53.
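The same arithmetic can be written down as a small shell sketch. The variable names are made up for illustration; the numbers are the border positions read from columns.txt and rows.txt above.

```shell
#!/bin/sh
# Border positions taken from the two listings above.
left_end=52        # last column of the left box border  (49-52)
right_start=249    # first column of the right box border (249-252)
top_end=52         # last row of the top box border       (49-52)
bottom_start=209   # first row of the bottom box border   (209-212)

# 249 - 52 = 197 and 209 - 52 = 157, as computed in the text. Subtracting 1
# leaves exactly the columns 53..248 and rows 53..208 that lie strictly
# between the border lines.
width=$((  right_start  - left_end - 1 ))
height=$(( bottom_start - top_end  - 1 ))

# The crop starts one pixel past the left and top border lines.
x_off=$(( left_end + 1 ))
y_off=$(( top_end  + 1 ))

geometry="${width}x${height}+${x_off}+${y_off}"
echo "$geometry"   # prints 196x156+53+53
```

The resulting string has the WIDTHxHEIGHT+X+Y form that -crop expects.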
Hence, our command to cut the sub-image from the original one could be:
convert box.jpg -crop 196x156+53+53 sub-box.jpg
or, to make the image dimensions better distinguishable from the white background of this web page:
convert box.jpg -crop 196x156+53+53 -colorize 20,0,20 sub-box.jpg
Resulting image:
You can now apply OCR on the image:
tesseract sub-box.jpg OCR-subbox 1>/dev/null && cat OCR-subbox.txt
Dave