Remove receipt image border using ImageMagick

Posted 2019-05-03 06:55

I'm using ImageMagick to pre-process receipt images before extracting their text with the Tesseract OCR engine. I need to remove the background of the receipts. I've read about using masking to remove the border here, but I'm unable to create a mask for the receipts.

I have, however, tried removing the shadows from the receipt images.

Initial Image (Example receipt)


convert input.png -colorspace gray \
      \( +clone -blur 0x2 \) +swap -compose divide -composite \
      -linear-stretch 5%x0%   photocopy.png

After the code is applied:


I tried the command below to turn every color except white to black, but it does not completely black out the background of photocopy.png.

convert receipt.jpg -fill black -fuzz 20% +opaque "#ffffff" black_border.jpg


Is there any way to remove the border of the receipt image? Or create any kind of masks out of the image? Note: I need to remove the noise and border from multiple images with different backgrounds.

2 Answers
手持菜刀,她持情操
#2 · 2019-05-03 07:45

If using ImageMagick on a unix-like system, you could try my text cleaner script.

textcleaner -f 20 -o 10 -e normalize UhSV6.jpg result.jpg


Anthone
#3 · 2019-05-03 08:01

To answer your question

"Is there any way to remove the border of the receipt image? Or create any kind of masks out of the image?"

The following command (based on your own code) will create an image which you can use to derive the dimensions of an applicable mask:

convert                     \
   origscan.jpg             \
  -colorspace gray          \
   \( +clone -blur 0x2 \)   \
  +swap                     \
  -compose divide           \
  -composite                \
  -linear-stretch 5%x0%     \
  -threshold 5%             \
  -trim                     \
   mask-image.png
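Since the question mentions multiple images, the command above could be wrapped in a small loop. This is only a sketch: the function names `mask_name` and `batch_masks`, the `-mask.png` naming scheme, and the assumption that all inputs are `.jpg` files in one directory are hypothetical and not part of the original answer. The thresholds may also need tuning for each background.

```shell
#!/bin/sh
# mask_name: derive an output file name from an input file name.
# (Hypothetical naming scheme: "scan.jpg" becomes "scan-mask.png".)
mask_name() {
    printf '%s-mask.png\n' "${1%.*}"
}

# batch_masks: run the mask-image pipeline on every .jpg file in the
# directory given as the first argument (default: current directory).
batch_masks() {
    for f in "${1:-.}"/*.jpg; do
        [ -e "$f" ] || continue    # glob matched nothing, skip
        convert "$f" \
            -colorspace gray \
            \( +clone -blur 0x2 \) \
            +swap -compose divide -composite \
            -linear-stretch 5%x0% \
            -threshold 5% \
            -trim \
            "$(mask_name "$f")"
    done
}

# Example call (assumes a directory named "receipts" exists):
#   batch_masks receipts
```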

You can use that mask-image to create a monochrome (black) mask -- in one command:

convert                     \
   origscan.jpg             \
  -colorspace gray          \
   \( +clone -blur 0x2 \)   \
  +swap                     \
  -compose divide           \
  -composite                \
  -linear-stretch 5%x0%     \
  -threshold 5%             \
   \(                       \
      -clone 0              \
      -fill '#000000'       \
      -colorize 100         \
   \)                       \
  -delete 0                 \
   black-mask.png

Here are the results of the above two commands, side by side:

 

You can use identify to get the geometry of mask-image.png as well as black-mask.png:

identify -format "%g\n" *mask*.png
  2322x4128+366+144
  2322x4128+366+144

So the image canvases are 2322 pixels wide and 4128 pixels high. The visible parts of both images are of course smaller, following our -trim operation. (The +366+144 part indicates the horizontal/vertical offset from the top-left corner of the original image.)
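If the geometry is needed in a script, the string printed by identify can be split with plain shell parameter expansion. The helper below is a minimal sketch: the name `parse_geometry` and its output format are made up for illustration, and it assumes non-negative offsets written as `+X+Y`, as in the output above.

```shell
#!/bin/sh
# parse_geometry: split a geometry string such as "2322x4128+366+144"
# into canvas width, canvas height, x offset and y offset.
# Assumption: both offsets are non-negative and written with a "+" sign.
parse_geometry() {
    geom=$1
    size=${geom%%+*}        # part before the first "+": "2322x4128"
    offsets=${geom#*+}      # part after the first "+":  "366+144"
    width=${size%x*}
    height=${size#*x}
    xoff=${offsets%%+*}
    yoff=${offsets#*+}
    printf 'width=%s height=%s x=%s y=%s\n' "$width" "$height" "$xoff" "$yoff"
}

parse_geometry "2322x4128+366+144"
# prints: width=2322 height=4128 x=366 y=144
```

The offsets could then be passed to -crop on the original scan, together with the trimmed image's own width and height (available from identify as %w and %h), though that step is not part of the answer above.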


Additional comment: Having said all this, you should really look into taking better photos of your receipts! (If you have a camera which can create images 4128 pixels in height, this shouldn't be a problem. If you have as many receipts to process as you say, then it may be a good idea to acquire a small glass platen that you can place on top of the paper to keep it flat while photographing...)
