Separation of background/foreground layers in a sc

Posted 2019-08-15 20:05

I need to automatically remove the mildly colored background of a scanned document image for OCR.

ScanTailor is an open-source, GUI-based C++ application that does background separation, among other things, but I cannot figure out how to run only the last step, the one that actually removes the background.

Ideally, I could find the code that does this and either:

  1. Port that part to C#
  2. Modify the C++ to respond to command line execution, only performing that step on a given image

Can you help me understand how to do either? Or do you know of other libraries that can do this? (Any language or platform is acceptable.)

2 answers
萌系小妹纸
Post #2 · 2019-08-15 20:10

You are referring to thresholding, despeckling, and noise-removal techniques, which are necessary in OCR applications.
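
As one illustration of the thresholding step, here is Otsu's method, a common global thresholding technique (not necessarily what ScanTailor, IEvolution.NET or Recogniform use internally). This is a minimal sketch in plain Python, assuming an 8-bit grayscale image stored as a list of rows; the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximises between-class variance.

    pixels: flat iterable of ints in 0..255. Pixels <= t form the dark
    (ink) class, pixels > t form the light (background) class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_lo = 0    # number of pixels at or below t
    sum_lo = 0  # intensity sum of pixels at or below t
    for t in range(256):
        w_lo += hist[t]
        if w_lo == 0:
            continue  # no dark pixels yet
        w_hi = total - w_lo
        if w_hi == 0:
            break  # every pixel is already in the dark class
        sum_lo += t * hist[t]
        mean_lo = sum_lo / w_lo
        mean_hi = (sum_all - sum_lo) / w_hi
        var_between = w_lo * w_hi * (mean_lo - mean_hi) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, t):
    """Map pixels <= t to 0 (ink) and the rest to 255 (background)."""
    return [[0 if p <= t else 255 for p in row] for row in image]


image = [[20, 200, 210], [30, 205, 200]]
t = otsu_threshold(p for row in image for p in row)
print(t, binarize(image, t))
```

A single global threshold works poorly when the background tint varies across the page; adaptive (local) thresholding is the usual next step in that case.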

The quality of the results depends heavily on many different factors:

  • Print quality of the original
  • Scan quality
  • Image resolution
  • Background colours and patterns used
  • Noise and other marks
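
Despeckling can be as simple as deleting ink pixels that have too few ink neighbours. A minimal sketch, assuming a binarized image with 0 for ink and 255 for background; `despeckle` and `min_neighbors` are hypothetical names, and practical despeckle filters usually look at small connected components rather than single pixels.

```python
def despeckle(binary, min_neighbors=1):
    """Turn isolated ink pixels (0) into background (255).

    binary: list of rows, each pixel 0 (ink) or 255 (background).
    An ink pixel is kept only if at least `min_neighbors` of its
    8 neighbours are also ink.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0:
                continue  # only ink pixels can be speckles
            ink_neighbors = sum(
                1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy or dx)
                and 0 <= y + dy < h
                and 0 <= x + dx < w
                and binary[y + dy][x + dx] == 0
            )
            if ink_neighbors < min_neighbors:
                out[y][x] = 255
    return out


page = [[0, 255, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 255]]
print(despeckle(page))  # the lone dot in the top-left corner is removed
```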

You may find the IEvolution.NET library at http://www.hi-components.com/nievolution.asp useful. It has many image processing functions to play with.

There are many commercial engines available. There is no single perfect function that solves image-processing problems; you must adapt the functions and parameters to match your images. See also http://www.recogniform.com/thresholding.htm

A Google search will turn up many more results.

我欲成王,谁敢阻挡
Post #3 · 2019-08-15 20:13

The algorithm is probably something like this:

  • Decide what the background color is
  • Scan the bitmap for pixels whose color is (and/or is sufficiently similar to) the background color
  • Convert these pixels to white or transparent
  • Possibly (especially if the page contains images and not just text) ignore isolated pixels, i.e. pixels that are the background color but are not adjacent to any other background pixels
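
The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not ScanTailor's actual implementation: the image is 8-bit grayscale stored as a list of rows, the background colour is taken to be the most frequent gray level, and `tolerance` and `min_bg_neighbors` are hypothetical parameters you would tune for your scans.

```python
from collections import Counter


def remove_background(image, tolerance=20, min_bg_neighbors=1):
    """Whiten pixels whose gray level is close to the dominant (background) level.

    image: list of rows of 8-bit gray levels.
    tolerance: max |pixel - background| for a pixel to count as background.
    min_bg_neighbors: a background-coloured pixel with fewer than this many
        background-coloured 8-neighbours is treated as isolated and left alone.
    Returns (background_level, new_image).
    """
    h, w = len(image), len(image[0])
    # Step 1: assume the most frequent gray level is the background colour.
    background = Counter(p for row in image for p in row).most_common(1)[0][0]
    # Step 2: mark pixels that are sufficiently similar to it.
    is_bg = [[abs(p - background) <= tolerance for p in row] for row in image]
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not is_bg[y][x]:
                continue
            # Step 4: count background-like neighbours so isolated pixels are skipped.
            n = sum(
                1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy or dx)
                and 0 <= y + dy < h
                and 0 <= x + dx < w
                and is_bg[y + dy][x + dx]
            )
            if n >= min_bg_neighbors:
                out[y][x] = 255  # Step 3: convert the pixel to white.
    return background, out


# A dark "letter" (40) with a background-coloured hole, on a 200 background.
scan = [[40, 40, 40, 200, 200, 200],
        [40, 200, 40, 200, 200, 200],
        [40, 40, 40, 200, 200, 200]]
bg, cleaned = remove_background(scan)
print(bg, cleaned)  # the hole inside the letter is isolated, so it is kept
```

Taking the raw mode is fragile on noisy scans; smoothing the histogram first, or estimating the background separately for each region of the page, is more robust.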

If it's a high-resolution, low-color-depth (e.g. black-and-white) image, then you need to apply this algorithm to groups of pixels rather than to individual pixels.
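
For a black-and-white image, a background tint typically shows up as a sparse pattern of black dots, so the decision has to be made per block instead of per pixel. A minimal sketch, assuming a 1-bit image stored as 0 (black) / 255 (white) and treating any block whose share of black pixels is at most `max_density` as background; `clear_sparse_blocks`, the block size and the threshold are hypothetical and would need tuning.

```python
def clear_sparse_blocks(binary, block=4, max_density=0.25):
    """Clear block x block tiles of a 1-bit image whose ink density is low.

    binary: list of rows, each pixel 0 (black) or 255 (white).
    Tiles whose fraction of black pixels is <= max_density are set to white.
    Edge tiles may be smaller than block x block.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            black = sum(1 for y, x in cells if binary[y][x] == 0)
            if black / len(cells) <= max_density:
                for y, x in cells:
                    out[y][x] = 255  # sparse dots: treat as dithered background
    return out


bw = [[0, 255, 0, 0],
      [255, 255, 0, 0],
      [255, 255, 0, 0],
      [255, 255, 0, 255]]
print(clear_sparse_blocks(bw, block=2))  # top-left dither dot is cleared
```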
