Recognizing Visio shapes in an image

Posted 2019-04-02 07:40

Question:

When delivering SCADA solutions, we often receive end-user specifications in the form of Structured Control Diagrams (Visio-like flow diagrams, as in the example below), which are often submitted in PDF format or as images.

In order to access these in C#, I was hoping to use one of the OpenCV libraries.

I was looking at template recognition, but feeding the images into a machine learning algorithm to teach it the specific, already-known shapes of boxes and arrows seems like the wrong fit.

The libraries I've looked at have some polyedge functions. However, as can be seen from the example below, there is a danger that the system will treat the whole thing as one large polygon when there is no spacing between elements.

The annotations may be rotated by any multiple of 90 degrees, and I would like to identify them, as well as the contents of the rectangles, using OCR.

I do not have any experience in this, which should be apparent by now, so I hope somebody can point me in the direction of the appropriate rabbit hole. If there are multiple approaches, please choose the least math-heavy one.

Update: This is an example of the type of image I'm talking about.

The problems to address are:

  • Identification of the red rectangles with texts in cells (OCR).
  • Identification of arrows, including direction and end point annotations. Line type, if possible.
  • Template matching of the components.
  • Fallback to some polyline entity or something if template matching fails.

Answer 1:

I'm sure you realize this is an active field of research. The algorithms and methods described in this post are fundamental; there may be better or more specific solutions, either completely heuristic or built on these fundamental methods.

I'll try to describe some methods that I have used before with good results in a similar situation (we worked on simple CAD drawings to find the logical graph of an electrical grid), and I hope they will be useful.

Identification of the red rectangles with texts in cells (OCR).

This one should be easy in your case because your documents are high quality, and you can adapt any current free OCR engine (e.g. Tesseract) for the purpose. Text rotated by 90, 180, ... degrees should not be a problem: engines like Tesseract can detect it, although you need to configure the engine, and in some cases you should extract the detected boundaries and pass them to the OCR engine individually. You may just need some training and fine-tuning to reach maximum accuracy.
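One way to handle the rotated annotations is to try each cropped region at all four 90-degree orientations and keep the reading the OCR engine is most confident about. The sketch below is only an illustration in plain Python: `rotate90`, `best_rotation` and the `recognize` callable are hypothetical names, and `recognize` stands in for whatever OCR call you end up using (it is assumed to return a text and a confidence value).

```python
def rotate90(grid):
    """Rotate a 2D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def best_rotation(crop, recognize):
    """Try the crop at 0, 90, 180 and 270 degrees (clockwise).

    `recognize` is a hypothetical callable that wraps an OCR engine and
    returns (text, confidence). The most confident reading wins.
    Returns (angle, text, confidence).
    """
    best = None
    current = crop
    for angle in (0, 90, 180, 270):
        text, confidence = recognize(current)
        if best is None or confidence > best[2]:
            best = (angle, text, confidence)
        current = rotate90(current)  # prepare the next orientation
    return best
```

The returned angle tells you how far the crop had to be turned to become readable, which is also the orientation of the annotation.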

Template matching of the components.

Most template-matching algorithms are sensitive to scale, and the scale-invariant ones are very complex, so I don't think simple template matching will give you accurate results if your documents vary in scale and size.

Your shapes' features are also too similar and too sparse for algorithms such as SIFT and SURF to extract unique features and give good results.

I suggest you use contours. Your shapes are simple, and your components are built by combining these simple shapes. Using contours, you can find the simple shapes (e.g. rectangles and triangles) and then check them against previously gathered contours for each component shape. For example, one of your components is made of four rectangles, so you can store its contours together with their relative positions and check them against your documents in the detection phase.

There are lots of articles about contour analysis on the net. I suggest you have a look at these; they will give you an idea of how to use contours to detect simple and complex shapes:
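As a rough illustration of the idea of holding relative contours together, the sketch below describes a component by the kinds of simple shapes it contains and where their centroids sit inside the component's bounding box, then compares that description against a small database. It assumes the contours have already been found and simplified to polygons (lists of corner points) by your image library. All names here (`classify`, `signature`, `match`, the `tol` parameter) are hypothetical, and naming a shape by vertex count alone is a deliberately crude heuristic.

```python
def classify(polygon):
    """Name a simplified contour by its vertex count (a crude heuristic)."""
    return {3: "triangle", 4: "rectangle"}.get(len(polygon), "other")


def signature(polygons):
    """Describe a component as (shape kind, relative x, relative y) entries.

    Centroids are normalised to the component's bounding box, so the same
    component drawn at another position or scale gets the same signature.
    """
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    x0, y0 = min(xs), min(ys)
    width = (max(xs) - x0) or 1
    height = (max(ys) - y0) or 1
    parts = []
    for poly in polygons:
        cx = sum(x for x, _ in poly) / len(poly)
        cy = sum(y for _, y in poly) / len(poly)
        parts.append((classify(poly), (cx - x0) / width, (cy - y0) / height))
    return sorted(parts)


def match(polygons, database, tol=0.1):
    """Return the name of the first stored component that matches, else None."""
    candidate = signature(polygons)
    for name, known in database.items():
        if len(known) == len(candidate) and all(
            k1 == k2 and abs(x1 - x2) <= tol and abs(y1 - y2) <= tol
            for (k1, x1, y1), (k2, x2, y2) in zip(known, candidate)
        ):
            return name
    return None
```

Normalising to the bounding box is what sidesteps the scale problem of plain template matching; handling the 90-degree rotations would need an extra step, such as storing one signature per orientation.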

http://www.emgu.com/wiki/index.php/Shape_%28Triangle,_Rectangle,_Circle,_Line%29_Detection_in_CSharp

http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C

http://opencv-code.com/tutorials/detecting-simple-shapes-in-an-image/

http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html

By the way, porting code to C# using EmguCV is trivial, so don't worry about it.

The identification of arrow, including direction and endpoint annotations. Line type, if possible.

There are several methods for finding line segments (e.g. the Hough transform). The main problem in this part is the other components, as they are normally detected as lines too. If we find the components first and remove them from the document, detecting lines becomes a lot easier, with far fewer false detections.
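To make the point concrete, here is a minimal plain-Python sketch of Hough voting applied only to pixels that lie outside already-detected component boxes. It is meant to show the principle, not to replace a library implementation (OpenCV's `HoughLinesP`, for example, does this far more efficiently). The function names, the box format `(x0, y0, x1, y1)` and the default parameters are assumptions made for the example.

```python
import math
from collections import Counter


def outside_boxes(points, boxes):
    """Drop pixels that fall inside detected component boxes (x0, y0, x1, y1)."""
    return [
        (x, y) for x, y in points
        if not any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)
    ]


def hough_lines(points, theta_steps=180, min_votes=5):
    """Vote in (rho, theta) space and return the bins with enough votes.

    Each foreground pixel votes for every line x*cos(theta) + y*sin(theta) = rho
    that passes through it. Returns [(rho, theta_in_degrees, votes)], strongest
    first. Neighbouring bins often tie, so real code would also merge them.
    """
    votes = Counter()
    for x, y in points:
        for step in range(theta_steps):
            theta = math.pi * step / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, step)] += 1
    strong = [
        (rho, step * 180 / theta_steps, count)
        for (rho, step), count in votes.items()
        if count >= min_votes
    ]
    return sorted(strong, key=lambda item: -item[2])
```

With the component pixels masked out, the edges of boxes no longer compete with the connecting lines for votes, which is the effect described above.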

Approach

1- Layer the documents by color, and run the following phases on every layer of interest.

2- Detect and extract text using OCR, then remove the text regions and recreate the document without text.

3- Detect components based on contour analysis and the gathered component database, then remove the detected components (both known and unknown types, as unknown shapes would increase false detections in the next phases) and recreate the document without components. At this point, if detection went well, only lines should remain.

4- Detect lines.

5- At this point you can create a logical graph from the extracted components, lines and tags based on their detected positions.
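The last phase can be sketched as follows, assuming the earlier phases have produced a bounding box per component and a start and end point per line. Each line endpoint is attached to the nearest component box, and each line becomes an edge in the graph. The names `nearest_component` and `build_graph`, the box format `(x0, y0, x1, y1)` and the convention that an arrow runs from its first point to its second are all assumptions for illustration.

```python
def nearest_component(point, components):
    """Return the name of the component whose bounding box is closest to `point`.

    `components` maps a name to a box (x0, y0, x1, y1). The distance is zero
    when the point lies inside the box.
    """
    px, py = point

    def distance(box):
        x0, y0, x1, y1 = box
        dx = max(x0 - px, 0, px - x1)
        dy = max(y0 - py, 0, py - y1)
        return (dx * dx + dy * dy) ** 0.5

    return min(components, key=lambda name: distance(components[name]))


def build_graph(components, lines):
    """Turn line segments into (from, to) edges between components.

    `lines` is a list of (start_point, end_point) pairs. Lines whose two ends
    attach to the same component are ignored.
    """
    edges = []
    for start, end in lines:
        source = nearest_component(start, components)
        target = nearest_component(end, components)
        if source != target:
            edges.append((source, target))
    return edges
```

For example, with `{"pump": (0, 0, 10, 10), "valve": (30, 0, 40, 10)}` and a single line from `(11, 5)` to `(29, 5)`, the result is one edge from `pump` to `valve`. In a real document you would probably also want a maximum attachment distance so that stray lines are not forced onto a distant component.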

Hope this helps.



Answer 2:

I cannot give you solutions to all four of your questions, but the first one, identification of the red rectangles with texts in cells (OCR), does not sound very difficult. Here is my solution to it:

Step 1: separate the color image into 3 layers (red, green and blue), and use only the red layer for the following operations.

Step 2: binarization of the red layer.

Step 3: connected component analysis of the binarization result, keeping the statistics of each connected component (for example, the width and height of the blob).

Step 4: discard large blobs, and keep only the blobs that correspond to text. Also use the layout information to discard false text blobs (for example, text is always inside a large blob, text blobs follow a horizontal writing style, and so on).

Step 5: perform OCR on the textual components. When performing OCR, each blob will give you a confidence level, which can be used to validate whether it is a textual component or not.
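Steps 1 to 4 can be sketched in plain Python as below. This is only an illustration: it assumes the image is a list of rows of `(r, g, b)` tuples with dark ink on a light background, and the function names and size thresholds are hypothetical. A library routine such as OpenCV's `connectedComponentsWithStats` would normally replace the labelling loop. Step 5 (the OCR call and its confidence check) is left out.

```python
from collections import deque


def red_layer(image):
    """Step 1: keep only the red channel of an image of (r, g, b) pixels."""
    return [[r for r, _, _ in row] for row in image]


def binarize(layer, threshold=128):
    """Step 2: foreground (1) = dark pixels, assuming dark ink on a light page."""
    return [[1 if value < threshold else 0 for value in row] for row in layer]


def connected_components(mask):
    """Step 3: 4-connected labelling that records statistics per blob."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:  # breadth-first flood fill of one blob
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            blobs.append({"x": x0, "y": y0, "width": x1 - x0 + 1,
                          "height": y1 - y0 + 1, "area": area})
    return blobs


def text_candidates(blobs, max_width=40, max_height=40):
    """Step 4: discard large blobs and keep the ones small enough to be text."""
    return [b for b in blobs if b["width"] <= max_width and b["height"] <= max_height]
```

Under these assumptions, pure red strokes have a high red value and therefore fall into the background after binarization, while dark text stays in the foreground.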