The objective is to create bounding boxes around the text in US floor plan images using text detection methods (e.g. OpenCV), so that the boxed regions can then be fed into a text reader (e.g. Tesseract or an LSTM-based model).
Several approaches based on cv2.findContours and cv2.boundingRect have been attempted, but they have largely failed to generalise across different types of floor plans (there is wide variation in how the floor plans look).
For example, converting to grayscale and applying adaptive thresholding, erosion and dilation (with various iteration counts) before calling cv2.findContours produces the result below. Note that "Bedroom 2" and "Kitchen" are not being picked up correctly.
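For reference, a minimal sketch of the kind of pipeline that was attempted (the filename, kernel size, iteration counts and threshold parameters here are illustrative placeholders, not the exact values used):

```python
import cv2

# Load the floor plan and convert to grayscale
img = cv2.imread("floorplan.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Adaptive threshold to binarise despite uneven scan quality
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

# Erode then dilate to suppress noise (iteration counts were varied)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
cleaned = cv2.erode(thresh, kernel, iterations=1)
cleaned = cv2.dilate(cleaned, kernel, iterations=1)

# OpenCV 4.x: findContours returns (contours, hierarchy)
contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```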
Here is an additional example on which the same pipeline fails to find any regions at all:
Are there any text detection models or image-cleaning procedures that would improve the accuracy of the text recognition step, preferably with code examples?
This answer is based on the assumption that the images are similar to one another (in size, wall thickness, lettering, and so on). If they are not, this wouldn't be a good approach, because you would have to retune the thresholds for every image. That being said, I would try the following:

1. Transform the image to binary and search for contours.
2. Filter out the walls by adding criteria such as contour height and width.
3. Draw the remaining contours on a mask, then dilate the mask; this merges letters that are close to each other into a single contour.
4. Create a bounding box for each merged contour — these are your ROIs.
5. Run any OCR engine on those regions.

Hope it helps a bit. Cheers!
Example:
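A minimal sketch of the steps above (the letter-size criteria, kernel shape and dilation count are assumptions that will need tuning to your floor plans; `floorplan.png` is a placeholder filename):

```python
import cv2
import numpy as np

img = cv2.imread("floorplan.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 1: binarise (Otsu picks the threshold); text and walls become white
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Step 2: find contours, keep only letter-sized ones to filter out walls
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
mask = np.zeros_like(binary)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # Assumed letter-size criteria; tune these to your images
    if 2 < w < 50 and 5 < h < 50:
        cv2.drawContours(mask, [c], -1, 255, -1)

# Step 3: dilate the mask so nearby letters merge into one contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
dilated = cv2.dilate(mask, kernel, iterations=2)

# Step 4: each merged contour's bounding box is a text ROI
rois = []
word_contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)
for c in word_contours:
    x, y, w, h = cv2.boundingRect(c)
    rois.append((x, y, w, h))
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```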
Result: letters that sit close together are merged into a single contour, and each contour's bounding box gives one text region ready for OCR.
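Finally, each ROI can be cropped and passed to any OCR engine. A sketch using pytesseract (an assumption on my part — any OCR will do; this presumes the Tesseract binary is installed and reuses `rois` and `gray` from the snippet above):

```python
import pytesseract

for (x, y, w, h) in rois:
    crop = gray[y:y + h, x:x + w]
    # --psm 7 tells Tesseract to treat the crop as a single line of text
    text = pytesseract.image_to_string(crop, config="--psm 7").strip()
    print((x, y, w, h), text)
```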