Converting a Vision VNTextObservation to a String

Posted 2019-01-29 18:43

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:

1) class VNDetectTextRectanglesRequest

2) class VNTextObservation

It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?

Here's a post that is a brief overview of Vision.

Thank you for reading.

7 Answers
啃猪蹄的小仙女
#2 · 2019-01-29 19:03

For those still looking for a solution I wrote a quick library to do this. It uses both the Vision API and Tesseract and can be used to achieve the task the question describes with one single method:

func sliceaAndOCR(image: UIImage, charWhitelist: String, charBlackList: String = "", completion: @escaping ((_: String, _: UIImage) -> Void))

This method looks for text in your image and returns the string it found, along with a slice of the original image showing where the text was found.

三岁会撩人
#3 · 2019-01-29 19:04

Adding my own progress on this, in case anyone has a better solution:

I've successfully drawn the region box and character boxes on screen. Apple's Vision API is actually very performant. You have to transform each frame of your video into an image and feed it to the recogniser. In my experience that is much more accurate than feeding it the pixel buffer directly from the camera.

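    import AVFoundation
    import Vision

    // Assumed context: the AVCaptureVideoDataOutputSampleBufferDelegate callback,
    // which is where `sampleBuffer` comes from.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        if #available(iOS 11.0, *) {
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

            // Pass the camera intrinsics to Vision when the buffer carries them.
            var requestOptions: [VNImageOption: Any] = [:]
            if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
                requestOptions = [.cameraIntrinsics: camData]
            }

            // .right is the CGImagePropertyOrientation case with raw value 6.
            let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                            orientation: .right,
                                                            options: requestOptions)

            let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
                guard let results = request.results else { print("no result"); return }
                // Keep only the text observations.
                let regions = results.compactMap { $0 as? VNTextObservation }
                DispatchQueue.main.async {
                    // Remove the boxes drawn for the previous frame (sublayer 0 is kept).
                    self.previewLayer.sublayers?.removeSubrange(1...)
                    for region in regions {
                        self.drawRegionBox(box: region)
                        for characterBox in region.characterBoxes ?? [] {
                            self.drawTextBox(box: characterBox)
                        }
                    }
                }
            })
            // Without this flag, characterBoxes is nil.
            request.reportCharacterBoxes = true
            try? imageRequestHandler.perform([request])
        }
    }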

Now I'm trying to actually recognize the text. Apple doesn't provide any built-in OCR model (as of iOS 11), and I want to use Core ML for that, so I'm trying to convert a Tesseract trained data model to Core ML.

You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports that type of input and outputs a .mlmodel file.

Or, you can link to TesseractiOS directly and try to feed it with your region boxes and character boxes you get from the Vision API.
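If you go the route of cropping regions out of the frame yourself, the boxes first need converting: Vision reports `boundingBox` in normalized coordinates (0 to 1) with the origin at the lower-left corner, while bitmap cropping works in pixels from the top-left. A minimal sketch of that conversion, assuming you know the pixel size of the image you passed to Vision (the function name `pixelRect` is hypothetical, not part of any library):

```swift
import Foundation

/// Converts a Vision-style normalized rectangle (lower-left origin, values in 0...1)
/// into a pixel rectangle with a top-left origin.
/// `pixelRect` is an illustrative helper, not a Vision API.
func pixelRect(forNormalized box: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    let x = box.origin.x * w
    // Flip the y-axis: Vision measures from the bottom edge, bitmaps from the top.
    let y = (1 - box.origin.y - box.size.height) * h
    return CGRect(x: x, y: y, width: box.size.width * w, height: box.size.height * h)
}
```

The resulting rectangle is the kind of value you would hand to a cropping call before passing the slice to an OCR engine; rounding to whole pixels is left out here and may be needed in practice.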

Luminary・发光体
#4 · 2019-01-29 19:08

SwiftOCR

I just got SwiftOCR to work with small sets of text.

https://github.com/garnele007/SwiftOCR

uses

https://github.com/Swift-AI/Swift-AI

which uses the NeuralNet-MNIST model for text recognition.

TODO : VNTextObservation > SwiftOCR

I will post an example of it using VNTextObservation once I have one connected to the other.

OpenCV + Tesseract OCR

I tried to use OpenCV + Tesseract but got compile errors, then found SwiftOCR.

SEE ALSO : Google Vision iOS

Note: the Google Vision Text Recognition Android SDK has text detection, and there is also an iOS CocoaPod. So keep an eye on it, as it should add text recognition to iOS eventually.

https://developers.google.com/vision/text-overview

Correction: I just tried it, and only the Android version of the SDK supports text detection.

If you subscribe to releases at https://libraries.io/cocoapods/GoogleMobileVision (click SUBSCRIBE TO RELEASES), you can see when TextDetection is added to the iOS part of the CocoaPod.

迷人小祖宗
#5 · 2019-01-29 19:09

Firebase ML Kit does it for iOS (and Android) with their on-device Vision API and it outperforms Tesseract and SwiftOCR.

啃猪蹄的小仙女
#6 · 2019-01-29 19:14

Thanks to a GitHub user, you can test an example: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8

#import <UIKit/UIKit.h>
#import <Vision/Vision.h>

- (void)detectWithImageURL:(NSURL *)URL
{
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
    VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
        if (error) {
            NSLog(@"%@", error);
        }
        else {
            for (VNTextObservation *textObservation in request.results) {
//                NSLog(@"%@", textObservation);
//                NSLog(@"%@", textObservation.characterBoxes);
                // Bounding box of the whole text region (normalized coordinates).
                NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
                // One box per detected character inside that region.
                for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
                    NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
                }
            }
        }
    }];
    // Without this flag, characterBoxes is nil.
    request.reportCharacterBoxes = YES;
    NSError *error = nil;
    // Check the return value; the error is only meaningful when the call fails.
    if (![handler performRequests:@[request] error:&error]) {
        NSLog(@"%@", error);
    }
}

The thing is, the result is only an array of bounding boxes, one for each detected character. From what I gathered from the Vision session, I think you are supposed to use Core ML to recognize the actual characters.
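To make that division of labour concrete, here is a rough sketch of the glue code between the two steps: order the character boxes of one text region from left to right, run each through a classifier, and join the results. The `classify` closure is a hypothetical stand-in for whatever Core ML (or other) single-character model you end up using; nothing here is a Vision or Core ML API, and it assumes a single left-to-right line of text.

```swift
import Foundation

/// Orders character boxes left to right and joins the characters that a
/// classifier returns for each one.
/// `classify` is a hypothetical stand-in for a single-character recognizer.
func assembleText(from characterBoxes: [CGRect],
                  classify: (CGRect) -> Character?) -> String {
    // Assumes one line of left-to-right text, so x alone decides the order.
    let ordered = characterBoxes.sorted { $0.origin.x < $1.origin.x }
    var text = ""
    for box in ordered {
        // Skip boxes the classifier cannot label instead of guessing.
        if let character = classify(box) {
            text.append(character)
        }
    }
    return text
}
```

In a real pipeline the closure would crop the image to the box, scale the crop to the model's input size, and return the model's top prediction.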

Recommended WWDC 2017 talk: "Vision Framework: Building on Core ML" (I haven't finished watching it yet). Have a look at 25:50 for a similar example called MNISTVision.

Here's another nifty app demonstrating the use of Keras (TensorFlow) to train an MNIST model for handwriting recognition using Core ML: GitHub

forever°为你锁心
#7 · 2019-01-29 19:17

I'm using Google's Tesseract OCR engine to convert the images into actual strings. You'll have to add it to your Xcode project using CocoaPods. Tesseract will perform OCR even if you simply feed it the whole image, but it performs better and faster if you use the detected text rectangles to feed it only the pieces of the image that actually contain text, which is where Apple's Vision framework comes in handy.

Here's a link to the engine: Tesseract OCR. And here's a link to the current stage of my project, which already has text detection + OCR implemented: Out Loud - Camera to Speech. Hope these can be of some use. Good luck!
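When cutting those pieces out, one precaution is to grow each detected rectangle slightly and clip it to the image, so that glyph edges are not shaved off by a tight box. The sketch below shows only that geometry step; the helper name `paddedRegion` and the margin value are assumptions for illustration, and whether padding actually improves Tesseract's output on your images is something to verify yourself.

```swift
import Foundation

/// Grows a pixel-space text rectangle by `margin` on every side and clips the
/// result to the image bounds. `paddedRegion` is an illustrative helper.
func paddedRegion(_ region: CGRect, margin: CGFloat, imageSize: CGSize) -> CGRect {
    // Expand each edge outward, but never past the image border.
    let minX = max(region.origin.x - margin, 0)
    let minY = max(region.origin.y - margin, 0)
    let maxX = min(region.origin.x + region.size.width + margin, imageSize.width)
    let maxY = min(region.origin.y + region.size.height + margin, imageSize.height)
    // A region entirely outside the image collapses to zero size.
    return CGRect(x: minX, y: minY,
                  width: max(maxX - minX, 0),
                  height: max(maxY - minY, 0))
}
```

The padded rectangle can then be used as the crop area for the slice that is handed to the OCR engine.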
