Cloud Vision API - PDF OCR

Posted 2019-01-23 17:50

Question:

I just tested the Google Cloud Vision API to read the text, if any exists, in an image.

So far I have installed the Maven Server and the Redis Server, following the instructions on this page:

https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/java/text

So far I have been able to test it with .jpg files. Is it possible to do the same with TIFF or PDF files?

I am using the following command:

java -cp target/text-1.0-SNAPSHOT-jar-with-dependencies.jar com.google.cloud.vision.samples.text.TextApp ../../data/text/

Inside the text directory, I have the files in jpg format.

I don't know how to read the converted files afterwards, so I just run the following command:

java -cp target/text-1.0-SNAPSHOT-jar-with-dependencies.jar com.google.cloud.vision.samples.text.TextApp

I then get a prompt to enter a word or phrase to search for in the converted files. Is there a way to see the whole converted document?

Thanks!

Answer 1:

Unfortunately, PDF and TIFF formats are not currently supported by Cloud Vision.

The accepted formats are (taken from the docs):

  • JPEG
  • PNG8
  • PNG24
  • GIF
  • Animated GIF (first frame only)
  • BMP
  • WEBP
  • RAW
  • ICO


Answer 2:

On April 6, 2018, support for PDF and TIFF files in document text detection was added to the Google Cloud Vision API (see the Release Notes).

According to the documentation:

  • The Vision API can detect and transcribe text from PDF and TIFF files stored in Google Cloud Storage.

  • Document text detection from PDF and TIFF must be requested using the asyncBatchAnnotate function, which performs an asynchronous request and provides its status using the operations resources.

  • Output from a PDF/TIFF request is written to a JSON file created in the specified Google Cloud Storage bucket.

Example:

1) Upload a file to your Google Cloud Storage bucket

2) Make a POST request to perform PDF/TIFF document text detection

Request:

POST https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate
Authorization: Bearer <your access token>

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://<your bucket name>/input.pdf"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "gs://<your bucket name>/output/"
        },
        "batchSize": 1
      }
    }
  ]
}

Response:

{
  "name": "operations/9b1f9d773d216406"
}
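
For anyone who would rather script step 2 than send it by hand, here is a minimal Python sketch using only the standard library. It builds the same JSON body as the request above and posts it to the same endpoint. The function names (build_request_body, submit) are made up for this example, and how you obtain the access token is left out.

```python
import json
import urllib.request

# Endpoint copied from the request shown above.
ENDPOINT = "https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate"


def build_request_body(input_uri, output_uri, mime_type="application/pdf", batch_size=1):
    """Return the JSON body for one PDF/TIFF document text detection request."""
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": input_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": output_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


def submit(body, access_token):
    """POST the body and return the operation name from the response."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["name"]
```

A call would look like submit(build_request_body("gs://<your bucket name>/input.pdf", "gs://<your bucket name>/output/"), token). For a TIFF file, pass mime_type="image/tiff".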

3) Make a GET request to check if document text detection is done

Request:

GET https://vision.googleapis.com/v1/operations/9b1f9d773d216406
Authorization: Bearer <your access token>

Response:

{
    "name": "operations/9b1f9d773d216406",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.OperationMetadata",
        "state": "RUNNING",
        "updateTime": "2018-06-17T20:18:09.117787733Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.AsyncBatchAnnotateFilesResponse",
        "responses": [
            {
                "outputConfig": {
                    "gcsDestination": {
                        "uri": "gs://<your bucket name>/output/"
                    },
                    "batchSize": 1
                }
            }
        ]
    }
}
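
Step 3 can be automated with a small polling loop. The sketch below is one way to do it in Python with the standard library, assuming the operation response has the shape shown above. The helper names, the 5-second interval and the poll limit are arbitrary choices for this example, not anything the API prescribes.

```python
import json
import time
import urllib.request

# Base URL taken from the GET request shown above.
OPERATIONS_BASE = "https://vision.googleapis.com/v1/"


def fetch_operation(name, access_token):
    """GET the operation resource (e.g. "operations/9b1f9d773d216406")."""
    request = urllib.request.Request(
        OPERATIONS_BASE + name,
        headers={"Authorization": "Bearer " + access_token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def output_uris(operation):
    """Return the output folder URIs of a finished operation, or [] otherwise."""
    if not operation.get("done"):
        return []
    responses = operation.get("response", {}).get("responses", [])
    return [r["outputConfig"]["gcsDestination"]["uri"] for r in responses]


def wait_for_operation(name, access_token, fetch=fetch_operation,
                       interval=5.0, max_polls=60):
    """Poll until the operation reports "done": true, then return it."""
    for _ in range(max_polls):
        operation = fetch(name, access_token)
        if operation.get("done"):
            return operation
        time.sleep(interval)
    raise TimeoutError("operation %s did not finish in time" % name)
```

The fetch parameter is only there so the loop can be tried with a stand-in function instead of a real HTTP call.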

4) Check the results in the specified Google Cloud Storage folder
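
To see the whole recognized text, as the question asks, you can download a result file and pull the text out of it. This Python sketch assumes the output JSON contains a "responses" list with one entry per page, each holding the page text under "fullTextAnnotation" and "text". That is how I understand the documented output format, so check it against your own file. The function name extract_text is made up for this example.

```python
import json


def extract_text(output_json):
    """Join the full text of every page found in one output JSON string."""
    document = json.loads(output_json)
    pages = []
    for page in document.get("responses", []):
        # Pages without detected text may lack the annotation entirely.
        annotation = page.get("fullTextAnnotation", {})
        pages.append(annotation.get("text", ""))
    return "\n".join(pages)
```

With a local copy of a result file (here a hypothetical output.json), print(extract_text(open("output.json", encoding="utf-8").read())) would print the transcribed document.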



Answer 3:

https://cloud.google.com/vision/docs/pdf

I know this question is old, but Google Vision has now released support for PDF!