I wrote a basic Python script to call the GCP Vision API. My aim is to send it an image of a product and retrieve (with OCR) the words written on the box. Since I have a predefined list of brands, I can then search the text returned by the API for the brand and detect which one it is.
My Python script is the following:
import io
from google.cloud import vision
from google.cloud.vision import types
import os
import cv2
import numpy as np

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "**************************"


def detect_text(file):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with io.open(file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])
        print('bounds: {}'.format(','.join(vertices)))


file_name = "Image.jpg"
img = cv2.imread(file_name)
detect_text(file_name)
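For context, the brand lookup I plan to run on the returned text is just a simple case-insensitive search against my predefined list; roughly something like this (the brand names below are only placeholders):

BRANDS = ["Acuvue", "Biofinity", "Dailies"]  # placeholder list of known brands


def find_brand(ocr_text, brands=BRANDS):
    """Return the first known brand found in the OCR text, or None."""
    upper_text = ocr_text.upper()
    for brand in brands:
        if brand.upper() in upper_text:
            return brand
    return None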
For now, I am experimenting with the following product image (951 × 335 resolution). Its brand is Acuvue.
The problem is the following. When I test the online demo of the GCP Cloud Vision API, I get the following text result for this image:
FOR ASTIGMATISM 1-DAY ACUVUE MOIST WITH LACREON™ 30 Lenses BRAND CONTACT LENSES UV BLOCKING
(The JSON result returns all of the above words, including the word Acuvue, which is the one that matters to me, but the JSON is too long to post here.)
So the online demo detects the text on the product pretty well, and at least it accurately detects the word Acuvue (which is the brand). However, when I call the same API from my Python script with the same image, I get the following result:
Texts:
"1.DAY
FOR ASTIGMATISM
WITH
LACREONTM
MOIS
30 Lenses
BRAND CONTACT LENSES
UV BLOCKING
"
bounds: (221,101),(887,101),(887,284),(221,284)
"1.DAY"
bounds: (221,101),(312,101),(312,125),(221,125)
"FOR"
bounds: (622,107),(657,107),(657,119),(622,119)
"ASTIGMATISM"
bounds: (664,107),(788,107),(788,119),(664,119)
"WITH"
bounds: (614,136),(647,136),(647,145),(614,145)
"LACREONTM"
bounds: (600,151),(711,146),(712,161),(601,166)
"MOIS"
bounds: (378,162),(525,153),(528,200),(381,209)
"30"
bounds: (614,177),(629,178),(629,188),(614,187)
"Lenses"
bounds: (634,178),(677,180),(677,189),(634,187)
"BRAND"
bounds: (361,210),(418,210),(418,218),(361,218)
"CONTACT"
bounds: (427,209),(505,209),(505,218),(427,218)
"LENSES"
bounds: (514,209),(576,209),(576,218),(514,218)
"UV"
bounds: (805,274),(823,274),(823,284),(805,284)
"BLOCKING"
bounds: (827,276),(887,276),(887,284),(827,284)
But this does not detect the word "Acuvue" at all, unlike the demo!
Why is this happening?
Can I fix something in my Python script to make it work properly?
From the docs, there are two OCR features: TEXT_DETECTION and DOCUMENT_TEXT_DETECTION, the latter optimized for dense text and documents.
My hope was that the web demo was actually using the latter, and then filtering the results based on confidence.
At any rate, I was hoping (and my experience has been) that the latter method would "try harder" to find all the strings.
I don't think you were doing anything "wrong". There are just two parallel detection methods. One of them (DOCUMENT_TEXT_DETECTION) is more intensive, optimized for documents (likely for straightened, aligned, and evenly spaced lines), and returns more information that might be unnecessary for some applications.
So I suggest you modify your code following the Python example here.
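The change is essentially one call: use document_text_detection instead of text_detection and read full_text_annotation. A minimal sketch, reusing the structure of the detect_text function from the question:

def detect_document_text(file):
    """Detects dense/document-style text in the file using DOCUMENT_TEXT_DETECTION."""
    client = vision.ImageAnnotatorClient()

    with io.open(file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)
    # document_text_detection is the "heavier" OCR mode
    response = client.document_text_detection(image=image)

    # full_text_annotation holds the full detected text plus
    # page/block/paragraph/word/symbol structure
    print(response.full_text_annotation.text)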
Lastly, my guess is that the \342\204\242 you ask about are escaped octal values corresponding to the UTF-8 bytes of the character it found when trying to identify the ™ symbol. If you use the following snippet (which just decodes those escaped bytes as UTF-8):
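# \342\204\242 are the octal-escaped UTF-8 bytes of U+2122 (TRADE MARK SIGN)
symbol = b'\342\204\242'
print(symbol.decode('utf-8'))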
You'll be happy to see that it prints ™.