Can anyone suggest how to convert a scanned image into a searchable image, or a scanned PDF into a searchable PDF?
I have been stuck on this for quite a while now.
I have tried the pdfocr application on Ubuntu, but with no success.
Tesseract version 3.03 supports creating a searchable PDF from an image. For a scanned PDF, you can use Ghostscript to convert its pages to images before sending them to Tesseract.
https://github.com/tesseract-ocr/tesseract
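The two steps above could be scripted roughly as follows. This is only a sketch, assuming the `gs` and `tesseract` executables are installed and on the PATH; the function names, the `page-%03d.png` naming pattern, the 300 dpi default, and the `eng` language are illustrative choices, not anything prescribed by either tool.

```python
import subprocess
from pathlib import Path


def gs_rasterize_cmd(pdf_path, out_pattern, dpi=300):
    """Build a Ghostscript command that renders each PDF page to a PNG.

    out_pattern should contain a printf-style counter such as %03d so
    that every page gets its own file (the pattern is an assumption).
    """
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=png16m",            # 24-bit colour PNG output device
        f"-r{dpi}",                   # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def tesseract_pdf_cmd(image_path, out_base, lang="eng"):
    """Build a Tesseract command; the trailing 'pdf' config asks for
    PDF output, which Tesseract writes to out_base + '.pdf'."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang, "pdf"]


def ocr_scanned_pdf(pdf_path, work_dir):
    """Rasterize a scanned PDF, then OCR each page image into its own PDF.

    Returns the list of per-page PDF paths that Tesseract should have
    written. Merging them into one file is left out of this sketch.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(gs_rasterize_cmd(pdf_path, work / "page-%03d.png"), check=True)

    outputs = []
    for png in sorted(work.glob("page-*.png")):
        out_base = png.with_suffix("")   # e.g. work/page-001
        subprocess.run(tesseract_pdf_cmd(png, out_base), check=True)
        outputs.append(out_base.with_suffix(".pdf"))
    return outputs
```

Calling `ocr_scanned_pdf("scan.pdf", "ocr_work")` would leave one PDF per page in `ocr_work`. For a single scanned image, the Ghostscript step is not needed and the command built by `tesseract_pdf_cmd` can be run on the image directly.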
Currently, there is no right way of doing this on Ubuntu. All OCR engines output plain text, and there is no way to add that text to a PDF as a hidden layer over the image of the text.
Option 1: Use gscan2pdf, which will make you a searchable PDF, but the OCRed text is placed in the top-left corner of the page, invisible and much too small.
Option 2: Use PDF-XChange Viewer, which has an OCR option and works correctly: it adds a text layer over the scanned image, aligned with the visible text. It is a Windows application, so you'll have to run it under Wine.