I'm trying to convert an image file to text using the tess4j Maven dependency.
Dependencies in pom.xml:
<!-- OCR dependency -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.4.0</version>
    <exclusions>
        <exclusion>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sourceforge.lept4j</groupId>
            <artifactId>lept4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.4.0</version>
</dependency>
<dependency>
    <groupId>net.sourceforge.lept4j</groupId>
    <artifactId>lept4j</artifactId>
    <version>1.5.0</version>
</dependency>
My code:
public String convertImageToText(String imageFilePath) throws TesseractException {
    File imageFile = new File("imageFilePath");
    ITesseract iTesseract = new Tesseract();
    ImageIO.scanForPlugins();
    String result = iTesseract.doOCR(imageFile);
    System.out.println("Converted text is: " + result);
    return result;
}
However, whenever I run my program, I get the following exception:
Exception in thread "main" net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
Caused by: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
All required dependencies (JAI, lept4j, etc.) are present in my repository. I have also tried all the solutions suggested on this forum, but I'm still unable to resolve this error.
Any help would be appreciated.
Thanks
Update: Attaching the file here - Jpg file
It cannot determine an appropriate ImageReader for the given file format. So it's probably that either 1) the file format cannot be determined properly (weird file extension?), or 2) there is no image reader registered for the format you're trying to use.
See ImageIO.getImageReadersByFormatName.
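To check which of the two cases applies, here is a minimal sketch using only the standard javax.imageio API. The class name ImageReaderCheck, the helpers formatNameOf and hasReaderFor, and the file name sample.jpg are made up for illustration. It assumes the format name is taken from the text after the last '.' in the file name, which is how tess4j appears to derive it, so treat that part as an assumption. Note that the code in the question passes the string literal "imageFilePath" to new File(...) rather than the imageFilePath parameter, so that name is included in the check too.

```java
import javax.imageio.ImageIO;
import java.io.File;
import java.util.Arrays;

public class ImageReaderCheck {

    // Hypothetical helper: returns the text after the last '.' in the file name,
    // or the whole name when there is no '.' (lastIndexOf returns -1).
    static String formatNameOf(File file) {
        String name = file.getName();
        return name.substring(name.lastIndexOf('.') + 1);
    }

    // True when at least one ImageReader is registered for the given format name.
    static boolean hasReaderFor(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Pick up any reader plugins (for example JAI Image I/O) on the classpath.
        ImageIO.scanForPlugins();
        System.out.println("Registered reader formats: "
                + Arrays.toString(ImageIO.getReaderFormatNames()));

        // The literal from the question next to an assumed, correctly named JPEG file.
        File[] candidates = { new File("imageFilePath"), new File("sample.jpg") };
        for (File f : candidates) {
            String format = formatNameOf(f);
            System.out.println(f.getName() + " -> format \"" + format
                    + "\", reader available: " + hasReaderFor(format));
        }
    }
}
```

If the format name printed for your file is not in the list of registered reader formats, you are in case 1 (wrong or missing extension) or case 2 (no plugin for that format). A name without an extension, such as the literal in the question, should not match any reader on a standard JDK.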