I'm trying to convert an image file to text using tess4j maven dependency.
Dependency in pom.xml:-
<!-- OCR dependency -->
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>3.4.0</version>
<exclusions>
<exclusion>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna</artifactId>
</exclusion>
<exclusion>
<groupId>net.sourceforge.lept4j</groupId>
<artifactId>lept4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna</artifactId>
<version>4.4.0</version>
</dependency>
<dependency>
<groupId>net.sourceforge.lept4j</groupId>
<artifactId>lept4j</artifactId>
<version>1.5.0</version>
</dependency>
My code:-
public String convertImageToText(String imageFilePath) throws TesseractException {
File imageFile = new File("imageFilePath");
ITesseract iTesseract = new Tesseract();
ImageIO.scanForPlugins();
String result = iTesseract.doOCR(imageFile);
System.out.println("Converted text is: "+result);
return result;
}
However, when I try executing my program, I always encounter below exception:
Exception in thread "main" net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
Caused by: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
All required dependencies like jai, lept4j etc are present in my repository. Also I have tried all the solutions suggested on this forum but I'm unable to resolve this error.
Any help would be appreciated.
Thanks
Update: Attaching the file here - Jpg file