PDFTextStripper NullPointerException

Posted 2019-08-30 04:23

Question:

I am trying to extract some data from a PDF file in Java using Apache PDFBox (1.8.9). I have added the jar to my build path and classpath (in Eclipse Mars).

I am getting a null pointer exception while creating a PDFTextStripper object.

import java.io.File;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.pdmodel.PDDocument;

public class MainClass {

    public static void main(String[] args) {
        PDDocument pd ;

        try{

          StringBuilder sb = new StringBuilder();       

          File input = new File("C:\\Result.pdf");
          pd = PDDocument.load(input);

          PDFTextStripper s = new PDFTextStripper();

        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }

}

The error I am getting is:

java.lang.NullPointerException
at org.apache.pdfbox.util.TextNormalize.findICU4J(TextNormalize.java:54)
at org.apache.pdfbox.util.TextNormalize.<init>(TextNormalize.java:45)
at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:229)
at MainClass.main(MainClass.java:17)

(Line 17 is where I am trying to create a PDFTextStripper object)

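Answer 1:

Looking at the PDFBox source for the method at the top of your stack trace (TextNormalize.findICU4J, called from the PDFTextStripper constructor), a ClassNotFoundException is caught there and the icu4j field is set to null.

You need the ICU4J jar as a dependency. Its classes are loaded at run time.

From the PDFBox source: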

        // see if we can load the icu4j classes from the classpath
        try
        {
            this.getClass().getClassLoader().loadClass("com.ibm.icu.text.Bidi");
            this.getClass().getClassLoader().loadClass("com.ibm.icu.text.Normalizer");
            icu4j = new ICU4JImpl();
        }
        catch (ClassNotFoundException e)
        {
            icu4j = null;
        }
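
To see what that lookup would find in your own setup, a small standalone check along these lines may help. This is only a sketch: `Icu4jCheck` and `isLoadable` are hypothetical names, not part of PDFBox, and it uses nothing but the JDK. It also prints the class loader, because `getClassLoader()` can return null for classes loaded by the bootstrap loader; whether that is what happens in the question is an assumption, not something the stack trace proves.

```java
// Hypothetical diagnostic helper, not part of PDFBox or ICU4J.
final class Icu4jCheck {

    /**
     * Returns true if the named class can be located through the given loader.
     * A null loader means the bootstrap class loader.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Icu4jCheck.class.getClassLoader();
        // Prints "null" if this class was loaded by the bootstrap loader
        System.out.println("class loader: " + loader);

        // The two class names PDFBox 1.8 tries to load in the snippet above
        String[] names = {"com.ibm.icu.text.Bidi", "com.ibm.icu.text.Normalizer"};
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name, loader));
        }
    }
}
```

Run it with the same classpath as MainClass. If both lines report false, the ICU4J jar is not visible to the application.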


Answer 2:

You are missing a dependency. Please make sure the following three jars are on your classpath:

I ran the code from your question with these three jars and did not get an NPE.

Also check your pdfbox-1.8.9.jar and make sure it is not corrupted.
The PDFTextStripper class is in pdfbox-1.8.9.jar, so it looks to me as though this jar may be corrupted.
Download the jar again and retry.
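
A basic way to test the jar before downloading it again is to open it with the JDK's `java.util.jar.JarFile` and look for the class entry. The following is a rough sketch under assumptions: `JarCheck` and `containsEntry` are hypothetical names, and the default jar path is only a placeholder for wherever your copy lives. It confirms that the archive opens and lists the entry; it does not verify the contents of every entry.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper for illustration, not part of PDFBox.
final class JarCheck {

    /** Returns true if the jar can be opened and contains the given entry. */
    static boolean containsEntry(File jar, String entryName) {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(entryName) != null;
        } catch (IOException e) {
            // Missing, unreadable, or structurally broken archive
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder path: pass the real location of the jar as the first argument
        File jar = new File(args.length > 0 ? args[0] : "pdfbox-1.8.9.jar");
        String entry = "org/apache/pdfbox/util/PDFTextStripper.class";
        System.out.println(entry + " present: " + containsEntry(jar, entry));
    }
}
```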