Java PDFBox, extract data from a column of a table

Posted 2019-09-05 09:18

Question:

I would like to know how to extract data from this PDF (example image): http://postimg.org/image/ypebht5dx/

For example, I want to extract only the values in the "TENSIONE[V]" column, and whenever a cell is blank I want to write the letter "X" in the output. How can I do this?

The code I used is this:

 PDDocument p=PDDocument.load(new File("a.pdf"));
 PDFTextStripper t=new PDFTextStripper();
 System.out.println(t.getText(p));

and I get this output:

http://s23.postimg.org/wbhcrw03v/Immagine.png

Answer 1:

These are only guidelines; adapt them to your use case. The code is not tested, but it should help you solve your issue. If you have any questions, let me know.

String text = t.getText(p);
String[] lines = text.split("\\r?\\n"); // all lines of the extracted text, split on line breaks

String[] cols = lines[0].split("\\s+"); // tokens of the first line, split on runs of whitespace
// cols[0] contains the pin
// cols[1] contains TENSIONE[V]
// cols[2] contains TOLLRENZA; a blank cell produces no token, so check cols.length before reading it
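
As a rough sketch of how the guideline above could be extended to every line and to the "X" placeholder, the following works on the string returned by `getText`. The class and method names (`ColumnExtractor`, `secondColumn`) are hypothetical, and it rests on an assumption that may not hold for your PDF: each row starts with the pin, and the TENSIONE[V] value, when present, is the second whitespace-separated token. A row where TENSIONE[V] is blank but a later column is filled would be misread, and the header row is not treated specially.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnExtractor {

    // Returns the second whitespace-separated token of every non-empty line,
    // or "X" when the line has only one token (assumed to mean a blank cell).
    static List<String> secondColumn(String text) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim(); // avoid an empty leading token
            if (trimmed.isEmpty()) {
                continue; // skip blank lines in the extracted text
            }
            String[] cols = trimmed.split("\\s+");
            values.add(cols.length > 1 ? cols[1] : "X");
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the output of t.getText(p)
        String sample = "1 12.5 5%\n2\n3 3.3";
        for (String value : secondColumn(sample)) {
            System.out.println(value);
        }
    }
}
```

If blank cells can occur in the middle of a row, splitting on whitespace cannot tell which column a token belongs to. In that case extracting by position, for example with PDFBox's `PDFTextStripperByArea` and a rectangle covering only the TENSIONE[V] column, is likely to be more reliable.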


Tags: java pdfbox