- I am using the iText library to create and read PDFs in my Java project.
- I am reading multiple files of any type (PDF, DOC, Word, etc.) and writing their content into a new PDF (the content of all the files joined together).
- To separate the content of each file in the giant PDF, I always start a new page, write the exact path to the file in red at the top of that page, and then write the content of the file.
The problem:
- I want to write how many pages each file occupies in this PDF.
- How do I check whether a string is present on a PDF page? I have all the file paths, so I would like to check whether any of the paths is written on the page.
- I followed this tutorial to extract the text of any of my pages: http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/
But when I extract the text of a whole page and check whether one of my file paths is present on it (using String.contains(...)), the system doesn't find my file path on the page! I looked into why this happens, and when I printed one page's text, it looked like this:
1.
PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/
src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java
package br.ufrn.pairg.pdfgenerator;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
public...
When I checked whether the file path "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java" was present in this giant string, the system didn't find it. Can you see the problem? My path is so long that it occupies two lines, so the extracted text has a line break in the middle of it. That's the problem!
So, my question is: is there a way to check whether a long string is present in the text of a PDF page using iText, even when it wraps across lines?
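The mismatch comes from the line break that text extraction inserts where the path wrapped, so a plain `String.contains()` cannot match. A minimal sketch of one workaround (a swapped-in technique, not part of iText and not what the answer below proposes) is to strip all whitespace from both the page text and the path before comparing. `containsIgnoringWhitespace` is a hypothetical helper, and the page text here is simulated rather than produced by iText's text extraction.

```java
import java.util.regex.Pattern;

public class WhitespaceInsensitiveSearch {

    // Matches any run of whitespace, including the newline that text
    // extraction inserts where a long path wrapped onto the next line.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: removes all whitespace from both strings before
    // comparing, so a path split across two lines still matches.
    public static boolean containsIgnoringWhitespace(String pageText, String needle) {
        String haystack = WHITESPACE.matcher(pageText).replaceAll("");
        String target = WHITESPACE.matcher(needle).replaceAll("");
        return haystack.contains(target);
    }

    public static void main(String[] args) {
        String path = "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/"
                + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java";
        // Simulated extracted page text: the path wraps after the second "/".
        String pageText = "1.\n"
                + "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/\n"
                + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java\n"
                + "package br.ufrn.pairg.pdfgenerator;\n";

        System.out.println(pageText.contains(path));                    // false
        System.out.println(containsIgnoringWhitespace(pageText, path)); // true
    }
}
```

The trade-off is that two paths differing only in whitespace would be treated as equal, which is usually acceptable for file paths.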
Pages in a PDF file are organized using a page tree. Each leaf of the page tree is a page dictionary with keys and values. You could add a custom entry to the page dictionary like this:
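import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    document.add(new Paragraph("Page 1"));
    document.newPage();
    document.add(new Paragraph("Page 2"));
    document.newPage();
    document.add(new Paragraph("Page 3"));
    document.newPage();
    document.add(new Paragraph("Page 4"));
    // addPageDictEntry() attaches the entry to the page currently being written (page 4)
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfString("Marker for page 4"));
    document.newPage();
    document.add(new Paragraph("Page 5"));
    document.newPage();
    document.add(new Paragraph("Page 6"));
    // The value can be any PDF object: here a PDF name...
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfName("PageMarker"));
    document.newPage();
    document.add(new Paragraph("Page 7"));
    // ...and here a PDF number
    writer.addPageDictEntry(new PdfName("ITXT_PageMarker"), new PdfNumber(7));
    document.newPage();
    document.add(new Paragraph("Page 8"));
    document.close();
}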
If you look inside the PDF, the page dictionaries of pages 4, 6 and 7 each contain an extra /ITXT_PageMarker entry.
For the sake of this example, I added a PDF string for page 4, a PDF name for page 6 and a PDF number for page 7.
You can check for the presence of this custom key like this:
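import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import java.io.IOException;

public void check(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfName key = new PdfName("ITXT_PageMarker");
    // Page numbers are 1-based, so loop up to and including the last page
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        PdfDictionary pagedict = reader.getPageN(i);
        // get() returns null when the page has no such entry
        System.out.println(pagedict.get(key));
    }
    reader.close();
}
The output of this check() method is:
null
null
null
Marker for page 4
null
/PageMarker
7
null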
Important: You can't just invent new keys for the PDF syntax beyond those defined in ISO 32000. However, you can create your own custom keys if you register a four-character prefix with ISO. For instance, Adobe registered ADBE, iText registered ITXT, and so on. If you introduce new custom keys, you should use your registered prefix: at iText, we can use ITXT_PageMarker, ITXT_custom, ITXT_Whatever, and so on. This rule prevents two different companies from introducing the same key with different meanings.
It's not the best solution, but I solved it by writing a unique ID (like "#%&#id_0#%&#") above every path name in my first PDF. Then I read the PDF again and check whether the ID is present on each page. If it is, I associate that page with the corresponding file path.
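The ID lookup described above can be sketched with a regular expression that finds the marker in a page's extracted text and recovers the file index. The `marker` and `findIds` names are hypothetical, and the page text is assumed to have been extracted already (for example with iText's `PdfTextExtractor.getTextFromPage`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerIds {

    // Same delimiter as in "#%&#id_0#%&#"; Pattern.quote keeps it literal.
    private static final String DELIM = "#%&#";
    private static final Pattern MARKER =
            Pattern.compile(Pattern.quote(DELIM + "id_") + "(\\d+)" + Pattern.quote(DELIM));

    // Builds the marker written above the path of the file with this index.
    public static String marker(int fileIndex) {
        return DELIM + "id_" + fileIndex + DELIM;
    }

    // Returns every file index whose marker occurs in the extracted page text.
    public static List<Integer> findIds(String pageText) {
        List<Integer> ids = new ArrayList<>();
        Matcher m = MARKER.matcher(pageText);
        while (m.find()) {
            ids.add(Integer.parseInt(m.group(1)));
        }
        return ids;
    }

    public static void main(String[] args) {
        String pageText = marker(3) + "\nsrc/Foo.java\npackage foo;";
        System.out.println(findIds(pageText)); // [3]
    }
}
```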
Problem solved: I get the page numbers using the approach from http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/
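Once the start page of each file is known, the number of pages each file occupies (the first point of the question) follows from the gaps between consecutive start pages. A minimal sketch, assuming the start pages were collected in file order by scanning every page for its marker, and that the total comes from something like `PdfReader.getNumberOfPages()`; `pageCounts` is a hypothetical helper.

```java
import java.util.Arrays;

public class PagesPerFile {

    // startPages[i] is the 1-based page on which the marker for file i was found.
    // Assumes each file starts on a new page, files appear in index order, and
    // each file's content runs until the next file's start page.
    public static int[] pageCounts(int[] startPages, int totalPages) {
        int[] counts = new int[startPages.length];
        for (int i = 0; i < startPages.length; i++) {
            int nextStart = (i + 1 < startPages.length) ? startPages[i + 1] : totalPages + 1;
            counts[i] = nextStart - startPages[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three files starting on pages 1, 4 and 5 of an 8-page document.
        int[] counts = pageCounts(new int[] {1, 4, 5}, 8);
        System.out.println(Arrays.toString(counts)); // [3, 1, 4]
    }
}
```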
Remaining problem: if any file in the project contains #%&#id_0#%&#, #%&#id_1#%&#, ..., my program will not work.
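One way around that collision, offered as an alternative sketch rather than the author's fix, is to choose the marker delimiter at run time so that it cannot occur in any input. `chooseDelimiter` is a hypothetical helper; it assumes you pass it the text of every input file plus the file paths, since both end up on the pages. The page-dictionary approach from the answer above avoids relying on visible text altogether.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class SafeMarker {

    // Picks a short random delimiter and retries until it occurs in none of
    // the given texts, so no file content can be mistaken for a marker.
    public static String chooseDelimiter(List<String> texts) {
        while (true) {
            // 12 hex characters from a random UUID, wrapped in '#'
            String candidate = "#" + UUID.randomUUID().toString().replace("-", "").substring(0, 12) + "#";
            if (texts.stream().noneMatch(t -> t.contains(candidate))) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("package foo;", "#%&#id_0#%&# appears here");
        String delim = chooseDelimiter(texts);
        // Marker for file 0, built the same way as "#%&#id_0#%&#"
        System.out.println(delim + "id_0" + delim);
    }
}
```

Keeping the delimiter short also makes it less likely that the marker itself wraps across lines during extraction.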