- I am using the iText library to create and read PDFs in my Java project.
- I am reading multiple text files of any extension (pdf, doc, word, etc.) and writing their content to a new PDF (the content of all the files joined together).
- To separate the content of each file in the giant PDF, I always start a new page, write the exact path to the file in red at the top of the new page, and then write the content of the file.
The problem:
- I want to write how many pages each file occupies in this PDF.
- How do I check whether a string is present on a PDF page? I have all the file paths, so I would like to check whether any of the paths is written on the page.
- I was following this tutorial to extract the text of any of my pages: http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/
But when I extract a whole page and check whether one of my file paths is present on the page (using string.contains(...)), the system doesn't find my file path on the PDF page! I checked why this happens, and when I printed one page's string, it looked like this:
1. PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/ src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java package br.ufrn.pairg.pdfgenerator;
import java.io.BufferedReader; import java.io.File; import java.io.FileReader; import java.io.IOException; import java.util.Scanner;
public...
When I checked whether the file path "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/ src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java" was present in this giant string, the system didn't find it. Can you see the problem? My path is so long that it occupies 2 lines, so the extracted text contains a break in the middle of the path. That's the problem!
So, my question is: is there a way to check whether a long string that may wrap across lines is present in the text of a PDF page using the iText library?
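One way to make the contains() check tolerant of the line wrap described above is to strip all whitespace from both the extracted page text and the path before comparing. The following is only a sketch using plain Java strings, independent of iText; the class and method names (PathOnPage, stripWhitespace, containsIgnoringWhitespace) are made up for illustration, and the sample page text assumes the wrap shows up as a line break in the extracted string.

```java
class PathOnPage {
    // Remove spaces, tabs and line breaks so that wrapping cannot split the path.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // True if needle occurs in pageText once whitespace is ignored on both sides.
    static boolean containsIgnoringWhitespace(String pageText, String needle) {
        return stripWhitespace(pageText).contains(stripWhitespace(needle));
    }

    public static void main(String[] args) {
        // Assumed shape of the extracted text: the long path wrapped onto a second line.
        String pageText = "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/\n"
                + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java package br.ufrn.pairg.pdfgenerator;";
        String path = "PdfGeneratorForSoftwareRegistration/PdfGeneratorForSoftwareRegistration/"
                + "src/br/ufrn/pairg/pdfgenerator/LeitorArquivoTexto.java";

        System.out.println(pageText.contains(path));                    // plain check
        System.out.println(containsIgnoringWhitespace(pageText, path)); // whitespace-insensitive check
    }
}
```

The trade-off is that two paths differing only in whitespace become indistinguishable, and a path could in principle match text from the file body, so this check is best applied only to the top of each page where the path is written.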
It's not the best solution, but I solved it by writing a special id (like "#%&#id_0#%&#") above every path name in my first PDF. Then I read the PDF once again and check each page for an id. If one is there, I associate it with the matching file path.
Problem solved: I am getting the page numbers using the solution from http://www.quicklyjava.com/read-pdf-file-in-java-using-itext/
Problem: if any file in the project has #%&#id_0#%&#, #%&#id_1#%&#, ... written in it, my program will not work.
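The collision risk with fixed ids could be reduced by generating a random id per file instead of counting from 0. Below is a minimal sketch of that idea in plain Java, with no iText calls; PageMarkers, assignMarkers and findPath are hypothetical names, and the "#id_...#" marker format is an assumption, not something the tutorial prescribes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class PageMarkers {
    // Give every path its own random marker; the map goes from marker to path.
    static Map<String, String> assignMarkers(List<String> paths) {
        Map<String, String> markerToPath = new LinkedHashMap<>();
        for (String path : paths) {
            // A random UUID is very unlikely to already appear in any source file.
            String marker = "#id_" + UUID.randomUUID() + "#";
            markerToPath.put(marker, path);
        }
        return markerToPath;
    }

    // Return the path whose marker occurs in the page text, or null if none does.
    static String findPath(String pageText, Map<String, String> markerToPath) {
        for (Map.Entry<String, String> entry : markerToPath.entrySet()) {
            if (pageText.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> markers = assignMarkers(List.of("src/A.java", "src/B.java"));
        String firstMarker = markers.keySet().iterator().next();
        // Simulated page text: the marker line followed by the file content.
        String pageText = firstMarker + "\nsrc/A.java\npackage example;";
        System.out.println(findPath(pageText, markers));
    }
}
```

A collision is still possible in principle, only far less likely than with a short sequential id; scanning the input files for a marker before using it would rule it out entirely.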
Pages in a PDF file are organized using a page tree. Each leaf of the page tree is a page dictionary with keys and values. You could add a custom entry to the page dictionary like this:
If you look inside the PDF, this looks like this:
For the sake of this example, I added a PDF string for page 4, a PDF name for page 6 and a PDF number for page 7.
You can check for the presence of this custom key like this:
The output of this check() is like this:
Important: You can't just invent new keys for the PDF syntax apart from those defined in ISO 32000. However, you can create your own custom keys if you register a four-character code with ISO. For instance: Adobe registered ADBE, iText registered ITXT,... If you introduce new custom keys, you should use the code registered with ISO as a prefix. For instance: at iText, we can use ITXT_PageMarker, or ITXT_custom, or ITXT_Whatever,... This rule prevents two different companies from introducing the same key with a different meaning.