I'm trying to extract and print english text out of a pdf on console. Extraction is done through itextpdf API using PdfTextExtractor class. Text i'm getting is not understandble. May be some language issues I'm facing. My intent is to find a particular text within a PDF and replace it with some other string. I started with parsing the file to find the string. Following code snippet represents my string extractor:
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream(OUTPUTFILE));
document.open();
PdfReader reader = new PdfReader(input);
int n = reader.getNumberOfPages();
PdfImportedPage page;
// Go through all pages
for (int i = 1; i <= n; i++) {
String str=PdfTextExtractor.getTextFromPage(reader, i);
System.out.println(str);
}
document.close();
but the output I'm getting on console is not understandable even though the text in the PDF is in english.
Output:
t cotenn dna o mntoafinir yales r ni et h layhcsip Amgteu end y Retila m eysts w tih eth erss p wlli e erefcern emsyst o f et h se. ru I n tioi, dnda etseh orpvedi eddda e ulav o t taw h s i oelbssip hwti se vdcie ollaw na s tiouquibu cacess o t latoutenxc e rpap dna t ilagid ottennc olae n ewnh ey th krwo tofoi. nmirna ni soitaoli n mor f chea e. roth s iTh s i a cel ra csea ewerh " eth lweoh is ermo nath eth ms u fo sti
rtasp ".
Can anybody please help me out what could be the possible solution for bringing text in english language as it is like in source PDF. Any sort of help will be highly appreciated.