The code below extracts Hebrew text from a PDF, but the Hebrew character "ן" (final nun) is missing from the result. All other text seems to be extracted correctly. Any ideas?
public class TestPDFUtil {
    public void testHebrewPDF() throws Exception {
        String url = "";
        String text = PDFUtil.readPDF(url);
        Assert.assertTrue(text.indexOf("זיכרון עבודה") != -1);
    }
}
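import java.io.IOException;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x; in 2.x: org.apache.pdfbox.text.PDFTextStripper
public class PDFUtil {
    public static String readPDF(String url) throws IOException {
        PDDocument document = PDDocument.load(new URL(url).openStream());
        try {
            if (!document.isEncrypted()) {
                PDFTextStripper stripper = new PDFTextStripper();
                return stripper.getText(document).trim();
            }
            return null; // encrypted documents are not extracted
        } finally {
            document.close(); // release the document's resources
        }
    }
}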
I am attaching screenshots that show the missing character. On the left is how the page appears in Chrome. On the right is the result of PDF text extraction using the code above.
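To narrow this down, it may help to check which code points actually arrive in the extracted string, for instance whether the final nun (U+05DF) is dropped entirely or replaced by something else. The following is only a diagnostic sketch using the standard library; the class name HebrewCodePoints and its two helper methods are hypothetical names made up for illustration and are not part of PDFBox.

```java
public class HebrewCodePoints {
    // Unicode code point of the Hebrew letter final nun.
    public static final int FINAL_NUN = 0x05DF;

    // Counts how many times the given code point occurs in the text.
    public static long countCodePoint(String text, int codePoint) {
        return text.codePoints().filter(cp -> cp == codePoint).count();
    }

    // Renders the text as a space-separated list such as "U+05D5 U+05DF".
    public static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first word of the expected phrase, written with escapes so the
        // source file encoding cannot interfere with the check.
        String expected = "\u05D6\u05D9\u05DB\u05E8\u05D5\u05DF";
        System.out.println(describeCodePoints(expected));
        System.out.println(countCodePoint(expected, FINAL_NUN));
    }
}
```

Passing the string returned by PDFUtil.readPDF(url) to describeCodePoints would show what, if anything, appears where the final nun should be.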