I'm using Apache PDFBox to extract text from several PDF files. The files are in Polish and contain Polish characters. Unfortunately, when I print the extracted text, I keep getting ? (question marks) instead of those characters.
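Assuming your extracted text is stored in a String s, I am assuming that you are currently printing it directly, e.g. with System.out.println(s). System.out typically encodes text with the platform default charset, and characters that charset cannot represent are printed as ?.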
I suggest you print through a PrintStream that explicitly encodes its output as UTF-8, so the Polish characters are printed properly.
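A minimal sketch of that approach, under some assumptions: the class name Utf8Print, the helper printUtf8 and the sample Polish sentence are made up for illustration, and the string s merely stands in for the text you get from PDFBox. Only standard java.io classes are used.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Print {

    // Hypothetical helper: writes s plus a line break to target, encoded as UTF-8.
    static void printUtf8(String s, OutputStream target) {
        try {
            // autoFlush = true so each println is pushed out immediately
            PrintStream out = new PrintStream(target, true, "UTF-8");
            out.println(s);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted with PDFBox (assumption)
        String s = "Zażółć gęślą jaźń";

        // Instead of System.out.println(s), wrap System.out with an explicit charset
        printUtf8(s, System.out);
    }
}
```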
This should work, and ? should no longer appear in the printed text, provided the console or file you are printing to is itself read as UTF-8.