I have some text in quoted-printable encoding. Here is an example of such text (from a Wikipedia article):
If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.
I am looking for a Java class that decodes the encoded form to characters, e.g., =20 to a space.
UPDATE: Thanks to The Elite Gentleman, I know that I need to use QuotedPrintableCodec:
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException
    {
        QuotedPrintableCodec.decodeQuotedPrintable( TXT.getBytes() );
    }
}
However, I keep getting the following exception:
org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)
What am I doing wrong?
UPDATE 2: I found this question on SO and learned about MimeUtility:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException
    {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());
        // Wrap the raw bytes in a decoding stream and read the result line by line
        BufferedReader br = new BufferedReader ( new InputStreamReader( MimeUtility.decode(is, "quoted-printable") ));
        StringWriter writer = new StringWriter();
        String line;
        while( (line = br.readLine() ) != null )
        {
            writer.append(line);
        }
        System.out.println("INPUT: " + TXT);
        System.out.println("OUTPUT: " + writer.toString() );
    }
}
However, the output is still not right; it contains a stray '=':
INPUT: If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.
Now what am I doing wrong?
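The Apache Commons Codec QuotedPrintableCodec class implements the Quoted-Printable section of RFC 1521.
Update: Your quoted-printable string is wrong. The example on Wikipedia uses a soft line break, and your string drops the line break that has to follow the trailing '='. The decoder therefore tries to read "=m" as a hex escape, and 109 in the exception message is the character code of 'm'.
Soft line breaks: encoded lines may be at most 76 characters long, so longer lines are split by ending a line with '='. That '=' and the line break after it are not part of the original text, and a decoder drops both.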
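So your text should keep the line break after the trailing '=', that is, "surely=20=" followed by CRLF and then "mathematics is the most beautiful branch of philosophy."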
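A minimal sketch of the corrected constant, together with a small hand-written decoder that shows what a soft line break should turn into. The class name SoftBreakDecoderSketch and its decode method are hypothetical helpers written for illustration, not part of Commons Codec or JavaMail, and the sketch assumes the decoded bytes are UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the Commons Codec implementation.
final class SoftBreakDecoderSketch {

    // Corrected input: the '=' ending the first encoded line is followed by CRLF.
    static final String TXT =
        "If you believe that truth=3Dbeauty, then surely=20=\r\n"
        + "mathematics is the most beautiful branch of philosophy.";

    static String decode(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte b = in[i];
            if (b != '=') {
                // Ordinary character: copy it through unchanged.
                out.write(b);
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                // Soft line break "=CRLF": drop all three bytes.
                i += 3;
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                // Assumption: also tolerate a bare LF after '='.
                i += 2;
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                // Hex escape "=XY": emit the byte with value 0xXY.
                out.write(Character.digit(in[i + 1], 16) * 16
                        + Character.digit(in[i + 2], 16));
                i += 3;
            } else {
                throw new IllegalArgumentException("Invalid escape at index " + i);
            }
        }
        // Assumption: the decoded bytes represent UTF-8 text.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(TXT));
    }
}
```

With the CRLF in place, the trailing '=' disappears from the decoded text instead of being read as the start of a hex escape.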
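The Javadoc clearly states that rules #3, #4, and #5 of the quoted-printable spec are not implemented; rule #5 is the one that covers soft line breaks.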
There is also a bug logged against the Apache QuotedPrintableCodec because it does not support soft line breaks.
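If the codec version you are using is affected by that bug, one possible workaround is to strip the soft line breaks yourself before decoding. The sketch below is only an assumption about how that could look; SoftBreakStripSketch and stripSoftLineBreaks are hypothetical names, and the codec call is left in a comment so the example runs without extra libraries.

```java
// Hypothetical pre-processing step; not part of Commons Codec.
final class SoftBreakStripSketch {

    // Removes every "=" that is immediately followed by CRLF (or a bare LF),
    // together with that line break.
    static String stripSoftLineBreaks(String encoded) {
        return encoded.replaceAll("=\\r?\\n", "");
    }

    public static void main(String[] args) {
        String raw =
            "If you believe that truth=3Dbeauty, then surely=20=\r\n"
            + "mathematics is the most beautiful branch of philosophy.";
        String joined = stripSoftLineBreaks(raw);
        System.out.println(joined);
        // The joined text contains only hex escapes, so it could then be passed to
        // QuotedPrintableCodec.decodeQuotedPrintable(joined.getBytes()) as in the question.
    }
}
```

After this step the text no longer contains a trailing '=', so the decoder only sees the "=3D" and "=20" escapes.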