Text file encoding to UTF-8?

Posted 2019-09-12 06:39

Question:

I'm writing a Java project that sends email with an attachment.

In my test case, I add some Japanese words "some Hiraganas and Katakanas" to my attached testfile.txt (which I saved in UTF-8 encoding.)

But when I send the test email to myself and open the attached testfile.txt, every Japanese character shows up as "????".

So I'm just wondering why this happens...?

Thank you

Allan

P.S. to be more specific, here is my code. I am using mail.jar to send email.

Here is how I get the file:

/**
 * Add an attachment to the Email.
 * @param filePath
 */
public void setFile(String filePath){

    attachment = new File(filePath);

}

and below is how I attach the file into my MIME email part.

/* Add attachment if an attachment is given. */
if (attachment != null) {
    MimeBodyPart attachmentPart = new MimeBodyPart();
    attachmentPart.attachFile(attachment);
    multipart.addBodyPart(attachmentPart);
}

Answer 1:

You need to ensure that you're reading and writing the file using the proper charset.

I.e. not like this, which would use the platform's default charset:

Reader reader = new FileReader("/testfile.txt");
// ...

But rather like this, using InputStreamReader, where you explicitly specify the proper charset:

Reader reader = new InputStreamReader(new FileInputStream("/testfile.txt"), "UTF-8");
// ...
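To make the difference concrete, here is a minimal sketch using only the JDK. The class name CharsetRoundTrip and its helper methods are made up for illustration, and the US-ASCII step is only an assumption about how the "????" may arise (some step encoding the text with a charset that cannot represent Japanese); it is not a diagnosis of the asker's exact setup. The Japanese sample text is written with Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetRoundTrip {

    /** Reads the whole file as text, decoding the bytes explicitly as UTF-8. */
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Pushes text through a charset that cannot represent it (illustrative assumption). */
    static String throughAscii(String text) {
        // String.getBytes replaces every unmappable character with '?'
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // "hiragana katakana" written in hiragana and katakana, as Unicode escapes
        String text = "\u3072\u3089\u304c\u306a \u30ab\u30bf\u30ab\u30ca";

        Path file = Files.createTempFile("testfile", ".txt");
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            // Explicit UTF-8 on both sides: the text survives the round trip
            System.out.println(readUtf8(file).equals(text)); // prints "true"
            // A charset without Japanese characters: every character becomes '?'
            System.out.println(throughAscii(text)); // prints "???? ????"
        } finally {
            Files.delete(file);
        }
    }
}
```

The point is that the charset has to be named at every step that converts between bytes and characters; a single step that falls back to a default can be enough to lose the text.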

Also, in the Content-Type header of the email attachment you have to set the charset attribute and you have to write out the attachment using UTF-8. Further detail can't be given as it's unclear what mail API you're using. Alternatively, you can also stick to using InputStream/OutputStream only as that would stream the content as pure bytes and thus wouldn't affect the charset the bytes represent.
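The byte-only alternative can be sketched as follows. This is an illustration with a hypothetical ByteCopy class, using in-memory streams so it runs without any files or mail library; in a real program the streams would come from the file and from whatever the mail API exposes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteCopy {

    /** Copies raw bytes from in to out; no charset is involved at any point. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 bytes of a short Japanese string (hiragana, as Unicode escapes)
        byte[] original = "\u3072\u3089\u304c\u306a".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(original), out);

        // The copy is byte-for-byte identical, so it still decodes as UTF-8
        System.out.println(Arrays.equals(original, out.toByteArray())); // prints "true"
    }
}
```

Because the bytes are never decoded into characters, nothing can be replaced along the way. The receiving side still has to be told which charset the bytes are in, which is what the Content-Type charset attribute is for.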


Update: you're using JavaMail's MimeBodyPart without explicitly specifying the content type with the charset attribute. As a result, it depends on the mail client whether the content is treated as UTF-8 or not. Fix it as follows:

MimeBodyPart attachmentPart = new MimeBodyPart();
attachmentPart.attachFile(attachment);
attachmentPart.setHeader("Content-Type", "text/plain;charset=utf-8");
multipart.addBodyPart(attachmentPart);


Answer 2:

This thread seems to address setting the character set correctly for MIME body content (see the last comment).