
How to read text inside body of mail using javax.mail

Posted 2019-06-25 20:23

I'm developing a mail client that uses javax.mail to read the mail in an inbox:

Properties properties = System.getProperties();  
properties.setProperty("mail.store.protocol", "pop3"); // match the protocol passed to getStore() below
try {  
    Session session = Session.getDefaultInstance(properties, null);
    Store store = session.getStore("pop3");//create store instance  
    store.connect("pop3.domain.it", "mail.it", "*****");  
    Folder inbox = store.getFolder("inbox");  
    FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
    inbox.open(Folder.READ_ONLY);//set access type of Inbox  
    Message messages[] = inbox.search(ft);
    String mail,sub,bodyText="";
    Object body;
    for(Message message:messages) {
        mail = message.getFrom()[0].toString();
        sub = message.getSubject();
        body = message.getContent();
        //bodyText = body.....
    }
} catch (Exception e) {  
    System.out.println(e);    
}

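I know that getContent() returns an Object because the content can be a String, a MimeMultipart, a SharedByteArrayInputStream and so on (I think)... Is there a way to always get the text inside the body of the message? Thanks!!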

Answer 1:

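This answer extends yurin's answer. The issue he brought up is that the content of a MimeMultipart may itself be another MimeMultipart. The getTextFromMimeMultipart() method below recurses on the content in such cases until the message body has been fully processed.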

private String getTextFromMessage(Message message) throws MessagingException, IOException {
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart)  throws MessagingException, IOException{
    String result = "";
    int count = mimeMultipart.getCount();
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        if (bodyPart.isMimeType("text/plain")) {
            result = result + "\n" + bodyPart.getContent();
            break; // without break same text appears twice in my tests
        } else if (bodyPart.isMimeType("text/html")) {
            String html = (String) bodyPart.getContent();
            result = result + "\n" + org.jsoup.Jsoup.parse(html).text();
        } else if (bodyPart.getContent() instanceof MimeMultipart){
            result = result + getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
        }
    }
    return result;
}


Answer 2:

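This answer extends Austin's answer to correct the original issue with the treatment of multipart/alternative (// without break same text appears twice in my tests).

The text appears twice because, for multipart/alternative, the user agent is expected to choose only one part.

From RFC2046:

The "multipart/alternative" type is syntactically identical to "multipart/mixed", but the semantics are different. In particular, each of the body parts is an "alternative" version of the same information.

Systems should recognize that the content of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and references, in some cases even through user interaction. As with "multipart/mixed", the order of body parts is significant. In this case, the alternatives appear in an order of increasing faithfulness to the original content. In general, the best choice is the LAST part of a type supported by the recipient system's local environment.

Same example with treatment of alternatives: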

private String getTextFromMessage(Message message) throws IOException, MessagingException {
    String result = "";
    if (message.isMimeType("text/plain")) {
        result = message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        result = getTextFromMimeMultipart(mimeMultipart);
    }
    return result;
}

private String getTextFromMimeMultipart(
        MimeMultipart mimeMultipart) throws IOException, MessagingException {

    int count = mimeMultipart.getCount();
    if (count == 0)
        throw new MessagingException("Multipart with no body parts not supported.");
    boolean multipartAlt = new ContentType(mimeMultipart.getContentType()).match("multipart/alternative");
    if (multipartAlt)
        // alternatives appear in an order of increasing 
        // faithfulness to the original content. Customize as req'd.
        return getTextFromBodyPart(mimeMultipart.getBodyPart(count - 1));
    String result = "";
    for (int i = 0; i < count; i++) {
        BodyPart bodyPart = mimeMultipart.getBodyPart(i);
        result += getTextFromBodyPart(bodyPart);
    }
    return result;
}

private String getTextFromBodyPart(
        BodyPart bodyPart) throws IOException, MessagingException {

    String result = "";
    if (bodyPart.isMimeType("text/plain")) {
        result = (String) bodyPart.getContent();
    } else if (bodyPart.isMimeType("text/html")) {
        String html = (String) bodyPart.getContent();
        result = org.jsoup.Jsoup.parse(html).text();
    } else if (bodyPart.getContent() instanceof MimeMultipart){
        result = getTextFromMimeMultipart((MimeMultipart)bodyPart.getContent());
    }
    return result;
}

Note that this is a very simple example. It misses many cases and should not be used in production in its current form.
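
One case the example above misses is an alternative whose last part is not text at all. The sketch below is a stand-alone model rather than javax.mail code: PartModel, leaf, multi and extractText are hypothetical names invented for illustration, and HTML is returned as-is instead of being run through jsoup. It assumes that "supported" means a text/* leaf or a nested multipart, and it walks a multipart/alternative from the last part backwards, as the RFC passage suggests.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a MIME part; not a javax.mail type. */
class PartModel {
    final String type;              // lowercase, e.g. "text/plain" or "multipart/alternative"
    final String text;              // body of a leaf part; null for multiparts and binaries
    final List<PartModel> children; // empty for leaf parts

    private PartModel(String type, String text, List<PartModel> children) {
        this.type = type;
        this.text = text;
        this.children = children;
    }

    static PartModel leaf(String type, String text) {
        return new PartModel(type, text, Collections.<PartModel>emptyList());
    }

    static PartModel multi(String subtype, PartModel... children) {
        return new PartModel("multipart/" + subtype, null, Arrays.asList(children));
    }

    boolean isMultipart() {
        return type.startsWith("multipart/");
    }

    /** Assumption: this client can display text/* leaves and any nested multipart. */
    boolean isSupported() {
        return isMultipart() || (type.startsWith("text/") && text != null);
    }

    String extractText() {
        if (!isMultipart()) {
            return isSupported() ? text : "";
        }
        if (type.equals("multipart/alternative")) {
            // Later alternatives are more faithful, so scan from the end
            // and stop at the first one this client supports.
            for (int i = children.size() - 1; i >= 0; i--) {
                PartModel candidate = children.get(i);
                if (candidate.isSupported()) {
                    return candidate.extractText();
                }
            }
            return "";
        }
        // Any other multipart: concatenate the text of every child, in order.
        StringBuilder result = new StringBuilder();
        for (PartModel child : children) {
            result.append(child.extractText());
        }
        return result.toString();
    }
}
```

With javax.mail the same shape would presumably use isMimeType(...) in place of the string checks and getBodyPart(i) in place of the children list.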



Answer 3:

Below is a method that takes the text from a message in cases where the body parts are text and HTML.

import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.internet.MimeMultipart;
import org.jsoup.Jsoup;

....
private String getTextFromMessage(Message message) throws Exception {
    if (message.isMimeType("text/plain")) {
        return message.getContent().toString();
    } else if (message.isMimeType("multipart/*")) {
        String result = "";
        MimeMultipart mimeMultipart = (MimeMultipart) message.getContent();
        int count = mimeMultipart.getCount();
        for (int i = 0; i < count; i++) {
            BodyPart bodyPart = mimeMultipart.getBodyPart(i);
            if (bodyPart.isMimeType("text/plain")) {
                result = result + "\n" + bodyPart.getContent();
                break; // without break same text appears twice in my tests
            } else if (bodyPart.isMimeType("text/html")) {
                String html = (String) bodyPart.getContent();
                result = result + "\n" + Jsoup.parse(html).text();
            }
        }
        return result;
    }
    return "";
}

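Update. There is a case where a bodyPart itself can be of type multipart. (I came across such an email after writing this answer.) In that case you will need to rewrite the above method with recursion.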



Answer 4:

I don't think so, otherwise what would happen if a Part's mime type is image/jpeg? The API returns an Object because internally it tries to give you something useful, provided you know what it is expected to be. For general purpose software, it's intended to be used like this:

if (part.isMimeType("text/plain")) {
   ...
} else if (part.isMimeType("multipart/*")) {
   ...
} else if (part.isMimeType("message/rfc822")) {
   ...
} else {
   ...
}

You also have the raw (actually not so raw, see the Javadoc) Part.getInputStream(), but I think it's unsafe to assume that each and every message you receive is a text-based one - unless you are writing a very specific application and you have control over the input source.
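
If you do fall back on Part.getInputStream(), the bytes still have to be decoded with the right charset. The sketch below is a stdlib-only illustration, not javax.mail code: StreamText, charsetOf and readText are hypothetical helper names. It assumes the caller passes in the part's Content-Type header value as a string, and that US-ASCII (the RFC 2046 default for text/plain) is an acceptable fallback when no usable charset parameter is present.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for decoding a part's byte stream into text. */
class StreamText {

    /** Extracts the charset parameter from a Content-Type value, or falls back to US-ASCII. */
    static Charset charsetOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return StandardCharsets.US_ASCII;
                }
            }
        }
        return StandardCharsets.US_ASCII;
    }

    /** Reads the whole stream and decodes it with the charset named in contentType. */
    static String readText(InputStream in, String contentType) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charsetOf(contentType));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only makes sense after a check such as part.isMimeType("text/*"); for other types the decoded string would be meaningless.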



Answer 5:

If you always want to get text, then you can skip the other types such as "multipart" and so on...

Object body = message.getContent();
if (body instanceof String) {
    // hey it's a text
}


Answer 6:

Don't reinvent the wheel! You can simply use Apache Commons Email.

Kotlin example:

fun readHtmlContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().htmlContent

If the mail has no HTML content but does have plain content (you can check this with the hasPlainContent and hasHtmlContent methods), then you should use this code:

fun readPlainContent(message: MimeMessage) = 
        MimeMessageParser(message).parse().plainContent

Java example:

String readHtmlContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getHtmlContent();
}

String readPlainContent(MimeMessage message) throws Exception {
    return new MimeMessageParser(message).parse().getPlainContent();
}


Source: How to read text inside body of mail using javax.mail