How to read UTF-8 mail attachments from POP3

Posted 2019-08-17 19:49

Question:

An e-mail contains an XML file attachment in UTF-8 encoding. I'm looking for a way to read this attachment from an ASP.NET/Mono MVC4 application. I tried OpenPop, as described in "How to save email attachment using OpenPop", with the following code:

// Requires: using OpenPop.Mime; using OpenPop.Pop3;
using (Pop3Client client = new Pop3Client())
{
    client.Connect("mail.company.com", 110, false);
    client.Authenticate("user", "pass", AuthenticationMethod.UsernameAndPassword);
    if (client.Connected)
    {
        int messageCount = client.GetMessageCount();
        // POP3 message numbers are 1-based; iterate from newest to oldest.
        for (int i = messageCount; i > 0; i--)
        {
            var msg = client.GetMessage(i);
            var att = msg.FindAllAttachments();
            foreach (var ado in att)
            {
                // Accented characters come back as "??" here.
                var xml = ado.GetBodyAsText();
            }
        }
    }
}

In the resulting xml string, each accented character is converted to two ? marks. XXXLTEC O=C3=9C in the message below appears as XXXLTEC O?? in the xml variable. The correct result is XXXLTEC OÜ.

How can I read a UTF-8 attachment properly? I haven't found any option in OpenPop to decode it correctly.

The XML attachment appears in the message as:

------=_NextPart_000_0066_01D0302C.83D6EFA0
Content-Type: text/xml;
    name="tapitolemas.xml"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
    filename="tapitolemas.xml"

<?xml version=3D"1.0" encoding=3D"UTF-8"?>
<E-Document>
  <Header>
    <DateIssued>2015-01-02T13:27</DateIssued>
    <SenderID>-</SenderID>
    <ReceiverID>1COL</ReceiverID>
  </Header>
  <Document>
    <DocumentType>invoice</DocumentType>
    <DocumentFunction>original</DocumentFunction>
    <DocumentParties>
      <BuyerParty context=3D"partner">
        <PartyCode>1COL</PartyCode>
        <Name>XXXLTEC O=C3=9C</Name>

Answer 1:

There's probably no way to work around this using OpenPOP.NET, so your only real choice to get this to work is to use another POP3 library such as MailKit which does not have this problem.

The problem is that OpenPOP assumes that the charset is US-ASCII because there is no charset parameter in the Content-Type header and it wrongly forces text to convert using that charset encoding (instead of being liberal in what it accepts).
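
The effect is easy to reproduce in isolation. The following is a minimal sketch, not OpenPop code (the class name CharsetDemo is made up for illustration): it takes the three bytes that quoted-printable decoding yields for O=C3=9C and decodes them once as US-ASCII and once as UTF-8.

```csharp
using System;
using System.Text;

public static class CharsetDemo
{
    public static void Main()
    {
        // Bytes left after quoted-printable decoding of "O=C3=9C".
        byte[] body = { 0x4F, 0xC3, 0x9C };

        // US-ASCII cannot represent bytes above 0x7F, so each one becomes '?'.
        string asAscii = Encoding.ASCII.GetString(body);

        // UTF-8 reads 0xC3 0x9C as the single character U+00DC (Ü).
        string asUtf8 = Encoding.UTF8.GetString(body);

        Console.WriteLine(asAscii);             // O??
        Console.WriteLine(asUtf8 == "O\u00DC"); // True
    }
}
```

The two ? marks in the question correspond to the two bytes of the UTF-8 sequence for Ü, which matches the US-ASCII explanation above.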

MailKit, on the other hand, uses charset fallback logic to try to determine which charset it is. But even if it gets it wrong by default (i.e. in the TextPart.Text property), you can still use TextPart.GetText (System.Text.Encoding encoding) to override the charset.
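
A minimal sketch of the MailKit route, assuming the MailKit and MimeKit NuGet packages are installed. The host and credentials are the placeholders from the question, the names AttachmentReader and ReadXmlAttachments are made up for illustration, and the code assumes the attachment really is UTF-8, as its XML declaration states.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using MailKit.Net.Pop3;
using MailKit.Security;
using MimeKit;

public static class AttachmentReader
{
    // Hypothetical helper: returns the text of every text/* attachment,
    // decoded as UTF-8 regardless of the (missing) charset parameter.
    public static List<string> ReadXmlAttachments(MimeMessage message)
    {
        var result = new List<string>();
        foreach (MimeEntity attachment in message.Attachments)
        {
            // MimeKit represents text/* parts (including text/xml) as TextPart.
            var text = attachment as TextPart;
            if (text != null)
            {
                // Override the charset explicitly instead of relying on TextPart.Text.
                result.Add(text.GetText(Encoding.UTF8));
            }
        }
        return result;
    }

    public static void Main()
    {
        using (var client = new Pop3Client())
        {
            client.Connect("mail.company.com", 110, SecureSocketOptions.None);
            client.Authenticate("user", "pass");

            // MailKit's POP3 message indexes are 0-based.
            for (int i = client.Count - 1; i >= 0; i--)
            {
                MimeMessage message = client.GetMessage(i);
                foreach (string xml in ReadXmlAttachments(message))
                {
                    Console.WriteLine(xml.Length);
                }
            }

            client.Disconnect(true);
        }
    }
}
```

Keeping the decoding in a helper that takes a MimeMessage separates it from the network code, so it can be exercised with a message loaded from a file or stream.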



Answer 2:

This can be fixed by changing the GetBodyAsText() method in MessagePart.cs to:

    public string GetBodyAsText()
    {
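        // Always decode the body as UTF-8. Note that this also applies to
        // parts that declare a different charset.
        return Encoding.UTF8.GetString(Body);
        // The original line returned ?? instead of the accented characters:
        // return BodyEncoding.GetString(Body);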
    }