Perl SMTP: can't send email with non-ascii cha

Posted 2019-05-31 02:42

This code sends email and works fine:

#!/usr/bin/perl

use utf8;
use strict;
use warnings;

use Email::Sender::Simple qw(sendmail);
use Email::Sender::Transport::SMTP ();
use Email::Simple ();
use open ':std', ':encoding(UTF-8)';

sub send_email
{
    my $email_from = shift;
    my $email_to = shift;
    my $subject = shift;
    my $message = shift;

    my $smtpserver = 'smtp.gmail.com';
    my $smtpport = 465;
    my $smtpuser   = 'user@gmail.com';
    my $password = 'secret';

    my $transport = Email::Sender::Transport::SMTP->new({
        host => $smtpserver,
        port => $smtpport,
        sasl_username => $email_from,
        sasl_password => $password,
        debug    => 1,
        ssl => 1,
    });

    my $email = Email::Simple->create(
        header => [
            To      => $email_to,
            From    => $email_from,
            Subject => $subject,
        ],
        body => $message,
    );

    $email->header_set( 'Content-Type' => 'text/html' );
    $email->header_set( 'charset' => 'UTF-8' );
    sendmail($email, { transport => $transport });
}

send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email');

As soon as I add non-ascii characters to the body:

send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email. Русский текст');

it hangs with the last message in debug output:

Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> charset: UTF-8
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> 
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> test email. Русский текст
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> .

How can I fix this?

1 Answer
唯我独甜
Reply #2 · 2019-05-31 03:15

TL;DR: the fix is simple, but the problem itself is complex. To fix the issue, add:

$email = Encode::encode('utf-8',$email->as_string)

before passing the mail to sendmail(...). This requires the Encode module to be loaded. Note the warning at the end of this answer, though: sending 8-bit data inside a mail like this can cause problems of its own.


To understand the problem and the fix, one has to look more closely at how Perl sockets handle characters versus octets:

  • Email::Sender::Transport::SMTP uses Net::SMTP, which in turn uses the syswrite method of the underlying IO::Socket::SSL or IO::Socket::IP (or IO::Socket::INET) socket, depending on whether SSL is used.
  • syswrite expects octets, and its length argument is the number of octets to write to the socket.
  • But the mail you construct with Email::Simple is returned not as octets but as a string with the UTF8 flag set. In such a string the number of characters differs from the number of octets: the Russian word текст is 5 characters but 10 octets when encoded as UTF-8.
  • Email::Sender::Transport::SMTP simply forwards the UTF8 string of the email to Net::SMTP, which passes it to syswrite. The length is computed with length, which gives the number of characters, not the number of octets. On the socket side, however, the octets are taken out of the string, not the characters, and the given length is treated as a number of octets.
  • Because the given length is treated as octets rather than characters, less data is ultimately sent to the server than the upper layers of the program expect.
  • As a result the end-of-mail marker (a line with a single dot) is never sent, so the server keeps waiting for more data while the client is not aware that it has anything left to send.
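
The character/octet difference described in the list above can be checked with core Perl alone. This is a minimal sketch, not part of the original code; the variable names are made up for illustration, and it uses only the core Encode module.

```perl
use strict;
use warnings;
use utf8;                     # this source file contains UTF-8 literals
use Encode qw(encode);

# Character string, comparable to what Email::Simple hands over.
my $chars  = 'текст';
# Byte string, which is what a socket write needs.
my $octets = encode('UTF-8', $chars);

my $char_count  = length $chars;    # counts characters
my $octet_count = length $octets;   # counts octets, because $octets holds bytes

printf "%d characters, %d octets\n", $char_count, $octet_count;
# prints "5 characters, 10 octets"
```

After encoding, length and the number of octets on the wire agree, which is why encoding the whole mail before sending avoids the short write.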

As an example take a mail which consists only of two Russian characters 'ий'. With line ends and the end-of-mail marker it consists of 7 characters:

ий\r\n.\r\n

But these 7 characters are actually 9 octets, because each of the first 2 characters takes two octets:

и      й      \r  \n  .   \r  \n
d0 b8  d0 b9  0d  0a  2e  0d  0a

Now, a syswrite($fd,"ий\r\n.\r\n",7) will write only the first 7 octets of this string, which is 7 characters but 9 octets long:

и      й      \r  \n  .
d0 b8  d0 b9  0d  0a  2e

This means that the end-of-mail marker is incomplete. The mail server therefore waits for more data, while the mail client is not aware of any more data it needs to send, which causes the application to hang.
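
The truncation can be reproduced without any network connection. The sketch below only simulates the short write with substr; it does not call the real socket code, and the variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $mail   = "ий\r\n.\r\n";            # 7 characters
my $octets = encode('UTF-8', $mail);    # 9 octets on the wire

# The mistake: the character count is used as the octet count.
my $wrong_len = length $mail;
my $sent      = substr($octets, 0, $wrong_len);   # only 7 of 9 octets go out

my $hex = join ' ', unpack('(H2)*', $sent);
print "sent: $hex\n";
# prints "sent: d0 b8 d0 b9 0d 0a 2e"

# The end-of-mail marker is CRLF . CRLF; check whether it arrived intact.
my $marker_sent = $sent   =~ /\r\n\.\r\n\z/ ? 1 : 0;
my $marker_full = $octets =~ /\r\n\.\r\n\z/ ? 1 : 0;
print "marker complete in truncated data: $marker_sent\n";
print "marker complete in full data:      $marker_full\n";
```

The truncated data ends with the dot but lacks the final CRLF, so the server never sees a complete marker.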

Now, who is to blame for this?

One could argue that IO::Socket::SSL::syswrite should deal with UTF8 data in a sane way, and this is what was requested in RT#98732. But the documentation for syswrite in IO::Socket::SSL clearly says that it works on bytes. And since it is practically impossible to create sane character-based behavior when considering non-blocking sockets, this bug was rejected. Non-SSL sockets have problems with UTF8 strings too: if you did not use SSL in the first place, the program would not hang but would instead crash with Wide character in syswrite ....
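
The non-SSL failure mode is easy to observe with Perl's built-in syswrite on an ordinary handle. The sketch below uses a pipe as a stand-in for a plain socket (an assumption made to keep it self-contained) and traps the exception with eval.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# A pipe stands in for a plain (non-SSL) socket here.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $text = "ий\r\n";

# Writing the character string directly raises an exception.
my $ok    = eval { syswrite($writer, $text); 1 };
my $error = $ok ? '' : $@;
print "direct write failed: $error" unless $ok;

# Writing the encoded octets succeeds and reports the octet count.
my $written = syswrite($writer, encode('UTF-8', $text));
print "octets written: $written\n";
```

The second call returns 6, the number of octets in the encoded string, which matches what the receiving side will read.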

The next layer up is Net::SMTP, which one might expect to handle such UTF8 strings properly. However, the documentation of Net::SMTP::data explicitly says:

DATA may be a reference to a list or a list and must be encoded by the caller to octets of whatever encoding is required, e.g. by using the Encode module's encode() function.

Now one could argue either that Email::Sender::Transport::SMTP should handle UTF8 strings properly or that Email::Simple::as_string should not return a UTF8 string in the first place.

But one could go up yet another layer: to the developer. Mail is traditionally ASCII only, and sending non-ASCII characters inside a mail is a bad idea, since it only works reliably with mail servers that support the 8BITMIME extension. If mail servers are involved which don't support this extension, the results are unpredictable: the mail can be transformed (which might break signatures), can become unreadable, or can be lost somewhere. It is thus better to use a more complex module like Email::MIME and set an appropriate content transfer encoding.
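
To illustrate what a content transfer encoding does, here is a sketch using only the core modules Encode and MIME::QuotedPrint. It is not the Email::MIME approach itself; it only shows by hand the kind of 7-bit-safe body that such an encoding produces. The header lines are printed for illustration and are not sent anywhere.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use MIME::QuotedPrint qw(encode_qp decode_qp);

my $body = "test email. Русский текст\n";

# Characters -> UTF-8 octets -> quoted-printable (pure ASCII).
my $octets  = encode('UTF-8', $body);
my $encoded = encode_qp($octets);

# Headers a MIME-aware module would set to describe the body.
print "Content-Type: text/plain; charset=UTF-8\n";
print "Content-Transfer-Encoding: quoted-printable\n\n";
print $encoded;

# The receiver reverses both steps to get the original text back.
my $roundtrip = decode('UTF-8', decode_qp($encoded));
```

Because the encoded body contains only ASCII, it no longer depends on 8BITMIME support along the delivery path, and for ASCII-only data the character and octet counts coincide.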
