Code for sending email (works fine):
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Email::Sender::Simple qw(sendmail);
use Email::Sender::Transport::SMTP ();
use Email::Simple ();
use open ':std', ':encoding(UTF-8)';
sub send_email
{
    my $email_from = shift;
    my $email_to = shift;
    my $subject = shift;
    my $message = shift;
    my $smtpserver = 'smtp.gmail.com';
    my $smtpport = 465;
    my $smtpuser = 'user@gmail.com';
    my $password = 'secret';
    my $transport = Email::Sender::Transport::SMTP->new({
        host => $smtpserver,
        port => $smtpport,
        sasl_username => $email_from,
        sasl_password => $password,
        debug => 1,
        ssl => 1,
    });
    my $email = Email::Simple->create(
        header => [
            To => $email_to,
            From => $email_from,
            Subject => $subject,
        ],
        body => $message,
    );
    $email->header_set( 'Content-Type' => 'text/html' );
    $email->header_set( 'charset' => 'UTF-8' );
    sendmail($email, { transport => $transport });
}
send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email');
As soon as I add non-ascii characters to the body:
send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email. Русский текст');
it hangs with the last message in debug output:
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> charset: UTF-8
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>>
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> test email. Русский текст
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> .
How to fix?
TL;DR: the fix is simple, but the problem itself is complex. To fix the issue, encode the mail into UTF-8 octets (for example with Encode::encode) before giving it to sendmail(...). But note the warning at the end of this answer about possible problems when sending 8-bit data like this inside a mail in the first place.
To actually understand the problem and the fix, one has to look deeper into the handling of characters vs. octets in sockets in Perl:
- Email::Sender::Transport::SMTP uses Net::SMTP, which itself uses the syswrite method of the underlying IO::Socket::SSL or IO::Socket::IP (or IO::Socket::INET) socket, depending on whether SSL is used or not.
- syswrite expects octets, and it expects the length argument to be the number of octets to write to the socket.
- Email::Simple returns not octets but a string with the UTF8 flag set. In this string the number of characters differs from the number of octets, because the Russian текст is treated as 5 characters while it is 10 octets when encoded as UTF-8.
- Email::Sender::Transport::SMTP just forwards the UTF8 string of the email to Net::SMTP, which uses it inside a syswrite. The length is computed using length, which gives the number of characters, and this differs from the number of octets in this case. But on the socket side, the octets and not the characters are taken out of the string, and the given length is treated as a number of octets.
As an example, take a mail which consists only of the two Russian characters 'ий'. With line ends and the end-of-mail marker it consists of 7 characters: ий\r\n.\r\n
But these 7 characters are actually 9 octets, because each of the first 2 characters is two octets in UTF-8 (и is d0 b8, й is d0 b9).
Now, a syswrite($fd,"ий\r\n.\r\n",7) will only write the first 7 octets of the 7-character but 9-octet-long string, so the final \r\n is never sent. This means that the end-of-mail marker is incomplete: the mail server waits for more data while the mail client is not aware of any more data it needs to send, which essentially causes the application to hang.
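The mismatch described above can be reproduced without any network code. The following is a minimal sketch using only core modules; the variable names are made up for illustration, and substr on the encoded string merely stands in for what a syswrite with a too-short length would put on the wire.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains literal Cyrillic
use Encode qw(encode);

# The two-character mail from the example plus the end-of-mail marker.
my $chars = "ий\r\n.\r\n";

# On a character string, length() counts characters.
my $n_chars = length $chars;

# After encoding, the string holds octets and length() counts octets.
my $octets   = encode('UTF-8', $chars);
my $n_octets = length $octets;

# Simulate writing only $n_chars octets of the 9-octet message.
my $sent = substr $octets, 0, $n_chars;
my $hex  = join ' ', map { sprintf('%02x', ord($_)) } split //, $sent;

print "characters: $n_chars\n";   # characters: 7
print "octets:     $n_octets\n";  # octets:     9
print "sent:       $hex\n";       # sent:       d0 b8 d0 b9 0d 0a 2e
```

Once the string has been encoded, length and the actual octet count agree. In the script from the question, the corresponding step would presumably be to hand encode('UTF-8', $email->as_string) to sendmail(...) instead of the $email object; this assumes sendmail accepts a plain string, which Email::Sender::Simple should do through Email::Abstract.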
Now, who is to blame for this?
- One could argue that IO::Socket::SSL::syswrite should deal with UTF8 data in a sane way, and this is what was requested in RT#98732. But the documentation for syswrite in IO::Socket::SSL clearly says that it works on bytes. And since it is practically impossible to create sane character-based behavior when considering non-blocking sockets, this bug was rejected. Non-SSL sockets have problems with UTF8 strings too: without SSL the program would not hang but crash with "Wide character in syswrite ..." instead.
- The next layer up would be to expect Net::SMTP to properly handle such UTF8 strings. However, the documentation of Net::SMTP::data explicitly says that the caller must encode the data to octets.
- Now one could argue that either Email::Transport should handle UTF8 strings properly or that Email::Simple::as_string should not return a UTF8 string in the first place.
But one could go even another layer up: to the developer. Mail is traditionally ASCII only, and sending non-ASCII characters inside a mail is a bad idea since it only works reliably with mail servers that have the 8BITMIME extension. If mail servers are involved which don't support this extension, the results are unpredictable: the mail can be transformed (which might break signatures), become unreadable, or be lost somewhere. Thus it is better to use a more complex module like Email::MIME and set an appropriate content transfer encoding.
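As a rough sketch of that last suggestion, the message from the question could be built with Email::MIME roughly as follows. The addresses, subject and body are taken from the question; choosing base64 as the transfer encoding is an assumption made here for illustration (quoted-printable would be another option), and the final check is only a sanity test, not part of the sending logic.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains literal Cyrillic
use Email::MIME;

# header_str and body_str take character strings; Email::MIME encodes them
# according to the charset and transfer encoding given in attributes.
my $email = Email::MIME->create(
    header_str => [
        From    => 'user@gmail.com',
        To      => 'user@gmail.com',
        Subject => 'Hello',
    ],
    attributes => {
        content_type => 'text/html',
        charset      => 'UTF-8',
        encoding     => 'base64',   # assumption: any 7-bit-safe encoding works
    },
    body_str => 'test email. Русский текст',
);

# The serialized message should now contain only 7-bit characters.
my $wire = $email->as_string;
print $wire =~ /[^\x00-\x7F]/ ? "contains 8-bit data\n" : "7-bit clean\n";
```

With a 7-bit-safe transfer encoding, the number of characters and the number of octets in the serialized mail are the same, so the mail no longer depends on the 8BITMIME extension.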