Question:
Can someone confirm that Python 2.6 ftplib does NOT support Unicode file names? Or must Unicode file names be specially encoded in order to be used with the ftplib module?
The following email exchange seems to support my conclusion that the ftplib module only supports ASCII file names.
Should ftplib use UTF-8 instead of latin-1 encoding?
http://mail.python.org/pipermail/python-dev/2009-January/085408.html
Any recommendations on a 3rd party Python FTP module that supports Unicode file names? I've googled this question without success [1], [2].
The official Python documentation does not mention Unicode file names [3].
Thank you,
Malcolm
[1] ftputil wraps ftplib and inherits ftplib's apparent ASCII only support?
[2] Paramiko's SFTP library does support Unicode file names, however I'm looking specifically for ftp (vs. sftp) support relative to our current project.
[3] http://docs.python.org/library/ftplib.html
WORKAROUND:
The encodings.idna.ToASCII and .ToUnicode methods can be used to convert Unicode path names to an ASCII format. If you wrap all your remote path names and the output of the dir/nlst methods with these functions, then you can create a way to preserve Unicode path names using the standard ftplib (and also preserve Unicode file names on file systems that don't support Unicode paths). The downside to this technique is that other processes on the server will also have to use encodings.idna when referencing the files that you upload to the server. BTW: I understand that this is an abuse of the encodings.idna library.
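A minimal sketch of the workaround described above, written for Python 3 (where ToASCII takes a str and returns bytes). The helper names to_wire_name and from_wire_name are hypothetical. Two caveats worth checking before relying on it: IDNA labels are limited to 63 bytes, and the nameprep step appears to case-fold non-ASCII names, so an upper-case letter in such a name may not survive the round trip.

```python
from encodings import idna


def to_wire_name(name: str) -> bytes:
    """Encode one path component as an ASCII-only IDNA label."""
    return idna.ToASCII(name)


def from_wire_name(raw: bytes) -> str:
    """Recover the Unicode name from a label produced by to_wire_name."""
    return idna.ToUnicode(raw)


# Names that are already ASCII pass through unchanged;
# non-ASCII names come back with an "xn--" prefix.
wire = to_wire_name("résumé")
restored = from_wire_name(wire)
```

Each component of a remote path would be passed through to_wire_name before it is handed to ftplib, and each name returned by dir/nlst through from_wire_name.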
Thank you Peter and Bob for your comments which I found very helpful.
Answer 1:
ftplib has no knowledge of Unicode whatsoever. It is intended to be passed byte strings for filenames, and it'll return byte strings when asked for a directory list. Those are the exact strings of bytes passed to/returned from the server.
If you pass a Unicode string to ftplib in Python 2.x, it'll end up getting coerced to bytes when it's sent to the underlying socket object. This coercion uses Python's ‘default’ encoding, i.e. US-ASCII for safety, with exceptions generated for non-ASCII characters.
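To illustrate the coercion failure, here is a small Python 3 sketch that performs explicitly the ASCII encoding step Python 2 applies implicitly. The file name is made up for the example.

```python
name = "résumé.txt"  # hypothetical non-ASCII file name

try:
    name.encode("ascii")  # the step Python 2 performs implicitly
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding explicitly avoids the implicit ASCII step altogether.
utf8_bytes = name.encode("utf-8")
```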
The python-dev message to which you linked is talking about ftplib in Python 3.x, where strings are Unicode by default. This leaves modules like ftplib in a tricky situation because although they now use Unicode strings at their front-end, the actual protocol behind it is byte-based. There therefore has to be an extra level of encoding/decoding involved, and without explicit intervention to specify what encoding is in use, there's a fair chance it'll choose wrong.
ftplib in 3.x chose to default to ISO-8859-1 in order to preserve each byte as a character inside the Unicode string. Unfortunately this will give unexpected results in the common case where the target server stores filenames as UTF-8 (whether or not the FTP daemon itself knows that filenames are UTF-8, which it commonly won't). There are a number of cases like this where the Python standard libraries have been brutally hacked to Unicode strings with negative consequences; Python 3's batteries-included are still leaking corrosive fluid IMO.
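The "unexpected results" can be reproduced without a server. The sketch below (Python 3, made-up file name) decodes UTF-8 bytes as ISO-8859-1, which is what a latin-1 client would do with a listing from a UTF-8 server, and then shows that the original name is still recoverable because latin-1 preserves every byte. As far as I know, Python 3.9 and later default ftplib to UTF-8 instead, so this mainly concerns older 3.x releases.

```python
name = "résumé.txt"  # hypothetical file name stored as UTF-8 on the server

wire_bytes = name.encode("utf-8")         # bytes the server sends
as_latin1 = wire_bytes.decode("latin-1")  # what a latin-1 client displays

# latin-1 maps every byte to one character, so nothing is lost:
recovered = as_latin1.encode("latin-1").decode("utf-8")
```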
Answer 2:
Personally I would be more worried about what is on the other side of the ftp connection than the support of the library. FTP is a brittle protocol at the best of times without trying to be creative with filenames.
from RFC 959:
Pathname is defined to be the character string which must be
input to a file system by a user in order to identify a file.
Pathname normally contains device and/or directory names, and
file name specification. FTP does not yet specify a standard
pathname convention. Each user must follow the file naming
conventions of the file systems involved in the transfer.
To me that means that filenames should conform to the lowest common denominator. Nowadays the number of DOS servers, VAXes, and IBM mainframes is negligible, and chances are you'll end up on a Windows or Unix box, so the common denominator is quite high. Even so, making assumptions about which code page the remote site will accept seems pretty risky to me.
Answer 3:
To get around this, I used the following code:
ftp.storbinary("STOR " + target_name.encode( "utf-8" ), open(file_name, 'rb'))
This assumes that the FTP server supports RFC 2640 http://www.ietf.org/rfc/rfc2640.txt which allows for utf-8 file names. In my case I used SwiFTP server for Android and it transfers the files with the proper names successfully.
Answer 4:
Can someone confirm that Python 2.6 ftplib does NOT support Unicode file names?
It doesn't.
Should ftplib use UTF-8 instead of latin-1 encoding?
It's debatable. UTF-8 is the preferred encoding as dictated by RFC-2640 but latin-1 is usually more friendly for misbehaving implementations (either server or client).
If the server includes "UTF8" in its FEAT response, then you should definitely use UTF-8.
>>> utf8_server = 'UTF8' in ftp.sendcmd('FEAT')
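A plain substring test would also match any other reply text that happens to contain "UTF8". A slightly stricter sketch, assuming the usual FEAT reply layout of one indented feature per line (the function name supports_utf8 and the sample reply are made up for illustration):

```python
def supports_utf8(feat_response: str) -> bool:
    """Return True if a FEAT reply lists UTF8 as a feature line."""
    for line in feat_response.splitlines():
        # Feature lines are indented; the first word is the feature name.
        words = line.strip().split()
        if words and words[0].upper() == "UTF8":
            return True
    return False


# Hypothetical reply, shaped like what ftp.sendcmd('FEAT') returns.
sample = "211-Features:\n MDTM\n SIZE\n UTF8\n211 End"
utf8_server = supports_utf8(sample)
```

Note that sendcmd raises an ftplib error if the server rejects FEAT, so a real caller would probably wrap the call in a try/except and treat a failure as "no UTF8 support".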
To support unicode in Python 2.x you can use the following ftplib.FTP subclass, which overrides putline:
import ftplib

class UnicodeFTP(ftplib.FTP):
    """A ftplib.FTP subclass supporting unicode file names as
    described by RFC-2640."""

    def putline(self, line):
        line = line + '\r\n'
        # Encode unicode commands as UTF-8 before they reach the socket.
        if isinstance(line, unicode):
            line = line.encode('utf8')
        self.sock.sendall(line)
...and pass unicode strings when using the remaining API as in:
>>> ftp = UnicodeFTP(host='ftp.site.com', user='foo', passwd='bar')
>>> ftp.delete(u'somefile')
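For Python 3 the override of putline should not be needed, because ftplib.FTP encodes and decodes commands itself using its encoding attribute. A rough Python 3 counterpart of the class above might therefore look like this (UTF8FTP is a hypothetical name, and whether UTF-8 is right still depends on the server):

```python
import ftplib


class UTF8FTP(ftplib.FTP):
    """ftplib.FTP variant that encodes and decodes commands as UTF-8."""

    # Class-level default; older Python 3 releases default to latin-1.
    encoding = "utf-8"


ftp = UTF8FTP()  # no host given, so no connection is opened here
```

With such a class, ordinary str file names can be passed to methods like delete or storbinary.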
Answer 5:
We got UTF8 encoded filenames working for Python 2.7's ftplib.
Note 1: Here's some background that explains UTF8 and unicode:
https://code.google.com/p/iqbox-ftp/wiki/ProgrammingGuide_UnicodeVsAscii
Note 2: You can take a look at the AGPL libraries we use for IQBox. You might be able to use those (or parts of those), and they support UTF8 over FTP. Look at filetransfer_abc.py
You do need to add code to:
(1) Determine whether the server supports UTF8.
(2) Encode the unicode Python string in UTF8 format.
(3) Decode the names when you get file listings (full code not shown, since everyone gets file listings differently): if UTF8_support: name = name.decode('utf-8')
# PART (1): DETERMINE IF SERVER HAS UTF8 SUPPORT:
# Get FTP features:
try:
    features_string_ftp = ftp.sendcmd('FEAT')
    print features_string_ftp
    # Determine UTF8 support:
    if 'UTF8' in features_string_ftp.upper():
        print "FTP>> Server supports international characters (UTF8)"
        UTF8_support = True
    else:
        print "FTP>> Server does NOT support international (non-ASCII) characters."
        UTF8_support = False
except Exception:
    # FEAT failed or was rejected; fall back to ASCII-only behaviour.
    print "FTP>> Could not get list of features using FEAT command."
    print "FTP>> Server does NOT support international (non-ASCII) characters."
    UTF8_support = False
# PART (2): Encode FTP commands needed to be sent using UTF8 encoding, if it's supported.
def sendFTPcommand(ftp, command_string, UTF8_support):
    # Needed for UTF8 international file names etc.
    c = None
    if UTF8_support:
        c = command_string.encode('utf-8')
    else:
        c = command_string
    # TODO: Add try-catch here and connection error retries.
    return ftp.sendcmd(c)
# If you just want to get a string with the UTF8 command and send it yourself, then use this:
def encodeFTPcommand(command_string, UTF8_support):
    # Needed for UTF8 international file names etc.
    c = None
    if UTF8_support:
        c = command_string.encode('utf-8')
    else:
        c = command_string
    return c
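Step (3), decoding names from a listing, is not shown above. One possible shape for it is sketched below in Python 3 syntax, operating on raw bytes. The function name decode_listing_name is hypothetical, and the latin-1 fallback is my assumption rather than part of the original code; it is there so that a server which advertises UTF8 but sends something else does not raise.

```python
def decode_listing_name(raw: bytes, utf8_support: bool) -> str:
    """Decode one file name from a directory listing."""
    if utf8_support:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # server advertised UTF8 but sent other bytes
    # latin-1 never fails: every byte maps to exactly one character.
    return raw.decode("latin-1")


decoded = decode_listing_name("résumé.txt".encode("utf-8"), True)
```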