Proxies in Python FTP application

Posted 2020-02-09 05:14

Question:

I'm developing an FTP client in Python with ftplib. How do I add proxy support to it (most FTP apps I have seen seem to have it)? I'm especially thinking about SOCKS proxies, but also other types: FTP, HTTP (is it even possible to use HTTP proxies with an FTP program?)

Any ideas how to do it?

Answer 1:

As per this source.

It depends on the proxy, but a common method is to connect to the proxy over FTP, then log in with the destination server's username (in the form user@host) and password.

E.g. for ftp.example.com:

Server address: proxyserver (or open proxyserver from within ftp)
User:           anonymous@ftp.example.com
Password:       password

In Python code:

from ftplib import FTP
site = FTP('my_proxy')
site.set_debuglevel(1)
msg = site.login('anonymous@ftp.example.com', 'password')
site.cwd('/pub')
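
The same idea can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the `user@host:port` form for non-default ports is an assumption about the proxy's conventions, so check what your proxy actually expects.

```python
from ftplib import FTP


def proxy_user(user, host, port=21):
    """Build the 'user@host' login name that USER-at-host style FTP proxies expect.

    Assumption: a non-default port is appended as 'user@host:port'.
    """
    target = host if port == 21 else "{}:{}".format(host, port)
    return "{}@{}".format(user, target)


def login_via_proxy(proxy_host, user, password, host, port=21, timeout=30):
    """Connect to the FTP proxy and log in to the destination server through it."""
    ftp = FTP(proxy_host, timeout=timeout)  # connects to the proxy, not the target
    ftp.login(proxy_user(user, host, port), password)
    return ftp
```

Usage would look like `login_via_proxy('my_proxy', 'anonymous', 'password', 'ftp.example.com')`, which sends the same login as the snippet above.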


Answer 2:

You can use the ProxyHandler in urllib2.

import urllib2  # Python 2; in Python 3 the same classes live in urllib.request

ph = urllib2.ProxyHandler({'ftp': proxy_server_url})
server = urllib2.build_opener(ph)
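
On Python 3, urllib2 was merged into urllib.request, so the equivalent looks roughly like the sketch below. The proxy URL and the `fetch` helper are placeholders I made up for illustration. Note that this approach fetches whole URLs, so it suits simple downloads rather than a full interactive FTP session.

```python
import urllib.request

# Hypothetical proxy address; replace with your own.
proxy_server_url = "http://proxy.example.com:3128"

# Route ftp:// URLs through the proxy.
proxy_handler = urllib.request.ProxyHandler({"ftp": proxy_server_url})
opener = urllib.request.build_opener(proxy_handler)


def fetch(url, timeout=30):
    """Download the resource at `url` through the configured opener."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A call such as `fetch('ftp://ftp.example.com/pub/README')` would then go through the proxy.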


Answer 3:

I had the same problem and needed to keep using the ftplib module (I did not want to rewrite all my scripts with urllib2).

I managed to write a script that installs transparent HTTP tunneling at the socket layer (which ftplib uses).

Now I can do FTP over HTTP transparently!

You can get it here: http://code.activestate.com/recipes/577643-transparent-http-tunnel-for-python-sockets-to-be-u/



Answer 4:

The standard ftplib module doesn't support proxies. It seems the only solution is to write your own customized version of ftplib.



Answer 5:

Patching the builtin socket library definitely won't be an option for everyone, but my solution was to patch socket.create_connection() to use an HTTP proxy when the hostname matches a whitelist:

from base64 import b64encode
from functools import wraps
import socket

_real_create_connection = socket.create_connection
_proxied_hostnames = {}  # hostname: (proxy_host, proxy_port, proxy_auth)


def register_proxy (host, proxy_host, proxy_port, proxy_username=None, proxy_password=None):
    proxy_auth = None
    if proxy_username is not None or proxy_password is not None:
        credentials = '{}:{}'.format(proxy_username or '', proxy_password or '')
        # b64encode works on bytes; decode the result so it fits into the header string.
        proxy_auth = b64encode(credentials.encode('utf-8')).decode('ascii')
    _proxied_hostnames[host] = (proxy_host, proxy_port, proxy_auth)


@wraps(_real_create_connection)
def create_connection (address, *args, **kwds):
    host, port = address
    if host not in _proxied_hostnames:
        return _real_create_connection(address, *args, **kwds)

    proxy_host, proxy_port, proxy_auth = _proxied_hostnames[host]
    conn = _real_create_connection((proxy_host, proxy_port), *args, **kwds)
    try:
        request = 'CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n{auth_header}\r\n'.format(
            host=host, port=port,
            auth_header=('Proxy-Authorization: Basic {}\r\n'.format(proxy_auth) if proxy_auth else '')
        )
        conn.sendall(request.encode('ascii'))
        # Read one byte at a time so that nothing after the proxy's headers
        # (for example the FTP server greeting) is consumed here.
        response = b''
        while not response.endswith(b'\r\n\r\n'):
            chunk = conn.recv(1)
            if not chunk:
                raise socket.error('proxy closed the connection during CONNECT')
            response += chunk
        if response.split()[1] != b'200':
            raise socket.error('CONNECT failed: {}'.format(response.decode('latin-1').strip()))
    except socket.error:
        conn.close()
        raise

    return conn


socket.create_connection = create_connection

I also had to create a subclass of ftplib.FTP that ignores the host returned by PASV and EPSV FTP commands. Example usage:

from ftplib import FTP
import paramiko  # For SFTP
from proxied_socket import register_proxy

class FTPIgnoreHost (FTP):
    def makepasv (self):
        # Ignore the host returned by PASV or EPSV commands (only use the port).
        return self.host, FTP.makepasv(self)[1]

register_proxy('ftp.example.com', 'proxy.example.com', 3128, 'proxy_username', 'proxy_password')

ftp_connection = FTPIgnoreHost('ftp.example.com', 'ftp_username', 'ftp_password')

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # If you don't care about security.
ssh.connect('ftp.example.com', username='sftp_username', password='sftp_password')
sftp_connection = ssh.open_sftp()
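
To illustrate why the subclass discards the host from the PASV reply: a server sitting behind NAT or a proxy may advertise an address the client cannot reach directly. The sketch below uses ftplib's module-level `parse227` helper on a made-up reply string (the address and port are invented for the example).

```python
from ftplib import parse227

# Hypothetical PASV reply from a server that reports its internal address.
reply = "227 Entering Passive Mode (10,0,0,5,19,137)."

# The last two numbers encode the port: 19 * 256 + 137.
host, port = parse227(reply)
print(host, port)  # 10.0.0.5 5001
```

Connecting to 10.0.0.5 would bypass the registered proxy (or simply fail), which is why `makepasv` above keeps only the port and reuses `self.host`.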


Answer 6:

Here is a workaround using requests, tested with a squid proxy that does NOT support CONNECT tunneling:

import requests
from unittest.mock import patch  # on Python 2, use: from mock import patch


def ftp_fetch_file_through_http_proxy(host, user, password, remote_filepath, http_proxy, output_filepath):
    """
    This function lets us make an FTP RETR query through an HTTP proxy that does NOT support CONNECT tunneling.
    It is equivalent to: curl -x $HTTP_PROXY --user $USER:$PASSWORD ftp://$FTP_HOST/path/to/file
    It returns the 'Last-Modified' HTTP header value from the response.

    More precisely, this function sends the following HTTP request to $HTTP_PROXY:
        GET ftp://$USER:$PASSWORD@$FTP_HOST/path/to/file HTTP/1.1
    Note that in doing so, the host in the request line does NOT match the host we send this packet to.

    Python `requests` lib does not let us easily "cheat" like this.
    In order to achieve what we want, we need:
    - to mock urllib3.poolmanager.parse_url so that it returns a (host,port) pair indicating to send the request to the proxy
    - to register a connection adapter to the 'ftp://' prefix. This is basically an HTTP adapter but it uses the FULL url of
    the resource to build the request line, instead of only its relative path.
    """
    url = 'ftp://{}:{}@{}/{}'.format(user, password, host, remote_filepath)
    proxy_host, proxy_port = http_proxy.split(':')

    def parse_url_mock(url):
        return requests.packages.urllib3.util.url.parse_url(url)._replace(host=proxy_host, port=proxy_port, scheme='http')

    with open(output_filepath, 'w+b') as output_file, patch('requests.packages.urllib3.poolmanager.parse_url', new=parse_url_mock):
        session = requests.session()
        session.mount('ftp://', FTPWrappedInFTPAdapter())
        response = session.get(url)
        response.raise_for_status()
        output_file.write(response.content)
        return response.headers['last-modified']


class FTPWrappedInFTPAdapter(requests.adapters.HTTPAdapter):
    def request_url(self, request, _):
        return request.url
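
For reference, the request line described in the docstring can be sketched with the standard library alone. The helper name is hypothetical, and percent-encoding the credentials is my own assumption (the function above formats them into the URL unescaped).

```python
from urllib.parse import quote


def proxy_request_line(user, password, host, remote_filepath):
    """Build the absolute-URI request line sent to the HTTP proxy."""
    url = "ftp://{}:{}@{}/{}".format(
        quote(user, safe=""),      # escape '@', ':' and '/' in credentials
        quote(password, safe=""),
        host,
        quote(remote_filepath.lstrip("/")),
    )
    return "GET {} HTTP/1.1".format(url)


print(proxy_request_line("anonymous", "p@ss", "ftp.example.com", "pub/file.txt"))
# GET ftp://anonymous:p%40ss@ftp.example.com/pub/file.txt HTTP/1.1
```

The point is that the request target is the full ftp:// URL rather than a relative path, which is what the custom `request_url` override achieves inside requests.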