可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am trying to use python to change the hostname in a url, and have been playing around with the urlparse module for a while now without finding a satisfactory solution. As an example, consider the url:
https://www.google.dk:80/barbaz
I would like to replace "www.google.dk" with e.g. "www.foo.dk", so I get the following url:
https://www.foo.dk:80/barbaz.
So the part I want to replace is what urlparse.urlsplit refers to as hostname. I had hoped that the result of urlsplit would let me make changes, but the resulting type ParseResult doesn't allow me to. If nothing else I can of course reconstruct the new url by appending all the parts together with +, but this would leave me with some quite ugly code with a lot of conditionals to get "://" and ":" in the correct places.
回答1:
You can use urlparse.urlparse
function and ParseResult._replace
method:
>>> import urlparse
>>> parsed = urlparse.urlparse("https://www.google.dk:80/barbaz")
>>> replaced = parsed._replace(netloc="www.foo.dk:80")
>>> print replaced
ParseResult(scheme='https', netloc='www.foo.dk:80', path='/barbaz', params='', query='', fragment='')
ParseResult
is a subclass of namedtuple
and _replace
is a namedtuple
method that:
returns a new instance of the named tuple replacing specified fields
with new values
UPDATE:
As @2rs2ts said in the comment netloc
attribute includes a port number.
Good news: ParseResult
has hostname
and port
attributes.
Bad news: hostname
and port
are not the members of namedtuple
, they're dynamic properties and you can't do parsed._replace(hostname="www.foo.dk")
. It'll throw an exception.
If you don't want to split on :
and your url always has a port number and doesn't have username
and password
(that's urls like "https://username:password@www.google.dk:80/barbaz") you can do:
parsed._replace(netloc="{}:{}".format(parsed.hostname, parsed.port))
回答2:
You can take advantage of urlsplit
and urlunsplit
from Python's urlparse
:
>>> from urlparse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
As the docs state, the argument passed to urlunsplit()
"can be any five-item iterable", so the above code works as expected.
回答3:
Using urlparse
and urlunparse
methods of urlparse
module:
import urlparse
old_url = 'https://www.google.dk:80/barbaz'
url_lst = list(urlparse.urlparse(old_url))
# Now url_lst is ['https', 'www.google.dk:80', '/barbaz', '', '', '']
url_lst[1] = 'www.foo.dk:80'
# Now url_lst is ['https', 'www.foo.dk:80', '/barbaz', '', '', '']
new_url = urlparse.urlunparse(url_lst)
print(old_url)
print(new_url)
Output:
https://www.google.dk:80/barbaz
https://www.foo.dk:80/barbaz
回答4:
A simple string replace of the host in the netloc also works in most cases:
>>> p = urlparse.urlparse('https://www.google.dk:80/barbaz')
>>> p._replace(netloc=p.netloc.replace(p.hostname, 'www.foo.dk')).geturl()
'https://www.foo.dk:80/barbaz'
This will not work if, by some chance, the user name or password matches the hostname. You cannot limit str.replace to replace the last occurrence only, so instead we can use split and join:
>>> p = urlparse.urlparse('https://www.google.dk:www.google.dk@www.google.dk:80/barbaz')
>>> new_netloc = 'www.foo.dk'.join(p.netloc.rsplit(p.hostname, 1))
>>> p._replace(netloc=new_netloc).geturl()
'https://www.google.dk:www.google.dk@www.foo.dk:80/barbaz'
回答5:
I would recommend also using urlsplit
and urlunsplit
like @linkyndy's answer, but for Python3
it would be:
>>> from urllib.parse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
回答6:
You can always do this trick:
>>> p = parse.urlparse("https://stackoverflow.com/questions/21628852/changing-hostname-in-a-url")
>>> parse.ParseResult(**dict(p._asdict(), netloc='perrito.com.ar')).geturl()
'https://perrito.com.ar/questions/21628852/changing-hostname-in-a-url'
回答7:
To just replace the host without touching the port in use (if any), use this:
import re, urlparse
p = list(urlparse.urlsplit('https://www.google.dk:80/barbaz'))
p[1] = re.sub('^[^:]*', 'www.foo.dk', p[1])
print urlparse.urlunsplit(p)
prints
https://www.foo.dk:80/barbaz
If you've not given any port, this works fine as well.
If you prefer the _replace
way Nigel pointed out, you can use this instead:
p = urlparse.urlsplit('https://www.google.dk:80/barbaz')
p = p._replace(netloc=re.sub('^[^:]*', 'www.foo.dk', p.netloc))
print urlparse.urlunsplit(p)