How to remove scheme from url in Python?

2020-04-07 05:42发布

问题:

I am working with an application that returns urls, written with Flask. I want the URL displayed to the user to be as clean as possible so I want to remove the http:// from it. I looked and found the urlparse library, but couldn't find any examples of how to do this.

What would be the best way to go about it, and if urlparse is overkill is there a simpler way? Would simply removing the "http://" substring from the URL just using the regular string parsing tools be bad practice or cause problems?

回答1:

I don't think urlparse offers a single method or function for this. This is how I'd do it:

from urlparse import urlparse

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'

def strip_scheme(url):
    parsed = urlparse(url)
    scheme = "%s://" % parsed.scheme
    return parsed.geturl().replace(scheme, '', 1)

print strip_scheme(url)

Output:

stackoverflow.com/questions/tagged/python?page=2

If you'd use (only) simple string parsing, you'd have to deal with http[s], and possibly other schemes yourself. Also, this handles weird casing of the scheme.



回答2:

If you are using these programmatically rather than using a replace, I suggest having urlparse recreate the url without a scheme.

The ParseResult object is a tuple. So you can create another removing the fields you don't want.

# py2/3 compatibility
try:
    from urllib.parse import urlparse, ParseResult
except ImportError:
    from urlparse import urlparse, ParseResult


def strip_scheme(url):
    parsed_result = urlparse(url)
    return ParseResult('', *parsed_result[1:]).geturl()

You can remove any component of the parsedresult by simply replacing the input with an empty string.

It's important to note there is a functional difference between this answer and @Lukas Graf's answer. The most likely functional difference is that the '//' component of a url isn't technically the scheme, so this answer will preserve it, whereas it will remain here.

>>> Lukas_strip_scheme('https://yoman/hi?whatup')
'yoman/hi?whatup'
>>> strip_scheme('https://yoman/hi?whatup')
'//yoman/hi?whatup'


回答3:

I've seen this done in Flask libraries and extensions. Worth noting you can do it although it does make use of a protected member (._replace) of the ParseResult/SplitResult.

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'
split_url = urlsplit(url) 
# >>> SplitResult(scheme='http', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
split_url_without_scheme = split_url._replace(scheme="")
# >>> SplitResult(scheme='', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
new_url = urlunsplit(split_url_without_scheme)