I am working with an application, written with Flask, that returns URLs. I want the URL displayed to the user to be as clean as possible, so I want to remove the http:// from it. I looked and found the urlparse library, but couldn't find any examples of how to do this.
What would be the best way to go about it, and if urlparse is overkill, is there a simpler way? Would simply removing the "http://" substring from the URL with the regular string tools be bad practice or cause problems?
I don't think urlparse offers a single method or function for this. This is how I'd do it:
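from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'

def strip_scheme(url):
    parsed = urlparse(url)
    if not parsed.scheme:
        # No scheme to strip; also avoids replacing a "://" that appears later in the URL
        return url
    scheme = "%s://" % parsed.scheme  # urlparse has already lowercased the scheme
    return parsed.geturl().replace(scheme, '', 1)

print(strip_scheme(url))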
Output:
stackoverflow.com/questions/tagged/python?page=2
If you used only simple string parsing, you'd have to deal with http[s], and possibly other schemes, yourself. This approach also handles unusual casing of the scheme.
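To illustrate the string-parsing caveat, here is a small sketch contrasting a naive `str.replace` with a regex that strips any leading `scheme://` case-insensitively. The helper name `strip_scheme_re` is hypothetical and not part of either answer; the character set for the scheme follows RFC 3986 (a letter followed by letters, digits, `+`, `-` or `.`).

```python
import re

# Hypothetical helper: strip a leading "scheme://" using only a regex.
# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
_SCHEME_PREFIX = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')

def strip_scheme_re(url):
    # Anchored at the start, so a "://" later in the URL is never touched
    return _SCHEME_PREFIX.sub('', url, count=1)

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'

# A naive replace misses the mixed-case scheme (and would ignore https entirely)
naive = url.replace('http://', '')
print(naive)
print(strip_scheme_re(url))
```

The regex works for the display use case, but it is exactly the kind of scheme handling you have to get right yourself, which is why delegating to urlparse is attractive.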
If you are using these URLs programmatically, I suggest having urlparse rebuild the URL without a scheme rather than doing a string replace.
The ParseResult object is a tuple (a namedtuple subclass), so you can create a new one with the fields you don't want left empty.
# py2/3 compatibility
try:
    from urllib.parse import urlparse, ParseResult  # Python 3
except ImportError:
    from urlparse import urlparse, ParseResult  # Python 2

def strip_scheme(url):
    parsed_result = urlparse(url)
    # Rebuild the result with an empty scheme and every other field unchanged
    return ParseResult('', *parsed_result[1:]).geturl()
You can remove any component of the ParseResult by replacing the corresponding argument with an empty string.
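For example, a sketch that drops both the scheme and the fragment by passing empty strings for those two fields. The function name `strip_scheme_and_fragment` and the sample URL are hypothetical; the six positional fields are ParseResult's `scheme, netloc, path, params, query, fragment`.

```python
from urllib.parse import urlparse, ParseResult  # Python 2: from urlparse import ...

def strip_scheme_and_fragment(url):
    p = urlparse(url)
    # Empty strings for scheme and fragment; the other fields are copied as-is
    return ParseResult('', p.netloc, p.path, p.params, p.query, '').geturl()

print(strip_scheme_and_fragment('https://example.com/a?b=1#top'))
```

Note that the result keeps the leading `//`, for the reason discussed next.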
It's important to note that there is a functional difference between this answer and @Lukas Graf's answer. The most likely difference is that the '//' component of a URL isn't technically part of the scheme, so this answer preserves it, whereas Lukas Graf's answer removes it.
>>> Lukas_strip_scheme('https://yoman/hi?whatup')
'yoman/hi?whatup'
>>> strip_scheme('https://yoman/hi?whatup')
'//yoman/hi?whatup'
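If you want the tuple-based approach but the display output of Lukas Graf's version, one option is to strip exactly two leading slashes afterwards. This is a sketch under that assumption; `strip_scheme_for_display` is a hypothetical name.

```python
from urllib.parse import urlparse, ParseResult  # Python 2: from urlparse import ...

def strip_scheme_for_display(url):
    # Rebuild the URL with an empty scheme (keeps the '//' before the netloc)
    without_scheme = ParseResult('', *urlparse(url)[1:]).geturl()
    # Remove only the '//' prefix, so a bare path such as '/foo' is left alone
    if without_scheme.startswith('//'):
        without_scheme = without_scheme[2:]
    return without_scheme

print(strip_scheme_for_display('https://yoman/hi?whatup'))
```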
I've seen this done in Flask libraries and extensions. It relies on ._replace of the ParseResult/SplitResult, which looks like a protected member but is actually a documented namedtuple method; the leading underscore only prevents clashes with field names.
from urllib.parse import urlsplit, urlunsplit  # Python 2: from urlparse import urlsplit, urlunsplit

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'
split_url = urlsplit(url)
# >>> SplitResult(scheme='http', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
split_url_without_scheme = split_url._replace(scheme="")
# >>> SplitResult(scheme='', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
new_url = urlunsplit(split_url_without_scheme)
# >>> '//stackoverflow.com/questions/tagged/python?page=2'