My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.
If I do not adjust urlparse's uses_* lists I get this:
>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')
Here is what I do, and I wonder if there is a better way to do it:
import urlparse
SCHEME = "qqqq"
# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)
Why is there no better way to do this?
You can also register a custom handler with urlparse:
This will append your URL scheme to urlparse's uses_* lists.
The URI will then be treated as HTTP-like and will correctly return the path, fragment, username/password, etc.
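A minimal sketch of such a registration helper, assuming the idea is simply to append the scheme to every module-level list whose name starts with uses_. The name register_scheme is hypothetical, and the import fallback is only there so the same lines also run on Python 3, where the module lives at urllib.parse:

```python
try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3: same functions, new location


def register_scheme(scheme):
    """Append `scheme` to every `uses_*` list the module exposes (hypothetical helper)."""
    for name in dir(urlparse):
        if not name.startswith("uses_"):
            continue
        registry = getattr(urlparse, name)
        # Only touch real lists, and avoid duplicates on repeated calls.
        if isinstance(registry, list) and scheme not in registry:
            registry.append(scheme)


register_scheme("qqqq")
parts = urlparse.urlparse("qqqq://base/id#hint")
print(tuple(parts))
```

This still mutates module-level state, so it affects every caller of urlparse in the process; it only saves you from listing each uses_* list by hand.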
The question appears to be out of date. Since at least Python 2.7 there are no issues.
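A quick way to check this on your own interpreter, as a sketch. No uses_* lists are modified here; the import fallback is an assumption so the same lines run on Python 3, where urlparse moved to urllib.parse:

```python
try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3

parts = urlparse.urlparse("qqqq://base/id#hint")

# True on interpreters that split unknown schemes the same way as http.
handled = tuple(parts) == ("qqqq", "base", "/id", "", "", "hint")
print(handled)
```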
I think the problem is that URIs don't all share a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.
I would use the results of the first parse, then synthesize an http url and parse it again:
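One way this two-pass idea could look, as a sketch rather than a definitive implementation. The function name parse_like_http is hypothetical, and the code assumes the parse result is a namedtuple (so that _replace is available) and that the input has no leading whitespace:

```python
try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3


def parse_like_http(url):
    """Parse `url` as if its scheme were http, then restore the real scheme."""
    first = urlparse.urlparse(url)
    if not first.scheme:
        # Nothing to swap; the plain parse is all we can do.
        return first
    # Everything after "scheme:" is reused unchanged.
    rest = url[len(first.scheme) + 1:]
    second = urlparse.urlparse("http:" + rest)
    # Put the original scheme back into the http-style result.
    return second._replace(scheme=first.scheme)


print(tuple(parse_like_http("qqqq://base/id#hint")))
```

Unlike appending to the uses_* lists, this leaves the module's global state alone, at the cost of parsing every URL twice.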
You can use the yurl library. Unlike purl or furl, it does not try to work around urlparse bugs. It is a new implementation that is compatible with RFC 3986.
Try removing the scheme entirely, and start with //netloc, i.e.:
You won't have the scheme in the urlparse result, but you know the scheme anyway.
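A small sketch of that approach, assuming the scheme is split off at the first colon and kept in a separate variable (the variable names are illustrative only):

```python
try:
    import urlparse  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3

url = "qqqq://base/id#hint"

# Keep the scheme aside and parse only the "//netloc/path#fragment" part.
scheme, _, rest = url.partition(":")
parts = urlparse.urlparse(rest)

print(scheme)
print(tuple(parts))
```

The scheme field of the result is empty, so anything that later rebuilds the URL has to add the saved scheme back itself.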
Also note that Python 2.6 seems to handle this url just fine (aside from the fragment):
There is also a library called furl, which gives you the result you want.