Parse custom URIs with urlparse (Python)

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

标签： python url python-2.6 urlparse

6条回答

对你真心纯属浪费

2楼-- · 2019-02-03 08:40

You can also register a custom handler with urlparse:

import urlparse

def register_scheme(scheme):
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')

This will append your url scheme to the lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The uri will then be treated as http-like and will correctly return the path, fragment, username/password etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}

0人赞添加讨论(0) 举报

该账号已被封号

3楼-- · 2019-02-03 08:42

The question appears to be out of date. Since at least Python 2.7 there are no issues.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
>>> import urlparse
>>> urlparse.urlparse("qqqq://base/id#hint")
ParseResult(scheme='qqqq', netloc='base', path='/id', params='', query='', fragment='hint')

0人赞添加讨论(0) 举报

一纸荒年 Trace。

4楼-- · 2019-02-03 08:46

I think the problem is that URI's don't all have a common format after the scheme. For example, mailto: urls aren't structured the same as http: urls.

I would use the results of the first parse, then synthesize an http url and parse it again:

parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)

0人赞添加讨论(0) 举报

混吃等死

5楼-- · 2019-02-03 08:46

You can use yurl library. Unlike purl or furl, it not try to fix urlparse bugs. It is new compatible with RFC 3986 implementation.

>>> import yurl
>>> yurl.URL('qqqq://base/id#hint')
URLBase(scheme='qqqq', userinfo=u'', host='base', port='', path='/id', query='', fragment='hint')

0人赞添加讨论(0) 举报

SAY GOODBYE

6楼-- · 2019-02-03 08:49

Try removing the scheme entirely, and start with //netloc, i.e.:

>>> SCHEME="qqqq"
>>> url="qqqq://base/id#hint"[len(SCHEME)+1:]
>>> url
'//base/id#hint'
>>> urlparse.urlparse(url)
('', 'base', '/id', '', '', 'hint')

You won't have the scheme in the urlparse result, but you know the scheme anyway.

Also note that Python 2.6 seems to handle this url just fine (aside from the fragment):

$ python2.6 -c 'import urlparse; print urlparse.urlparse("qqqq://base/id#hint")'
ParseResult(scheme='qqqq', netloc='base', path='/id#hint', params='', query='', fragment='')

0人赞添加讨论(0) 举报

Anthone

7楼-- · 2019-02-03 08:59

There is also library called furl which gives you result you want:

>>>import furl
>>>f=furl.furl("qqqq://base/id#hint");
>>>f.scheme
'qqqq' 

>>> f.host
'base'  
>>> f.path
Path('/id')
>>>  f.path.segments
['id']
>>> f.fragment                                                                                                                                                                                                                                                                 
Fragment('hint')   
>>> f.fragmentstr                                                                                                                                                                                                                                                              
'hint'

0人赞添加讨论(0) 举报

Parse custom URIs with urlparse (Python)

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间