Slicing URL with Python

I am working with a huge list of URLs. A quick question: I am trying to slice part of a URL out. See below:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3

How could I slice out:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234

Sometimes there are more than two parameters after the CONTENT_ITEM_ID, and the ID is different each time. I am thinking it can be done by finding the first & and then keeping only the characters before that &, but I am not quite sure how to do this.

Cheers

Tags: python url string

10 answers

劫难

#2 · 2019-01-22 23:28

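Use the urlparse module (urllib.parse in Python 3). Check this function:

try:
    import urlparse                  # Python 2
except ImportError:
    import urllib.parse as urlparse  # Python 3

def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
    # Split into (scheme, netloc, path, query, fragment).
    parsed = urlparse.urlsplit(url)
    # Keep only the query items that start with one of the given prefixes.
    filtered_query = '&'.join(
        qry_item
        for qry_item in parsed.query.split('&')
        if qry_item.startswith(keep_params))
    # Reassemble the URL with the filtered query string.
    return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])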

In your example:

>>> url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
>>> process_url(url)
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'

This function has the added bonus that it's easier to use if you decide that you also want some more query parameters, or if the order of the parameters is not fixed, as in:

>>> url='http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1'
>>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'
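Note that startswith matches prefixes, so 'CONTENT_ITEM_ID' would also keep a parameter named, say, CONTENT_ITEM_ID_OLD. If exact name matching is wanted, a minimal Python 3 sketch could look like the following. The function name keep_query_params is made up for illustration, and the query string is re-encoded by urlencode, so unusual escaping in the input may come out normalized.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_query_params(url, keep=("CONTENT_ITEM_ID",)):
    """Return url with only the query parameters whose names are in keep.

    Hypothetical helper, not part of the answer above.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so bare flags such as "param2" are parsed too.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Compare whole parameter names instead of prefixes.
    kept = [(name, value) for name, value in pairs if name in keep]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1"
print(keep_query_params(url))
print(keep_query_params(url, ("CONTENT_ITEM_ID", "other_value")))
```

The original parameter order is preserved, because the list comprehension walks the parsed pairs in order.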


甜甜的少女心

#3 · 2019-01-22 23:31

Look at the urllib2 file name question for some discussion of this topic.

Also see the "Python Find Question" question.


够拽才男人

#4 · 2019-01-22 23:32

Parsing URLs is never as simple as it seems to be, which is why the urlparse and urllib modules exist.

For example:

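import urllib

url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
# Python 2 only: splitquery() returns the (base, query) pair.
query = urllib.splitquery(url)
# Keep the base plus the first query parameter.
result = "?".join((query[0], query[1].split("&")[0]))
print(result)
# prints: http://www.domainname.com/page?CONTENT_ITEM_ID=1234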

This is still not 100% reliable, but it is much more reliable than splitting the string yourself, because there are a lot of valid URL formats that you and I don't know about and will only discover one day in the error logs.
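On Python 3, where urllib.splitquery is not available under that name, roughly the same idea can be sketched with urllib.parse.urlsplit. This is only an illustration under the assumption that the first query parameter is the one to keep; the name first_param_only is invented here.

```python
from urllib.parse import urlsplit, urlunsplit


def first_param_only(url):
    """Keep only the first '&'-separated query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # split("&", 1)[0] is the whole query when there is no "&" at all.
    first = parts.query.split("&", 1)[0]
    # Rebuild the URL; scheme, host, path and fragment are left untouched.
    return urlunsplit(parts._replace(query=first))


print(first_param_only(
    "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"))
```

Unlike plain string splitting, this leaves a trailing #fragment in place and does nothing to URLs that have no query string.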


欢心

#5 · 2019-01-22 23:39

The quick and dirty solution is this:

>>> "http://something.com/page?CONTENT_ITEM_ID=1234&param3".split("&")[0]
'http://something.com/page?CONTENT_ITEM_ID=1234'
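The approach the question describes (find the first & and slice) can be sketched the same way and applied to a whole list. The helper name and the sample URLs below are made up for illustration, and like the one-liner above this assumes CONTENT_ITEM_ID is always the first parameter.

```python
def cut_at_first_ampersand(url):
    """Return everything before the first '&', or the whole URL if there is none."""
    idx = url.find("&")  # find() returns -1 when '&' does not occur
    return url if idx == -1 else url[:idx]


# Sample data, invented for this example.
urls = [
    "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3",
    "http://www.domainname.com/page?CONTENT_ITEM_ID=5678",
]
trimmed = [cut_at_first_ampersand(u) for u in urls]
print(trimmed)
```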


淡お忘

#6 · 2019-01-22 23:40

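import re

url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
# Lazily match everything up to the first '&'.
m = re.search(r'(.*?)&', url)
# Fall back to the whole URL when there is no '&' (m is None).
print(m.group(1) if m else url)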


够拽才男人

#7 · 2019-01-22 23:41

An ancient question, but still, I'd like to remark that query string parameters can also be separated by ';', not only by '&'.
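To allow for either separator, one option is to split the query string on the first & or ; with a regular expression. A minimal Python 3 sketch, assuming the first parameter is the one to keep; first_param_any_separator is a name invented for this example.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def first_param_any_separator(url):
    """Keep only the first query parameter, treating '&' and ';' as separators."""
    parts = urlsplit(url)
    # maxsplit=1: only the first separator matters here.
    first = re.split(r"[&;]", parts.query, maxsplit=1)[0]
    return urlunsplit(parts._replace(query=first))


print(first_param_any_separator(
    "http://www.domainname.com/page?CONTENT_ITEM_ID=1234;param2;param3"))
```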
