How can I split a URL string up into separate parts?

I decided that I'll learn Python tonight :) I know C pretty well (I wrote an OS in it), so I'm not new to programming and everything in Python seems pretty easy, but I don't know how to solve this problem. Let's say I have this address:

http://example.com/random/folder/path.html

How can I create two strings from this: one containing the "base" name of the server, which in this example would be http://example.com/, and another containing everything except the last filename, which in this example would be http://example.com/random/folder/? I know I could just find the third and the last slash respectively, but maybe you know a better way :] It would also be nice to keep the trailing slash in both cases, but I don't mind if not, since it can be added easily. Does anyone have a good, fast, effective solution for this? Or is there only "my" solution of finding the slashes?

Thanks!

Tags: python url parsing

6 answers

地球回转人心会变

#2 · 2019-01-21 21:14

Thank you very much to the other answerers here, who pointed me in the right direction via the answers they have given!

It seems that the posixpath module mentioned in sykora's answer is not available in my Python setup (Python 2.7.3).

As per this article it seems that the "proper" way to do this would be using...

urlparse.urlparse and urlparse.urlunparse, to detach and reattach the base of the URL
the functions of os.path, to manipulate the path
urllib.url2pathname and urllib.pathname2url, to make path manipulation portable so that it also works on Windows and the like

So for example (not including reattaching the base URL)...

>>> import urlparse, urllib, os.path
>>> os.path.dirname(urllib.url2pathname(urlparse.urlparse("http://example.com/random/folder/path.html").path))
'/random/folder'
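
For Python 3, a rough port of the same idea might look like the sketch below. It assumes the functions are taken from urllib.parse and urllib.request instead of urlparse and urllib, and the helper name parent_url is made up for illustration. Unlike the snippet above, it also reattaches the base of the URL.

```python
import os.path
from urllib.parse import urlparse, urlunparse
from urllib.request import pathname2url, url2pathname


def parent_url(url):
    """Return the URL of the folder that holds the last path component.

    parent_url is a hypothetical helper name, not part of any library.
    """
    parts = urlparse(url)
    # Convert the URL path to a local-style path so os.path can handle it.
    local_dir = os.path.dirname(url2pathname(parts.path))
    # Convert back to URL form and make sure there is one trailing slash.
    url_dir = pathname2url(local_dir).rstrip("/") + "/"
    # Keep scheme and host; drop params, query and fragment.
    return urlunparse((parts.scheme, parts.netloc, url_dir, "", "", ""))


print(parent_url("http://example.com/random/folder/path.html"))
# http://example.com/random/folder/
```

Note that the query string and fragment are dropped on purpose here; whether that is what you want depends on the URLs you are handling.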

祖国的老花朵

#3 · 2019-01-21 21:20

You can use the third-party furl library:

import furl

f = furl.furl("http://example.com/random/folder/path.html")
print(str(f.path))  # /random/folder/path.html
print(str(f.path).split("/"))  # ['', 'random', 'folder', 'path.html']

To access the word after the first "/", use:

print(str(f.path).split("/")[1])  # random

成全新的幸福

#4 · 2019-01-21 21:23

In Python a lot of operations are done using lists. The urlparse module mentioned by Sebasian Dietz may well solve your specific problem, but if you're generally interested in Pythonic ways to find slashes in strings, for example, try something like this:

url = 'http://example.com/random/folder/path.html'
# Create a list of each bit between slashes
slashparts = url.split('/')
# Now join back the first three sections 'http:', '' and 'example.com'
basename = '/'.join(slashparts[:3]) + '/'
# All except the last one
dirname = '/'.join(slashparts[:-1]) + '/'
print('slashparts = %s' % slashparts)
print('basename = %s' % basename)
print('dirname = %s' % dirname)

The output of this program is this:

slashparts = ['http:', '', 'example.com', 'random', 'folder', 'path.html']
basename = http://example.com/
dirname = http://example.com/random/folder/

The interesting bits are split, join, the slice notation array[A:B] (including negative indices, which count from the end) and, as a bonus, the % operator on strings, which gives printf-style formatting.
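
One thing to keep in mind with the split/join approach is that it assumes the URL has a path after the host. The sketch below wraps the same logic in a function (split_by_slashes is a made-up name for illustration) and shows what happens for a bare host name.

```python
def split_by_slashes(url):
    """Hypothetical helper wrapping the split/join approach above."""
    slashparts = url.split("/")
    # First three parts: 'http:', '' and the host name.
    basename = "/".join(slashparts[:3]) + "/"
    # Everything except the last part.
    dirname = "/".join(slashparts[:-1]) + "/"
    return basename, dirname


# The URL from the question behaves as expected.
print(split_by_slashes("http://example.com/random/folder/path.html"))
# ('http://example.com/', 'http://example.com/random/folder/')

# A bare host has no path, so the "directory" loses the host name.
print(split_by_slashes("http://example.com"))
# ('http://example.com/', 'http://')
```

If inputs like the second one can occur, it is probably safer to parse the URL first and only manipulate its path component.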

劫难

#5 · 2019-01-21 21:25

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) would be the way to do it.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'

If you want to do more work on the path component of the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts together.

EDIT: I totally forgot that Windows users will choke on the path separator in os.path. I read the posixpath module docs, and they make special reference to URL manipulation, so all's good.
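
Putting the pieces together for the two strings asked for in the question, a minimal Python 3 sketch could look like this. The function name split_url is hypothetical, and using urlsplit/urlunsplit rather than urlparse is just one possible choice.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def split_url(url):
    """Return (server base, containing folder), both with a trailing slash.

    split_url is a hypothetical helper name used for illustration.
    """
    parts = urlsplit(url)
    # Base: scheme and host with a root path.
    base = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    # Joining with an empty last component appends the trailing slash.
    folder = posixpath.join(posixpath.dirname(parts.path), "")
    directory = urlunsplit((parts.scheme, parts.netloc, folder, "", ""))
    return base, directory


base, directory = split_url("http://example.com/random/folder/path.html")
print(base)       # http://example.com/
print(directory)  # http://example.com/random/folder/
```

Because posixpath always uses forward slashes, this behaves the same on Windows as on other platforms.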

beautiful°

#6 · 2019-01-21 21:35

I have no experience with Python, but I found the urlparse module, which should do the job.

不美不萌又怎样

#7 · 2019-01-21 21:36

If this is the extent of your URL parsing, Python's built-in str.rpartition will do the job:

>>> URL = "http://example.com/random/folder/path.html"
>>> Segments = URL.rpartition('/')
>>> Segments[0]
'http://example.com/random/folder'
>>> Segments[2]
'path.html'

From Pydoc, str.rpartition:

Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

What this means is that rpartition does the searching for you and splits the string at the last (rightmost) occurrence of the character you specify (in this case /). It returns a tuple containing:

(everything to the left of the character, the character itself, everything to the right of the character)
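
As a small illustration of that tuple, the sketch below unpacks it to rebuild the folder URL with its trailing slash, shows the not-found case, and points out one caveat: rpartition knows nothing about URLs, so a slash inside a query string also counts as "the last slash". The example URLs other than the one from the question are made up.

```python
url = "http://example.com/random/folder/path.html"

# Unpack the 3-tuple and put the separator back to keep the trailing slash.
head, sep, tail = url.rpartition("/")
directory = head + sep
print(directory)  # http://example.com/random/folder/
print(tail)       # path.html

# Separator not found: two empty strings, then the original string.
print("path.html".rpartition("/"))  # ('', '', 'path.html')

# Caveat: a slash inside the query string is treated as the last slash.
tricky = "http://example.com/a/b.html?next=/x/y"
print(tricky.rpartition("/")[0])  # http://example.com/a/b.html?next=/x
```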
