Python BeautifulSoup equivalent to lxml make_links

2020-06-16 01:54发布

So lxml has a very hand feature: make_links_absolute:

doc = lxml.html.fromstring(some_html_page)
doc.make_links_absolute(url_for_some_html_page)

and all the links in doc are absolute now. Is there an easy equivalent in BeautifulSoup or do I simply need to pass it through urlparse and normalize it:

soup = BeautifulSoup(some_html_page)
for tag in soup.findAll('a', href=True):
    url_data = urlparse(tag['href'])
    if url_data[0] == "":
        full_url = url_for_some_html_page + test_url

标签： python beautifulsoup lxml

1条回答

Evening l夕情丶

2楼-- · 2020-06-16 02:25

In my answer to What is a simple way to extract the list of URLs on a webpage using python? I covered that incidentally as part of the extraction step; you could easily write a method to do it on the soup and not just extract it.

import urlparse

def make_links_absolute(soup, url):
    for tag in soup.findAll('a', href=True):
        tag['href'] = urlparse.urljoin(url, tag['href'])

0人赞添加讨论(0) 举报

Python BeautifulSoup equivalent to lxml make_links

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间