Matching ids in BeautifulSoup

Posted 2019-01-14 22:12

Question:

I'm using the BeautifulSoup Python module. I need to find every div whose id has the form 'post-#', where # is a number. For example:

<div id="post-45">...</div>
<div id="post-334">...</div>

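How can I filter for these? My attempt returns an empty list:

from bs4 import BeautifulSoup

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')
# A plain string is compared literally, so '*' is not treated as a wildcard
print(soupHandler.findAll('div', id='post-*'))
> []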

Answer 1:

You can pass a function to findAll:

>>> print(soupHandler.findAll('div', id=lambda x: x and x.startswith('post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]

Or a compiled regular expression:

>>> import re
>>> print(soupHandler.findAll('div', id=re.compile('^post-')))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
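The two filters above can be compared side by side in a self-contained script. This is only a sketch: it assumes the bs4 package and Python's built-in 'html.parser', and the extra "sidebar" div, the div without an id, and the helper name is_post_id are hypothetical additions used to show what gets rejected.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: two matching divs plus two that should be filtered out
html = (
    '<div id="post-45">...</div>'
    '<div id="post-334">...</div>'
    '<div id="sidebar">...</div>'
    '<div>no id</div>'
)
soup = BeautifulSoup(html, "html.parser")


def is_post_id(value):
    # The filter receives the attribute value, or None when the tag has no id
    return value is not None and value.startswith("post-")


# Filter with a function
by_function = [div["id"] for div in soup.find_all("div", id=is_post_id)]

# Filter with a compiled regular expression
by_regex = [div["id"] for div in soup.find_all("div", id=re.compile(r"^post-"))]

print(by_function)
print(by_regex)
```

Both lists should contain only the two post ids. The None check in the function is what the `x and ...` part of the lambda does: without it, a div that lacks an id would raise an AttributeError.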


Answer 2:

Since the asker wants to match "post-" followed by a number, it is better to be more precise:

import re
[...]
# Raw string so that \d reaches the regex engine unchanged
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
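To see how much the exact pattern matters, the sketch below applies several variants to a handful of id strings using only the re module. The sample ids and the helper name matching_ids are hypothetical. It relies on the documented behaviour that BeautifulSoup tests a compiled pattern with its search() method, so an unanchored pattern can match anywhere in the id.

```python
import re

# Hypothetical ids, including some near misses
ids = ["post-45", "post-334", "post-footer", "post-45-comments", "repost-7"]

patterns = {
    "unanchored": re.compile(r"post-"),          # "post-" anywhere in the id
    "prefix": re.compile(r"^post-"),             # id starts with "post-"
    "prefix_digits": re.compile(r"^post-\d+"),   # "post-" then at least one digit
    "exact": re.compile(r"^post-\d+$"),          # "post-" then digits and nothing else
}


def matching_ids(pattern, candidates):
    # Use search(), the same method BeautifulSoup applies to attribute values
    return [c for c in candidates if pattern.search(c)]


results = {name: matching_ids(p, ids) for name, p in patterns.items()}
for name, matched in results.items():
    print(name, matched)
```

If ids such as "post-45-comments" can appear on the page, adding the trailing `$` is the safer choice. If every id beginning with "post-" is wanted, the shorter prefix pattern is enough.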


Answer 3:

soupHandler.findAll('div', id=re.compile(r"^post-\d+$"))

looks right to me.



Answer 4:

This works for me:

from bs4 import BeautifulSoup
import re

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html, 'html.parser')

# The pattern is unanchored, so it matches any id that contains "post-"
for match in soupHandler.find_all('div', id=re.compile("post-")):
    print(match.get('id'))

>>> 
post-45
post-334