<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>
How can I print "I Like your face" instead of "I Like to punch your face"?
I tried this:
lala = soup.find_all('span')
for p in lala:
    if not p.find(class_='unwanted'):
        print p.text
but it gives:
"TypeError: find() takes no keyword arguments"
You can use extract() to remove the unwanted tag before you get the text.
But it keeps all the '\n' and spaces, so you will need some extra work to remove them.
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
soup = BS(data, 'html.parser')
external_span = soup.find('span')
print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())
unwanted = external_span.find('span')
unwanted.extract()
print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())
Result
1 HTML: <span>
I Like
<span class="unwanted"> to punch </span>
your face
<span></span></span>
1 TEXT: I Like
to punch
your face
2 HTML: <span>
I Like
your face
<span></span></span>
2 TEXT: I Like
your face
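The leftover newlines and spaces can be collapsed once the unwanted tag has been extracted. This is a minimal sketch, assuming the HTML from the question (with its closing tag written as </span>); the variable name clean is only illustrative.

```python
from bs4 import BeautifulSoup

data = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# Remove the unwanted tag from the tree, as in the example above.
external_span.find("span", class_="unwanted").extract()

# str.split() with no arguments splits on any run of whitespace
# (spaces and newlines alike), so joining with a single space
# normalizes the text.
clean = " ".join(external_span.get_text().split())
print(clean)  # I Like your face
```

Note that extract() modifies the parsed tree, so the unwanted text is gone from soup afterwards.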
You can also skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
import bs4
soup = BS(data, 'html.parser')
external_span = soup.find('span')
text = []
for x in external_span:
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))
Result
I Like your face
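The same filtering can probably be written more compactly by asking find_all() for text nodes only. This is a sketch, assuming a reasonably recent bs4 (older releases call the string argument text) and the HTML from the question; pieces and result are illustrative names.

```python
from bs4 import BeautifulSoup

data = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# string=True matches text nodes only; recursive=False restricts the
# search to direct children, so text inside nested tags is skipped.
pieces = external_span.find_all(string=True, recursive=False)

# Strip each piece and drop any that are whitespace only.
result = " ".join(s.strip() for s in pieces if s.strip())
print(result)  # I Like your face
```

Unlike extract(), this leaves the parsed tree unchanged.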
You can easily find the (un)desired text like this:
from bs4 import BeautifulSoup
text = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>"""
soup = BeautifulSoup(text, "lxml")
for i in soup.find_all("span"):
    if 'class' in i.attrs:
        if "unwanted" in i.attrs['class']:
            print(i.text)
From here, outputting everything else can be done easily.
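One possible way to output everything else is to walk all text nodes and skip those that sit inside an unwanted tag. This is only a sketch, not necessarily what the answer had in mind: it assumes the HTML from the question, uses html.parser instead of lxml so that no extra package is needed, and the list name wanted is illustrative.

```python
from bs4 import BeautifulSoup

text = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(text, "html.parser")

wanted = []
for s in soup.find_all(string=True):
    # Skip any text node that has an ancestor with class "unwanted".
    if s.find_parent(class_="unwanted") is not None:
        continue
    # Keep the stripped text, ignoring whitespace-only nodes.
    if s.strip():
        wanted.append(s.strip())

print(" ".join(wanted))  # I Like your face
```

Because it checks ancestors rather than direct children, this also skips text nested more deeply inside an unwanted tag.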