I'm using Beautiful Soup. There is a tag like this:
<li><a href="example"> s.r.o., <small>small</small></a></li>
I want to get only the text within the anchor <a> tag, without any text from the <small> tag in the output; i.e. " s.r.o., "
I tried find('li').text[0] but it does not work.
Is there a command in BS4 which can do that?
Use .children
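A minimal sketch of what using .children could look like, assuming the markup from the question and the built-in html.parser backend. The names anchor and own_text are only illustrative. It keeps the direct children that are plain strings and skips nested tags such as <small>:

```python
from bs4 import BeautifulSoup, NavigableString

html = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(html, "html.parser")

anchor = soup.find("a")

# .children yields only the direct children of the tag; text nodes are
# NavigableString instances, nested tags (like <small>) are Tag instances.
own_text = "".join(
    child for child in anchor.children if isinstance(child, NavigableString)
)

print(repr(own_text))  # ' s.r.o., '
```

Call own_text.strip() if the surrounding whitespace is not wanted.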
One option would be to get the first element from the contents of the a element.
Another one would be to find the small tag and get the previous sibling (.previous_sibling).
Well, there are all sorts of alternative/crazy options also.
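The options described above can be sketched roughly as follows, assuming the markup from the question and the html.parser backend. The variable names are illustrative, and the last two lines are just examples of what the alternative options might look like:

```python
from bs4 import BeautifulSoup

html = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(html, "html.parser")
anchor = soup.find("a")

# Option 1: the first element of the anchor's .contents list
first_from_contents = anchor.contents[0]

# Option 2: locate <small> and step back to the node just before it
before_small = soup.find("small").previous_sibling

# Alternatives: the first text node found under the anchor,
# or the first item produced by the .descendants generator
first_string = anchor.find(string=True)
first_descendant = next(anchor.descendants)

print(repr(first_from_contents))  # ' s.r.o., '
```

All four rely on the text coming before the <small> tag, as it does in the question. If the order of the children can vary, filtering by node type is the safer choice.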
If you would like to loop over and print the content of every anchor tag in an HTML string or web page (a web page must first be fetched with urlopen from urllib), you can collect the tags in a list and iterate over it.
a_tag is a list containing all anchor tags; collecting them in a list enables group editing when more than one <a> tag is present.
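A minimal sketch of such a loop, under a few assumptions: the HTML is an inline string (the second list item is made up for illustration), the parser is html.parser, and the names texts and first_text are hypothetical. For a live page the string would instead come from urlopen, as noted in the comment:

```python
from bs4 import BeautifulSoup, NavigableString

# For a web page, the markup could be fetched first, e.g.:
#   from urllib.request import urlopen
#   html = urlopen("https://example.com").read()   # hypothetical URL
html = """
<ul>
  <li><a href="example"> s.r.o., <small>small</small></a></li>
  <li><a href="other"> a.s., <small>tiny</small></a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like collection of every <a> tag
a_tag = soup.find_all("a")

texts = []
for tag in a_tag:
    # Take the first direct child that is plain text, or "" if there is none
    first_text = next(
        (str(c) for c in tag.children if isinstance(c, NavigableString)), ""
    )
    texts.append(first_text)
    print(repr(first_text))
```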