I thought about the following while writing an answer to this question.
Suppose I have a deeply nested XML file like this (but much more nested and much longer):
<section name="1">
  <subsection name="foo">
    <subsubsection name="bar">
      <deeper name="hey">
        <much_deeper name="yo">
          <li>Some content</li>
        </much_deeper>
      </deeper>
    </subsubsection>
  </subsection>
</section>
<section name="2">
  ... and so forth
</section>
The problem with len(soup.find_all("section")) is that, while doing find_all("section"), BS keeps searching deep inside each section tag it has already found, even though I know it won't contain any other section tag.
So, two questions:
- Is there a way to make BS not search recursively into an already found tag?
- If the answer to the first question is yes, will it be more efficient, or is it the same internal process?
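For concreteness, here is a minimal sketch of the two calls being compared. It assumes the section tags are direct children of the object being searched, and it uses the recursive=False argument of find_all, which restricts the search to direct children. The sample document and variable names are made up for illustration, and nothing here measures whether the second call is actually faster.

```python
from bs4 import BeautifulSoup

# Small stand-in for the real document (hypothetical sample data).
xml = """
<section name="1">
  <subsection name="foo">
    <subsubsection name="bar">
      <li>Some content</li>
    </subsubsection>
  </subsection>
</section>
<section name="2">
  <subsection name="baz"></subsection>
</section>
"""

# html.parser ships with Python, so this sketch needs no extra parser.
soup = BeautifulSoup(xml, "html.parser")

# Default behaviour: every descendant is tested, including everything
# nested inside each <section> that has already matched.
all_sections = soup.find_all("section")

# recursive=False: only the direct children of `soup` are tested, so the
# contents of each <section> are never visited by this call.
top_level_sections = soup.find_all("section", recursive=False)

print(len(all_sections), len(top_level_sections))
```

Both calls return the same tags here only because no section is nested inside another; if the sections sat under a wrapper element, the non-recursive call would have to be made on that wrapper instead of on soup.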