I am trying to parse an HTML file using Python without any external modules. The reason is that I am triggering a Jenkins job and running into import issues with lxml and BeautifulSoup (I tried resolving them, and I suspect I am over-engineering things just to get this done).
Input:
<tr class="test">
<td class="test">
<a href="a.html">BA</a>
</td>
<td class="duration">
0.000s
</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
N/A
</td>
</tr>
<tr class="test">
<td class="test">
<a href="o.html">Aa</a>
</td>
<td class="duration">
0.000s
</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
N/A
</td>
</tr>
<tr class="test">
<td class="test">
<a href="g.html">VideoAds</a>
</td>
<td class="duration">
0.390s
</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
N/A
</td>
</tr>
<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="zero number">271</td>
<td class="zero number">0</td>
<td class="zero number">3</td>
<td class="passRate suite">
98%
</td>
</tr>
Output:
I want to take the specific tr block with the class "suite" (the last one in the input) and pull out the values of its three "zero number" cells and its "passRate suite" cell. Finally, print the values.
~~~~~~~~~~~~~~~~~~~~~~~~~~
E.g. Zero number = 271 ...
Pass rate = 98%
~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is what I tried with lxml:
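from lxml.html import parse  # assumption: parse comes from lxml.html

tree = parse(HTML_FILE)
tds = tree.xpath("//tr[@class='suite']//td/text()")
val = list(map(str.strip, tds))  # list() because map() is lazy on Python 3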
This works locally, but I really want to do this without any external dependencies. Should I use strip(), or open the file after checking it with os.path.isfile()? I may be on the wrong track, so please advise or walk me through what a solution would look like.
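One dependency-free option is the html.parser module from the standard library. The following is only a minimal sketch, not the solution referred to in this thread: the class name SuiteRowParser, the helper suite_values and the shortened sample string are made up for illustration, and it assumes the row is marked with exactly class="suite" as in the input above.

```python
from html.parser import HTMLParser


class SuiteRowParser(HTMLParser):
    """Collect the text of every <td> inside <tr class="suite">."""

    def __init__(self):
        super().__init__()
        self.in_suite_row = False
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class") == "suite":
            self.in_suite_row = True
        elif tag == "td" and self.in_suite_row:
            # Start a new, empty cell; text is appended in handle_data.
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_suite_row = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def suite_values(html_text):
    """Return the stripped cell texts of the suite row (hypothetical helper)."""
    parser = SuiteRowParser()
    parser.feed(html_text)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Shortened stand-in for the input shown above.
sample = """
<tr class="test">
<td class="zero number">0</td>
</tr>
<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="zero number">271</td>
<td class="zero number">0</td>
<td class="zero number">3</td>
<td class="passRate suite">
98%
</td>
</tr>
"""

print(suite_values(sample))
```

For a real report, the file contents could be read with open(HTML_FILE).read() and passed to suite_values in place of the sample string.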
Here is my solution, based on the suggestions from @furas. Any improvements or suggestions are welcome.
For one element you could try to use the re module or even string functions.
EDIT: to get other values with re
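The code that originally followed is not preserved here, so the snippet below is only a sketch of how the re idea might look, not the answer's actual code. The function name suite_values_re and the shortened sample string are assumptions, and the patterns rely on the markup being as regular as in the question (a literal tr class="suite" opening tag and no nested tables).

```python
import re


def suite_values_re(html_text):
    """Return the cell texts of the <tr class="suite"> row (hypothetical helper)."""
    # Isolate the suite row; DOTALL lets "." match across newlines.
    row = re.search(r'<tr class="suite">(.*?)</tr>', html_text, re.DOTALL)
    if row is None:
        return []
    # Capture the text between each <td ...> and its closing </td>.
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row.group(1), re.DOTALL)
    return [cell.strip() for cell in cells]


# Shortened stand-in for the input shown in the question.
sample = """
<tr class="test">
<td class="zero number">0</td>
</tr>
<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="zero number">271</td>
<td class="zero number">0</td>
<td class="zero number">3</td>
<td class="passRate suite">
98%
</td>
</tr>
"""

# First cell is the "Total" label, last cell is the pass rate.
label, *numbers, pass_rate = suite_values_re(sample)
for number in numbers:
    print("Zero number =", number)
print("Pass rate =", pass_rate)
```

Regular expressions are brittle against changes in the markup, so this is best kept to reports whose layout is known and stable.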