I have an HTML file with lots of data, and this is the part I am interested in:
<tr valign=top>
<td><b>Total</b></td>
<td align=right><b>54<b></td>
<td align=right><b>1<b></td>
<td align=right>0 (0/0)</td>
<td align=right><b>0<b></td>
</tr>
I am trying to use awk; my current command is:
awk -F "</*b>|</td>" '/<[b]>.*[0-9]/ {print $1, $2, $3 }' "index.html"
but what I want is to have:
54
1
0
0
Right now I am getting:
'<td align=right> 54'
'<td align=right> 1'
'<td align=right> 0'
Any suggestions?
Output:
Another:
You really should use a real HTML parser for this job, like:
prints:
But for this you need to have Perl and the Mojolicious package installed.
(it is easy to install with:)
awk is not an HTML parser. Use XPath or even XSLT for that. xmllint is a command-line tool that can execute XPath queries, and xsltproc can be used to perform XSL transformations. On Debian-based systems xmllint is in the libxml2-utils package, while xsltproc is usually packaged separately as xsltproc. You can also use a programming language that is able to parse HTML.
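As a rough illustration of that last suggestion, here is a minimal sketch using only Python 3's standard-library html.parser. The names TotalRowParser and total_values are made up for this example, and it assumes the row you want is the one whose first cell reads "Total" and that you only want the leading integer of each following cell (so "0 (0/0)" becomes "0"). It has only been reasoned through against the snippet in the question, not against your full index.html.

```python
import re
from html.parser import HTMLParser

# Sample taken from the question, including the unclosed <b> tags.
SAMPLE = """<tr valign=top>
<td><b>Total</b></td>
<td align=right><b>54<b></td>
<td align=right><b>1<b></td>
<td align=right>0 (0/0)</td>
<td align=right><b>0<b></td>
</tr>"""


class TotalRowParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while we are inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open cell.
        if self._in_cell:
            self.rows[-1][-1] += data


def total_values(html):
    """Return the leading integer of each cell after the 'Total' label."""
    parser = TotalRowParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        cells = [cell.strip() for cell in row]
        if cells and cells[0] == "Total":
            matches = (re.match(r"\d+", cell) for cell in cells[1:])
            return [m.group() for m in matches if m]
    return []


if __name__ == "__main__":
    for value in total_values(SAMPLE):
        print(value)   # prints 54, 1, 0, 0 on separate lines
```

To run it on the real file, you would read index.html into a string and pass that to total_values instead of SAMPLE. The parser tolerates the unquoted attributes and the unclosed <b> tags, which is where a field-splitting approach like the awk one-liner tends to break.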