Android : Parse HTML block of code

I have the following piece of HTML, which I need to parse to retrieve the player's name and the runs he has scored. In this case that's 'Ross Taylor' and 9. What's the best way to parse this info? I don't want to use an HTML parser. Is regex the best way? (I know people are dead against this, but I only want these two bits of info, so I'd rather not pull in a parser.)

I've been racking my brains over how to locate the player name in the HTML file and the row after it that holds the runs scored. The HTML comment at the top of the snippet below is hard-coded in the page, so I can reliably reach that spot and then retrieve the name between the anchor tags. Is this a good way to do it? Also, how do I retrieve the runs from the row that immediately follows?

<!-- <a href="javascript:void(0);" onClick="return showHwkTooltip(this, 'lvpyrbat1');" class="livePlayerCurrent">*Luke Woodcock</a>-->

<a href="/icc_cricket_worldcup2011/content/current/player/38920.html" target="_blank" class="livePlayerCurrent" title="view the player profile for Ross Taylor">
*Ross Taylor
</a>    <span style="margin-left:5px;" title="left-hand bat">(lhb)</span >

   </td >
   <td><b>9</b></td>
   <td>9</td>
   <td>1</td>
   <td>0</td>
   <td>100.00</td>
   <td></td>
   <td colspan="3" align="left"><span class="batStyl">striker</style></td>
   <td></td>
   <td colspan="8"></td>
  </tr>

Please let me know if you need more info.

Regards, Sam

Tags: java android html regex parsing

3 Answers

Summer. ? 凉城

Reply #2 · 2019-05-22 21:43

What's the best way to parse this info?

Use an HTML parser.

Don't want to use an HTML parser.

I disagree.

Is REGEX the best way

No.


Juvenile、少年°

Reply #3 · 2019-05-22 21:46

For what it's worth, you can also have a look at Jsoup. I've used it in my projects, and it handles malformed HTML very well. I believe that might be the only reason I'm using it ;)

Regards, EZFrag


甜甜的少女心

Reply #4 · 2019-05-22 22:04

Please consider using the proper tool for the job, i.e., an HTML/XML parser rather than regex.

If you really want to do it with regex, you can try the following patterns (the doubled backslashes suggest they are written as Java string literals):

Extract score

  (?<=\\<b\\>)\\d+(?=\\</b\\>)

Extract player name

  (?<=\\>)[^\\<]+(?=\\</a\\>)

The second regex assumes you have already sanitized the markup by removing the anchor tag that sits inside the comment tags.

 <!-- ... -->

What it does is extract the text inside any anchor tag. This is one of the fundamental limitations of regex: it isn't context-aware.
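As a rough illustration of how the two patterns might be wired together, here is a minimal sketch that first strips HTML comments and then applies each regex. The class name `PlayerScoreRegex`, the helper methods, the comment-stripping pattern, and the trimmed `SAMPLE` string are assumptions made for this example; they are not part of the question's code. The leading `*` is treated as a marker and dropped from the name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayerScoreRegex {
    // Trimmed copy of the snippet from the question (some attributes omitted).
    public static final String SAMPLE =
        "<!-- <a href=\"javascript:void(0);\" class=\"livePlayerCurrent\">*Luke Woodcock</a>-->\n"
      + "<a href=\"/icc_cricket_worldcup2011/content/current/player/38920.html\" class=\"livePlayerCurrent\">\n"
      + "*Ross Taylor\n"
      + "</a>    <span style=\"margin-left:5px;\" title=\"left-hand bat\">(lhb)</span >\n"
      + "   </td >\n"
      + "   <td><b>9</b></td>\n"
      + "   <td>9</td>\n";

    // Assumed sanitizing step: matches HTML comments; DOTALL lets a comment span lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // The two patterns from the answer, unchanged.
    private static final Pattern NAME = Pattern.compile("(?<=\\>)[^\\<]+(?=\\</a\\>)");
    private static final Pattern SCORE = Pattern.compile("(?<=\\<b\\>)\\d+(?=\\</b\\>)");

    // Removes every HTML comment, including the commented-out anchor tag.
    public static String stripComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    // Returns the text of the first anchor tag, without surrounding whitespace
    // or a leading '*', or null if no anchor text is found.
    public static String extractName(String html) {
        Matcher m = NAME.matcher(stripComments(html));
        if (!m.find()) {
            return null;
        }
        String name = m.group().trim();
        return name.startsWith("*") ? name.substring(1) : name;
    }

    // Returns the first number wrapped in <b>...</b>, or null if there is none.
    public static Integer extractScore(String html) {
        Matcher m = SCORE.matcher(stripComments(html));
        return m.find() ? Integer.valueOf(m.group()) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractName(SAMPLE) + " " + extractScore(SAMPLE));
    }
}
```

Note that each pattern simply takes the first match in the input, so this only works if the markup passed in has already been narrowed down to the player's row; on a full page it would pick up whichever anchor or bold number happens to come first.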

