JSoup - How to parse nested texts?

Posted 2019-12-16 17:30

I'm parsing the HTML of a website with JSoup. I want to parse this part:

<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>

I want to end up with these values:

String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";

How can I extract each part separately like this?

Tags: java parsing jsoup

1 answer

做个烂人

#2 · 2019-12-16 17:46

Your code only gets all of the text inside the td elements you select, combined into one string. To store the pieces in separate variables, grab each part separately, as in the following code. The comments explain how and why each piece is retrieved.

import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element inside the td
Element linkElement = lastPost.getElementsByTag("a").first();

// The leading text is the first child node of the td. Read it as a TextNode and trim it;
// toString() returns the node's outer HTML, so entities stay escaped and whitespace may remain.
String text = ((TextNode) lastPost.childNode(0)).text().trim();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This is the href attribute value of the a (link) element
String link = linkElement.attr("href");
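
For reference, here is a minimal self-contained sketch of the same extraction, assuming jsoup 1.11.1 or later is on the classpath (for `selectFirst`). The class name `LastPostExample` and the helper `parseLastPost` are hypothetical names used only for illustration. It reads the td's direct text with `ownText()` and returns null when the cell or the link is missing. The sample HTML wraps the td in a table, since jsoup's HTML parser ignores a td tag that appears outside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LastPostExample {

    // Hypothetical helper: returns {text, textNo, link},
    // or null if the td.lastpost cell or its link is missing.
    static String[] parseLastPost(String html) {
        Document document = Jsoup.parse(html);

        Element lastPost = document.selectFirst("td.lastpost");
        if (lastPost == null) {
            return null;
        }
        Element linkElement = lastPost.selectFirst("a");
        if (linkElement == null) {
            return null;
        }

        return new String[] {
            lastPost.ownText(),        // text directly inside the td, excluding the link text
            linkElement.text(),        // text inside the a element
            linkElement.attr("href")   // value of the href attribute
        };
    }

    public static void main(String[] args) {
        // The td from the question, wrapped in a table so the parser keeps it
        String html = "<table><tr><td class=\"lastpost\">\n"
                + "This is a text 1<br>\n"
                + "<a href=\"post/13594\">Website Page - 1</a>\n"
                + "</td></tr></table>";

        String[] parts = parseLastPost(html);
        if (parts != null) {
            System.out.println("text   = " + parts[0]);
            System.out.println("textNo = " + parts[1]);
            System.out.println("link   = " + parts[2]);
        }
    }
}
```

Note that `ownText()` joins every text node that sits directly in the td, so if the cell held more than one piece of loose text they would all end up in the first value.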
