JSoup - How to parse nested texts?

Posted 2019-12-16 17:30

I'm parsing the HTML of a website with jsoup and want to extract data from this part:

<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>

I want to end up with this:

String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";

How can I extract each of these parts separately?

1 answer
做个烂人
Reply #2 · 2019-12-16 17:46

Your code only gets the combined text of all the td elements you are selecting. To store the parts in separate variables, extract each one separately, as in the following code. The comments explain how and why each piece is retrieved.

// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element that is a child of the td
Element linkElement = lastPost.getElementsByTag("a").first();

// This text belongs directly to the td (not to the nested a element);
// ownText() returns it trimmed, without markup or surrounding whitespace
String text = lastPost.ownText();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
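
For reference, here is a minimal self-contained sketch of the same approach, assuming jsoup is on the classpath. The class name LastPostParser, the LastPost holder and the table wrapper around the sample HTML are illustrative and not part of the original question. It reads the td's own text with ownText() and guards against a missing td or a element, since first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LastPostParser {

    /** Simple holder for the three extracted values (hypothetical helper type). */
    static final class LastPost {
        final String text;
        final String textNo;
        final String link;

        LastPost(String text, String textNo, String link) {
            this.text = text;
            this.textNo = textNo;
            this.link = link;
        }
    }

    /** Returns the extracted parts, or null if no td.lastpost exists in the HTML. */
    static LastPost parse(String html) {
        Document document = Jsoup.parse(html);

        // first() returns null when the selector matches nothing
        Element lastPost = document.select("td.lastpost").first();
        if (lastPost == null) {
            return null;
        }

        // Text that belongs directly to the td, excluding the nested a element
        String text = lastPost.ownText();

        // The link may be absent; fall back to empty strings in that case
        Element linkElement = lastPost.getElementsByTag("a").first();
        String textNo = linkElement == null ? "" : linkElement.text();
        String link = linkElement == null ? "" : linkElement.attr("href");

        return new LastPost(text, textNo, link);
    }

    public static void main(String[] args) {
        // The td is wrapped in a table here because the HTML parser
        // ignores a td start tag that appears outside a table.
        String html = "<table><tr><td class=\"lastpost\">\n"
                + "This is a text 1<br>\n"
                + "<a href=\"post/13594\">Website Page - 1</a>\n"
                + "</td></tr></table>";

        LastPost post = parse(html);
        System.out.println(post.text);
        System.out.println(post.textNo);
        System.out.println(post.link);
    }
}
```

If the page contains several td.lastpost cells, the same extraction can be applied to each element returned by document.select("td.lastpost") instead of only the first one.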