I'm having trouble getting all the HTML under certain tags with Jsoup. Here is my current code:
Document document = Jsoup.connect("http://stackoverflow.com/questions/2971155/what-is-the-fastest-way-to-scrape-html-webpage-in-android").get();
Elements desc = document.select("tr");
System.out.println(desc.toString());
The URL points to that question, and I'm trying to get the text of the question's description. However, I'm not getting certain tr or td tags, such as the ones that hold the question. Here is the td tag I'm trying to get:
<td class="postcell">
The actual post sits under that tag. When I print out what I'm actually getting, I see a ton of empty td tags and some comments, but not the post itself:
<tr id="comment-37956942" class="comment ">
<td>
<table>
<tbody>
<tr>
<td class=" comment-score"> </td>
<td> </td>
</tr>
</tbody>
</table> </td>
<td class="comment-text">
<div style="display: block;" class="comment-body">
<span class="comment-copy">You shouldn't parse HTML with regexes: <a href="http://blog.codinghorror.com/parsing-html-the-cthulhu-way/" rel="nofollow">blog.codinghorror.com/parsing-html-the-cthulhu-way</a></span> –
<a href="/users/25612/motob%c3%b3i" title="469 reputation" class="comment-user">motobói</a>
The output keeps going like this, with more empty td and tr tags, and I can't find the actual question anywhere in it. Does anyone know why this is happening?
Essentially, I just want the text of the question's post and I don't know how to get it, so it would be great if someone could show me how.
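For reference, here is a minimal sketch of what I would expect to work once the td is actually present in the parsed document. It runs against a small inline HTML string rather than the live page, so the markup, the class name PostCellSketch, and the helper postText are all made up for illustration. It only shows the selector I have in mind, not why the real page comes back without the postcell.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PostCellSketch {

    // Hypothetical helper: returns the text of the first td with class
    // "postcell", or an empty string if the document has no such cell.
    static String postText(Document document) {
        Element cell = document.select("td.postcell").first();
        return cell == null ? "" : cell.text();
    }

    public static void main(String[] args) {
        // Stand-in markup (an assumption, not the real page source).
        String html = "<table><tr>"
                + "<td class=\"postcell\"><div><p>Question body here.</p></div></td>"
                + "</tr></table>";

        Document document = Jsoup.parse(html);
        System.out.println(postText(document));
    }
}
```

With the real page I would swap Jsoup.parse(html) for the Jsoup.connect(...).get() call from my code above, but at the moment that document doesn't seem to contain the postcell td at all.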