Using a regex in jsoup

Posted 2019-07-23 22:48

I'm working on my first serious project with jsoup and I've got stuck on the following problem.

I'm trying to get zipcodes from a site. There is a list of zipcodes.

Here is one of the lines that contains a zip code:

<td align="center"><a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a></td>   

So my idea is to go through the page and collect every string that consists of exactly 5 digits (0-9). The regex is ^[0-9]{5,5}$

My code was:

doc.select("td:matchesOwn(^[0-9]{5,5}$)");

But it returned nothing. I can't find a way to get these zip codes out of that site. Does anyone know how to do it?

The real question here is: how do I get the numbers that are not in any tags, but just written out in the open? (I guess there is a term for that, but I'm not that good with XML terms.)

1 answer
来,给爷笑一个
#2 · 2019-07-23 23:18

I solved it using Element#getElementsMatchingOwnText:

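import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is arbitrary; it only makes the snippet compilable.
public class ZipCodes {
    public static void main(String[] args) {
        final String html = "<td align=\"center\"><a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a></td> ";
        // Matches elements whose own text (not their children's) consists of exactly five digits.
        final Elements elements = Jsoup.parse(html).getElementsMatchingOwnText("^[0-9]{5,5}$");

        for (final Element element : elements) {
            System.out.println("element = [" + element + "]");
            System.out.println("zip = [" + element.text() + "]");
        }
    }
}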

Output:

element = [<a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a>]
zip = [33011]
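
As for why the selector in the question returned nothing: matchesOwn only looks at an element's own text, that is, the text directly inside it, not the text of its children. In the sample line the td has no text of its own; the digits belong to the a inside it. The sketch below reproduces that check with plain java.util.regex, without jsoup. The class name OwnTextRegexDemo and the helper matchesZip are made up for illustration, and it assumes the pattern is applied with find() semantics, which the anchored regex makes equivalent to a full match here.

```java
import java.util.regex.Pattern;

class OwnTextRegexDemo {
    // Same pattern as in the question: exactly five digits, anchored at both ends.
    private static final Pattern ZIP = Pattern.compile("^[0-9]{5}$");

    // Hypothetical helper. Assumption: the selector applies the pattern
    // to the candidate text with find() semantics.
    static boolean matchesZip(String text) {
        return ZIP.matcher(text).find();
    }

    public static void main(String[] args) {
        String tdOwnText = "";      // the <td> holds no text of its own, only the <a> child
        String aOwnText = "33011";  // the digits are the own text of the <a>

        System.out.println("td own text matches: " + matchesZip(tdOwnText)); // false
        System.out.println("a own text matches:  " + matchesZip(aOwnText));  // true
    }
}
```

On the real page, a selector aimed at the link, such as a:matchesOwn(^[0-9]{5}$), or one that also considers child text, such as td:matches(^[0-9]{5}$), should therefore behave like the getElementsMatchingOwnText call above. I have not run either against the site itself.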