Java preg_match array

Posted 2020-07-18 11:43

I have a string, strng = "<title>text1</title><title>text2</title>". How can I get an array like this?

arr[0] = "text1";
arr[1] = "text2";

I tried the code below, but the result is a single string rather than an array: text1</title><title>text2

Pattern pattern = Pattern.compile("<title>(.*)</title>");
Matcher matcher = pattern.matcher(strng);
matcher.matches();

Tags: java html regex
3 Answers
We Are One
#2 · 2020-07-18 11:55

That looks like invalid XML, since there is no container element. If you make it valid XML, you can parse it with an XML parser. For small snippets like the one above, I would recommend JDOM.

If it is XML or HTML, don't try to use regular expressions. XML and HTML are not regular languages, and regular expressions cannot parse either reliably because they cannot maintain enough state. Search Stack Overflow for more detail; this question comes up constantly, and there is plenty of information on why it won't work.
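
As a rough sketch of the parser route: the example below uses the DOM parser that ships with the JDK (javax.xml.parsers) rather than JDOM, purely to avoid an extra dependency. The class name TitleDomExample, the helper extractTitles, and the synthetic <root> wrapper are assumptions made for illustration; the wrapper only serves to supply the missing container element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TitleDomExample {

    // Hypothetical helper: wraps the fragment in a synthetic <root> element
    // so it becomes well-formed XML, then collects the text of every <title>.
    static List<String> extractTitles(String fragment) {
        String wrapped = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

            NodeList nodes = doc.getElementsByTagName("title");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed input ends up here instead of yielding partial results.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String strng = "<title>text1</title><title>text2</title>";
        System.out.println(extractTitles(strng)); // prints [text1, text2]
    }
}
```

This only works for input that is well-formed once wrapped. Real-world HTML often is not, and would need an HTML-aware parser instead.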

【Aperson】
#3 · 2020-07-18 11:59

It looks like you want an HTML/XML parser, which is built for this kind of job.

That said, if you have a small, controlled input (like the one-line string above), you could iterate over matcher.find() with a regex such as:

(?<=\\>)\\w+(?=\\<)

Again, anything more complicated than your one-liner should be handled by a proper parser, since regex cannot parse HTML/XML.
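
For what it's worth, the find() loop with that lookaround regex might look something like this. The class name LookaroundExample and the helper extract are made up for the example. Note that \w+ only matches runs of letters, digits, and underscores, so this is a sketch for the exact input in the question, not a general solution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundExample {

    // Word characters that are directly preceded by '>' and followed by '<'.
    // The lookbehind and lookahead assert the brackets without consuming them.
    private static final Pattern BETWEEN_TAGS =
            Pattern.compile("(?<=\\>)\\w+(?=\\<)");

    // Hypothetical helper: collects every match found by repeated find() calls.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = BETWEEN_TAGS.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group()); // whole match; there is no capture group
        }
        return found;
    }

    public static void main(String[] args) {
        String strng = "<title>text1</title><title>text2</title>";
        System.out.println(extract(strng)); // prints [text1, text2]
    }
}
```

A title containing a space, such as <title>two words</title>, yields no match at all with this pattern, which is one more reason to treat it as suitable only for tightly controlled input.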

贪生不怕死
#4 · 2020-07-18 12:19

While I agree that using an XML / HTML parser is a better alternative in general, your scenario is simple to solve with regex:

List<String> titles = new ArrayList<String>();
Matcher matcher = Pattern.compile("<title>(.*?)</title>").matcher(strng);
while(matcher.find()){
    titles.add(matcher.group(1));
}

Note the non-greedy operator .*? and use of matcher.find() instead of matcher.matches().
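
To make the difference concrete, here is a small self-contained comparison of the greedy and non-greedy patterns, plus the conversion to a String[] that the question asks for. The class name GreedyVsLazyExample and the helper findAll are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazyExample {

    // Hypothetical helper: returns capture group 1 of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            groups.add(matcher.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String strng = "<title>text1</title><title>text2</title>";

        // Greedy: .* runs to the LAST </title>, so there is a single long match.
        System.out.println(findAll("<title>(.*)</title>", strng));
        // prints [text1</title><title>text2]

        // Non-greedy: .*? stops at the FIRST </title>, giving one match per tag.
        List<String> titles = findAll("<title>(.*?)</title>", strng);
        System.out.println(titles); // prints [text1, text2]

        // The question asks for an array, so convert the list at the end.
        String[] arr = titles.toArray(new String[0]);
        System.out.println(arr[0] + ", " + arr[1]); // prints text1, text2
    }
}
```

Switching to .*? alone is not enough if you keep matcher.matches(): matches() requires the whole input to match, so even the non-greedy group is forced to expand to text1</title><title>text2. That is why the loop uses find().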

