I want to extract several substrings from a larger string using Matcher and Pattern with a regex.
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("<strong>(.*)</strong>");
Matcher matcher = pattern.matcher(str);
My problem is that the matcher gives me just one match, spanning everything between the first <strong> and the last </strong>:
ABC</strong>123<strong>DEF
My objective is to get 2 matches:
ABC
DEF
Thank you very much for your help.
You need a non-greedy (reluctant) regex:
Pattern pattern = Pattern.compile("<strong>.*?</strong>");
Adding ? after the quantifier makes it non-greedy: .*? matches as few characters as possible, so each match ends at the first </strong> it reaches instead of the last one.
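To make the difference concrete, here is a minimal sketch that runs the greedy pattern from the question and a non-greedy variant side by side. It keeps the capture group from the question and reads it with group(1), which is one way to get the text without the tags. The class name GreedyVsReluctant and the helper captures are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class GreedyVsReluctant {

    // Collects capture group 1 from every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "<strong>ABC</strong>123<strong>DEF</strong>";

        // Greedy: .* runs to the last </strong>, so there is a single match.
        System.out.println(captures("<strong>(.*)</strong>", str));
        // prints [ABC</strong>123<strong>DEF]

        // Non-greedy: .*? stops at the first </strong>, so there are two matches.
        System.out.println(captures("<strong>(.*?)</strong>", str));
        // prints [ABC, DEF]
    }
}
```

Note that matcher.group() without an argument would return the whole match including the tags, which is why the sketch asks for group(1).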
If you only want ABC and DEF, without the surrounding tags, you can use a lookbehind and a lookahead:
String str = "<strong>ABC</strong>123<strong>DEF</strong>";
Pattern pattern = Pattern.compile("((?<=<strong>).*?(?=</strong>))");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}
A web search for "regex lookahead" and "regex lookbehind" will turn up more information on these constructs.
I recommend using jsoup to parse your HTML instead of a regex, for example:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
// select your tag
Elements elements = doc.select("strong");
// get the iterator to traverse all elements
Iterator<Element> it = elements.iterator();
// loop through all elements and fetch their text
while (it.hasNext()) {
    System.out.println(it.next().text());
}
Output:
ABC
DEF
Or get the output as a single string:
Document doc = Jsoup.parse("<strong>ABC</strong>123<strong>DEF</strong>");
Elements elements = doc.select("strong");
System.out.println(elements.text());
Output:
ABC DEF
Download jsoup and add it as a dependency to your project.