I recently moved my project from JDK 7 to JDK 8, and now I am rewriting some code snippets to use the new features that came with Java 8.
final Matcher mtr = Pattern.compile(regex).matcher(input);
HashSet<String> set = new HashSet<String>() {{
while (mtr.find()) add(mtr.group().toLowerCase());
}};
How can I write this code using the Stream API?
A Matcher-based spliterator implementation can be quite simple if you reuse the JDK-provided Spliterators.AbstractSpliterator:
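import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;

public class MatcherSpliterator extends AbstractSpliterator<String[]>
{
  private final Matcher m;
  public MatcherSpliterator(Matcher m) {
    // size unknown; matches arrive in input order and are never null
    super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
    this.m = m;
  }
  @Override public boolean tryAdvance(Consumer<? super String[]> action) {
    if (!m.find()) return false;
    // index 0 holds the whole match, 1..groupCount() the capturing groups
    final String[] groups = new String[m.groupCount()+1];
    for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
    action.accept(groups);
    return true;
  }
}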
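Note that the spliterator provides all matcher groups, not just the full match. Also note that this spliterator supports parallelism, because AbstractSpliterator implements a splitting policy: its trySplit() reads a batch of matches sequentially through tryAdvance() and hands them off as an array-backed spliterator.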
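To see the splitting policy at work, here is a minimal sketch that nests the MatcherSpliterator above inside a hypothetical demo class ParallelMatchDemo with a hypothetical helper fullMatches. Because the spliterator reports ORDERED, collecting a parallel stream into a list should still give the matches in input order, and the matcher is only read through tryAdvance() by one thread at a time while each batch is filled.

```java
import java.util.List;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class ParallelMatchDemo {
    // Same shape as MatcherSpliterator above: one String[] holding every group per match
    static final class MatcherSpliterator extends AbstractSpliterator<String[]> {
        private final Matcher m;

        MatcherSpliterator(Matcher m) {
            super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
            this.m = m;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String[]> action) {
            if (!m.find()) return false;
            String[] groups = new String[m.groupCount() + 1];
            for (int i = 0; i < groups.length; i++) groups[i] = m.group(i);
            action.accept(groups);
            return true;
        }
    }

    // Collects group 0 (the full match) of every hit, sequentially or in parallel
    static List<String> fullMatches(Pattern p, String input, boolean parallel) {
        return StreamSupport.stream(new MatcherSpliterator(p.matcher(input)), parallel)
                .map(gs -> gs[0])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern words = Pattern.compile("[a-z]+");
        String input = "alpha beta gamma delta";
        System.out.println(fullMatches(words, input, false)); // [alpha, beta, gamma, delta]
        System.out.println(fullMatches(words, input, true));  // [alpha, beta, gamma, delta]
    }
}
```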
Typically you will use a convenience stream factory:
public static Stream<String[]> matcherStream(Matcher m) {
return StreamSupport.stream(new MatcherSpliterator(m), false);
}
This gives you a powerful basis to concisely write all kinds of complex regex-oriented logic, for example:
private static final Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");
public static void main(String[] args) {
final String emails = "kid@gmail.com, stray@yahoo.com, miks@tijuana.com";
System.out.println("User has e-mail accounts on these domains: " +
matcherStream(emailRegex.matcher(emails))
.map(gs->gs[2])
.collect(joining(", ")));
}
This prints:
User has e-mail accounts on these domains: gmail.com, yahoo.com, tijuana.com
For completeness, your code would be rewritten as:
Set<String> set = matcherStream(mtr).map(gs->gs[0].toLowerCase()).collect(toSet());
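Put together as one runnable file, the rewrite of the question's loop might look like the following sketch. The class name LowerCaseMatches, the helper lowerCasedMatches, and the regex and input in main are hypothetical, since the question does not show the real values; the spliterator is written as an anonymous AbstractSpliterator instead of the named MatcherSpliterator class, but it emits the same String[] per match.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LowerCaseMatches {
    // Anonymous AbstractSpliterator emitting one String[] of all groups per find()
    static Stream<String[]> matcherStream(Matcher m) {
        Spliterator<String[]> sp = new AbstractSpliterator<String[]>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String[]> action) {
                if (!m.find()) return false;
                String[] groups = new String[m.groupCount() + 1];
                for (int i = 0; i < groups.length; i++) groups[i] = m.group(i);
                action.accept(groups);
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    // Stream form of the question's loop: lower-cased full matches collected into a Set
    static Set<String> lowerCasedMatches(String regex, String input) {
        Matcher mtr = Pattern.compile(regex).matcher(input);
        return matcherStream(mtr).map(gs -> gs[0].toLowerCase()).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Hypothetical regex and input; the question does not show the real ones
        System.out.println(lowerCasedMatches("[A-Za-z]+", "Foo bar FOO Baz"));
    }
}
```

Unlike the double-brace initializer in the question, this does not create an anonymous HashSet subclass. Collectors.toSet() makes no guarantee about which Set implementation it returns, so Collectors.toCollection(HashSet::new) is the safer choice if you specifically need a HashSet.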
Marko's answer demonstrates how to get matches into a stream using a Spliterator. Well done, give that man a big +1! Seriously, make sure you upvote his answer before you even consider upvoting this one, since this one is entirely derivative of his.
I have only a small bit to add to Marko's answer: instead of representing each match as an array of strings (one element per match group), it is better represented as a MatchResult, a type invented for exactly this purpose. The result is then a Stream<MatchResult> instead of a Stream<String[]>. The code gets a little simpler, too; the tryAdvance code would be
if (m.find()) {
action.accept(m.toMatchResult());
return true;
} else {
return false;
}
The map call in his email-matching example would change to
.map(mr -> mr.group(2))
and the OP's example would be rewritten as
Set<String> set = matcherStream(mtr)
.map(mr -> mr.group(0).toLowerCase())
.collect(toSet());
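A self-contained sketch of this MatchResult variant follows, reusing the e-mail regex and input from Marko's example. The names MatchResultStreams, MatchResultSpliterator, and domains are hypothetical; MatchResult, Matcher.toMatchResult(), and MatchResult.group(int) are standard java.util.regex API.

```java
import java.util.List;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class MatchResultStreams {
    private static final Pattern EMAIL_REGEX = Pattern.compile("([^,]+?)@([^,]+)");

    // Emits one MatchResult snapshot per successful find()
    static final class MatchResultSpliterator extends AbstractSpliterator<MatchResult> {
        private final Matcher m;

        MatchResultSpliterator(Matcher m) {
            super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
            this.m = m;
        }

        @Override
        public boolean tryAdvance(Consumer<? super MatchResult> action) {
            if (m.find()) {
                // toMatchResult() is unaffected by later operations on the matcher
                action.accept(m.toMatchResult());
                return true;
            } else {
                return false;
            }
        }
    }

    static Stream<MatchResult> matcherStream(Matcher m) {
        return StreamSupport.stream(new MatchResultSpliterator(m), false);
    }

    // group(2) of each match is the domain part of the address
    static List<String> domains(String emails) {
        return matcherStream(EMAIL_REGEX.matcher(emails))
                .map(mr -> mr.group(2))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [gmail.com, yahoo.com, tijuana.com]
        System.out.println(domains("kid@gmail.com, stray@yahoo.com, miks@tijuana.com"));
    }
}
```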
Using MatchResult gives a bit more flexibility, since it also provides the offsets of the match groups within the string (through start(int) and end(int)), which could be useful for certain applications.
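For example, the following sketch reports where each domain sits in the input using MatchResult.start(int) and end(int). The class MatchOffsets and its helpers results and domainSpans are hypothetical; results builds the Stream<MatchResult> with an anonymous AbstractSpliterator so that the file compiles on its own under Java 8.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class MatchOffsets {
    private static final Pattern EMAIL_REGEX = Pattern.compile("([^,]+?)@([^,]+)");

    // Stream<MatchResult> built from an anonymous AbstractSpliterator
    static Stream<MatchResult> results(Matcher m) {
        Spliterator<MatchResult> sp = new AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if (!m.find()) return false;
                action.accept(m.toMatchResult());
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    // Formats each domain with its half-open [start, end) span in the input
    static List<String> domainSpans(String emails) {
        return results(EMAIL_REGEX.matcher(emails))
                .map(mr -> mr.group(2) + " [" + mr.start(2) + ", " + mr.end(2) + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [gmail.com [4, 13), yahoo.com [21, 30)]
        System.out.println(domainSpans("kid@gmail.com, stray@yahoo.com"));
    }
}
```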
I don't think you can turn this into a Stream without writing your own Spliterator, but I don't know why you would want to.
Matcher.find() is a state-changing operation on the Matcher object, so running each find() in a parallel stream would produce inconsistent results. Running the stream serially wouldn't perform better than the Java 7 equivalent and would be harder to understand.
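What about Pattern.splitAsStream? Keep in mind that it returns the parts of the input between matches of the pattern, so it only replaces the loop if your regex matches the separators rather than the strings you want to collect.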
Stream<String> stream = Pattern.compile(regex).splitAsStream(input);
Then use a collector to get a set:
Set<String> set = stream.map(String::toLowerCase).collect(Collectors.toSet());
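The difference between "regex matches separators" and "regex matches tokens" is easy to see in a small sketch. The class SplitDemo, the helper lowerCasedPieces, and both sample inputs are hypothetical; Pattern.splitAsStream is the real Java 8 method.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SplitDemo {
    // splitAsStream yields the pieces of input BETWEEN matches of the pattern
    static Set<String> lowerCasedPieces(String regex, String input) {
        return Pattern.compile(regex).splitAsStream(input)
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // regex describes the separator: the pieces are the addresses
        System.out.println(lowerCasedPieces(",\\s*", "Kid@Gmail.com, stray@yahoo.com"));
        // regex describes the tokens: the pieces are what lies between them
        System.out.println(lowerCasedPieces("[a-z]", "1a22b333"));
    }
}
```

In the second call the letters matched by the regex are dropped and only the digit runs survive, which is the opposite of what the question's find() loop collects.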
What about something simpler, using a Scanner:
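import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class MakeItSimple {
    public static void main(String[] args) throws FileNotFoundException {
        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner s = new Scanner(new File("C:\\Users\\Admin\\Desktop\\TextFiles\\Emails.txt"))) {
            Set<String> set = new HashSet<>();
            // keep only the whitespace-separated tokens that look like e-mail addresses
            while (s.hasNext()) {
                String r = s.next();
                if (r.matches("([^,]+?)@([^,]+)")) {
                    set.add(r);
                }
            }
            set.stream().map(String::toUpperCase).forEach(System.out::println);
        }
    }
}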
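One caveat with the Scanner approach: its default delimiter is whitespace, so with comma-separated input such as "kid@gmail.com, stray@yahoo.com" the first token is "kid@gmail.com," and fails the matches() check because of the trailing comma. A minimal sketch that reads from a String instead of a file and uses a delimiter of commas and whitespace follows; the class name, helper, and sample input are hypothetical, and it lower-cases as the question does rather than upper-casing.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

final class ScannerEmails {
    // Splits on runs of commas and/or whitespace, so tokens carry no trailing comma
    static Set<String> emails(String text) {
        Set<String> set = new HashSet<>();
        try (Scanner s = new Scanner(text).useDelimiter("[,\\s]+")) {
            while (s.hasNext()) {
                String r = s.next();
                if (r.matches("([^,]+?)@([^,]+)")) {
                    set.add(r.toLowerCase());
                }
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // Hypothetical input; "hello" has no '@' and is skipped
        System.out.println(emails("kid@gmail.com, hello, Stray@Yahoo.com"));
    }
}
```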
Here is an implementation that implements the Spliterator interface directly:
// To get the required set
Set<String> result = StreamSupport.stream(new MatcherGroupIterator(pattern, input), false)
    .map(s -> s.toLowerCase())
    .collect(Collectors.toSet());
...
private static class MatcherGroupIterator implements Spliterator<String> {
private final Matcher matcher;
public MatcherGroupIterator(Pattern p, String s) {
matcher = p.matcher(s);
}
@Override
public boolean tryAdvance(Consumer<? super String> action) {
if (!matcher.find()){
return false;
}
action.accept(matcher.group());
return true;
}
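@Override
public Spliterator<String> trySplit() {
  // a Matcher can only be consumed front to back, so never split
  return null;
}
@Override
public long estimateSize() {
  // the number of matches is not known in advance
  return Long.MAX_VALUE;
}
@Override
public int characteristics() {
  // matches are produced in input order and are never null
  return Spliterator.ORDERED | Spliterator.NONNULL;
}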
}
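A self-contained sketch that wires MatcherGroupIterator into the set collection from the first snippet is below. The enclosing class GroupIteratorDemo, the helper lowerCasedMatches, and the sample pattern and input are hypothetical stand-ins for the pattern and input variables elided above; this version reports ORDERED as well as NONNULL, since matches come out in input order. Because trySplit() returns null, even a parallel stream built from this spliterator would traverse it in a single task.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class GroupIteratorDemo {
    // Same idea as MatcherGroupIterator above: one full-match String per find()
    private static class MatcherGroupIterator implements Spliterator<String> {
        private final Matcher matcher;

        MatcherGroupIterator(Pattern p, String s) {
            matcher = p.matcher(s);
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            if (!matcher.find()) return false;
            action.accept(matcher.group());
            return true;
        }

        @Override
        public Spliterator<String> trySplit() {
            return null; // never split: the matcher is consumed front to back
        }

        @Override
        public long estimateSize() {
            return Long.MAX_VALUE; // number of matches is unknown up front
        }

        @Override
        public int characteristics() {
            return ORDERED | NONNULL;
        }
    }

    static Set<String> lowerCasedMatches(Pattern pattern, String input) {
        return StreamSupport.stream(new MatcherGroupIterator(pattern, input), false)
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Hypothetical pattern and input standing in for the elided ones
        System.out.println(lowerCasedMatches(Pattern.compile("[A-Za-z]+"), "Foo bar FOO"));
    }
}
```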