How can I perform multiple unrelated operations on elements of a single stream?
Say I have a List<String> composed from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:
- if the string contains 'of', all the words in that string must be counted
- if the string contains 'for', the portion after the first occurrence of 'for' must be returned, yielding a List<String> with all substrings
Of course, I could do something like this:
List<String> strs = ...;
List<Integer> wordsInStr = strs.stream()
        .filter(t -> t.contains("of"))
        .map(t -> t.split(" ").length)
        .collect(Collectors.toList());
List<String> linePortionAfterFor = strs.stream()
        .filter(t -> t.contains("for"))
        .map(t -> t.substring(t.indexOf("for")))
        .collect(Collectors.toList());
but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.
Is it possible to somehow execute those two operations without traversing twice over the list?
If you want a single-pass Stream, you can use a custom Collector (which also works with parallel streams).
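import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class Splitter {
    public List<String> words = new ArrayList<>();
    public List<Integer> counts = new ArrayList<>();

    // Accumulator: two independent checks, so a string containing both
    // "of" and "for" lands in both lists, as with the two filters in the question.
    public void accept(String s) {
        if (s.contains("of")) {
            counts.add(s.split(" ").length);
        }
        if (s.contains("for")) {
            words.add(s.substring(s.indexOf("for")));
        }
    }

    // Combiner: only invoked when the stream runs in parallel.
    public Splitter merge(Splitter other) {
        words.addAll(other.words);
        counts.addAll(other.counts);
        return this;
    }
}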
Splitter collect = strs.stream().collect(
        Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);
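To check the "parallelization possible" remark, here is a small self-contained sketch. The class name ParallelSplitDemo, the split helper and the sample strings are made up for illustration. The accumulator follows the same idea as the Splitter above, with two independent if checks so that it matches the two filters in the question. Because the three-argument Collector.of declares no characteristics, each worker thread fills its own container and the partial results are merged in encounter order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

public class ParallelSplitDemo {

    // Mutable accumulator holding both result lists.
    static class Splitter {
        final List<String> words = new ArrayList<>();
        final List<Integer> counts = new ArrayList<>();

        void accept(String s) {
            if (s.contains("of")) {
                counts.add(s.split(" ").length);
            }
            if (s.contains("for")) {
                words.add(s.substring(s.indexOf("for")));
            }
        }

        // Folds another partial result into this one (used by parallel streams).
        Splitter merge(Splitter other) {
            words.addAll(other.words);
            counts.addAll(other.counts);
            return this;
        }
    }

    // Hypothetical helper: runs the same collector sequentially or in parallel.
    static Splitter split(List<String> strs, boolean parallel) {
        Collector<String, Splitter, Splitter> collector =
                Collector.of(Splitter::new, Splitter::accept, Splitter::merge);
        return (parallel ? strs.parallelStream() : strs.stream()).collect(collector);
    }

    public static void main(String[] args) {
        List<String> strs = List.of("a cup of tea", "thanks for all the fish", "nothing here");
        Splitter seq = split(strs, false);
        Splitter par = split(strs, true);
        System.out.println(seq.counts + " " + seq.words);
        System.out.println(par.counts + " " + par.words);
    }
}
```

Both calls should print the same lists, since the collector is ordered and each partial Splitter is only touched by one thread before merging.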
This answer approaches the question from a different angle. First, let's look at how fast (or slow) it is to iterate over a list or collection. Here are the results on my machine from the performance test below:
When: length of string list = 100, Thread number = 1, loops = 1000, unit = milliseconds
OP: 0.013
Accepted answer: 0.020
By the counter function: 0.010
When: length of string list = 1000_000, Thread number = 1, loops = 100, unit = milliseconds
OP: 99.387
Accepted answer: 89.848
By the counter function: 59.183
Conclusion: The performance improvement from the single-pass collector is small, and for a short list it is even slower. In the numbers above, replacing split with a simple counting loop saves more time than avoiding the second pass. Generally, it's a mistake to use a more complicated collector just to reduce the number of iterations over a list or collection that is already loaded in memory; you won't gain much. If there is a performance issue, look somewhere else first.
Here is my performance test code, using the Profiler tool. (I'm not going to discuss how to do a performance test here. If you doubt the result, you can repeat the test with any tool you trust.)
@Test
public void test_46539786() {
    final int strsLength = 1000_000;
    final int threadNum = 1;
    final int loops = 100;
    final int rounds = 3;
    final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();
    Profiler.run(threadNum, loops, rounds, "OP", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());
        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();
    Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
        Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
        assertTrue(collect.counts.size() == collect.words.size());
    }).printResult();
    // Counts spaces with a plain loop instead of calling split(" ").
    // Note: for single-space-separated text this is the word count minus one.
    final Function<String, Integer> counter = s -> {
        int count = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    };
    Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());
        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();
}
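For anyone who wants to repeat the comparison without the Profiler tool, here is a rough sketch that uses only System.nanoTime. The class name CrudeTiming and all helper names are invented for this example, and the list size and loop count are arbitrary. Treat the numbers it prints as a ballpark only: JIT warm-up and garbage collection can distort them, and a dedicated harness such as JMH will give more trustworthy results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CrudeTiming {

    // Same data shape as the test above: alternating "i of i" and "i for i".
    static List<String> sampleData(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i)
                .collect(Collectors.toList());
    }

    // Two stream passes, as in the question. Returns the combined result size.
    static int twoPasses(List<String> strs) {
        List<Integer> counts = strs.stream().filter(t -> t.contains("of"))
                .map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> tails = strs.stream().filter(t -> t.contains("for"))
                .map(t -> t.substring(t.indexOf("for"))).collect(Collectors.toList());
        return counts.size() + tails.size();
    }

    // One pass with a plain loop. Returns the combined result size.
    static int onePass(List<String> strs) {
        List<Integer> counts = new ArrayList<>();
        List<String> tails = new ArrayList<>();
        for (String s : strs) {
            if (s.contains("of")) counts.add(s.split(" ").length);
            if (s.contains("for")) tails.add(s.substring(s.indexOf("for")));
        }
        return counts.size() + tails.size();
    }

    // Runs the task `loops` times unmeasured (warm-up), then `loops` times
    // measured, and returns the average time per run in milliseconds.
    static double averageMillis(int loops, IntSupplier task) {
        long sink = 0; // accumulate results so the work cannot be optimized away
        for (int i = 0; i < loops; i++) sink += task.getAsInt();
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) sink += task.getAsInt();
        long elapsed = System.nanoTime() - start;
        if (sink < 0) System.out.println("unexpected overflow"); // consume sink
        return elapsed / 1_000_000.0 / loops;
    }

    public static void main(String[] args) {
        List<String> strs = sampleData(100_000);
        System.out.printf("two passes: %.3f ms%n", averageMillis(50, () -> twoPasses(strs)));
        System.out.printf("one pass:   %.3f ms%n", averageMillis(50, () -> onePass(strs)));
    }
}
```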
You could use a custom collector for that and iterate only once:
// Pair is not a JDK class; the answer does not say which library it comes from.
private static Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {
    // Local accumulator that holds both result lists.
    class Acc {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();

        void add(String elem) {
            if (elem.contains("of")) {
                long howMany = Arrays.stream(elem.split(" ")).count();
                longs.add(howMany);
            }
            if (elem.contains("for")) {
                String result = elem.substring(elem.indexOf("for"));
                strings.add(result);
            }
        }

        // Combiner for parallel streams.
        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        public Pair<List<String>, List<Long>> finisher() {
            return Pair.of(strings, longs);
        }
    }
    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}
Usage would be:
Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
.collect(multiple());
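As a possible alternative to a hand-written accumulator, the JDK's own Collectors.teeing (Java 12+) together with Collectors.filtering (Java 9+) can feed each element to two downstream collectors in one pass. This is a different technique from the collector above, shown only as a sketch: the class name TeeingDemo and the split helper are invented here, and Map.Entry stands in for Pair so the example needs nothing beyond the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TeeingDemo {

    // Hypothetical helper: returns (substrings starting at "for", word counts for "of").
    static Map.Entry<List<String>, List<Long>> split(Stream<String> lines) {
        // Downstream 1: keep strings containing "for", map to the part from "for" on.
        Collector<String, ?, List<String>> tails = Collectors.filtering(
                s -> s.contains("for"),
                Collectors.mapping(s -> s.substring(s.indexOf("for")), Collectors.toList()));

        // Downstream 2: keep strings containing "of", map to their word count.
        Collector<String, ?, List<Long>> counts = Collectors.filtering(
                s -> s.contains("of"),
                Collectors.mapping(s -> (long) s.split(" ").length, Collectors.toList()));

        // teeing passes every element to both collectors, then merges the two results.
        return lines.collect(Collectors.teeing(tails, counts, Map::entry));
    }

    public static void main(String[] args) {
        Map.Entry<List<String>, List<Long>> result =
                split(Stream.of("t of r m", "t of r m", "nice for nice nice again"));
        System.out.println(result.getKey());   // [for nice nice again]
        System.out.println(result.getValue()); // [4, 4]
    }
}
```

Whether this is any faster than two separate passes is a separate question; it mainly saves writing the accumulator class.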
If you want a single pass over the list, you need a way to manage two separate pieces of state. You can do this by implementing Consumer in a new class for each operation.
class WordsInStr implements Consumer<String> {
    ArrayList<Integer> list = new ArrayList<>();

    @Override
    public void accept(String s) {
        Stream.of(s).filter(t -> t.contains("of")) //probably would be faster without stream here
                .map(t -> t.split(" ").length)
                .forEach(list::add);
    }
}

class LinePortionAfterFor implements Consumer<String> {
    ArrayList<String> list = new ArrayList<>();

    @Override
    public void accept(String s) {
        Stream.of(s) //probably would be faster without stream here
                .filter(t -> t.contains("for"))
                .map(t -> t.substring(t.indexOf("for")))
                .forEach(list::add);
    }
}
WordsInStr w = new WordsInStr();
LinePortionAfterFor l = new LinePortionAfterFor();
strs.stream() //stream not needed here
        .forEach(w.andThen(l));
System.out.println(w.list);
System.out.println(l.list);
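The comments above suggest the streams are not really needed. Here is a sketch of the same idea with plain if checks and Iterable.forEach instead. The class names ConsumerDemo, WordCounts and TailsFromFor and the sample strings are invented for this example. Since the consumers mutate plain ArrayLists, this variant is only meant for sequential use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerDemo {

    // Collects the word count of every string that contains "of".
    static class WordCounts implements Consumer<String> {
        final List<Integer> list = new ArrayList<>();

        @Override
        public void accept(String s) {
            if (s.contains("of")) {
                list.add(s.split(" ").length);
            }
        }
    }

    // Collects the part of every string starting at the first "for".
    static class TailsFromFor implements Consumer<String> {
        final List<String> list = new ArrayList<>();

        @Override
        public void accept(String s) {
            int idx = s.indexOf("for");
            if (idx >= 0) {
                list.add(s.substring(idx));
            }
        }
    }

    public static void main(String[] args) {
        List<String> strs = List.of("one of two", "wait for it", "plain");
        WordCounts w = new WordCounts();
        TailsFromFor l = new TailsFromFor();
        // andThen chains both consumers, so each element is visited once.
        strs.forEach(w.andThen(l));
        System.out.println(w.list); // [3]
        System.out.println(l.list); // [for it]
    }
}
```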