Perform multiple unrelated operations on elements

Published 2019-04-07 22:18

Question:

How can I perform multiple unrelated operations on elements of a single stream?

Say I have a List<String> composed from a text. Each string in the list may or may not contain a certain word, which represents an action to perform. Let's say that:

  • if the string contains 'of', all the words in that string must be counted
  • if the string contains 'for', the portion after the first occurrence of 'for' must be returned, yielding a List<String> with all substrings

Of course, I could do something like this:

List<String> strs = ...;

List<Integer> wordsInStr = strs.stream()
    .filter(t -> t.contains("of"))
    .map(t -> t.split(" ").length)
    .collect(Collectors.toList());

List<String> linePortionAfterFor = strs.stream()
    .filter(t -> t.contains("for"))
    .map(t -> t.substring(t.indexOf("for")))
    .collect(Collectors.toList());

but then the list would be traversed twice, which could result in a performance penalty if strs contained lots of elements.

Is it possible to somehow execute those two operations without traversing twice over the list?

Answer 1:

If you want a single-pass stream, you can use a custom Collector. Because it supplies a merge function, it also works with parallel streams.

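class Splitter {
  // Substrings starting at the first "for" of each matching string.
  public List<String> words = new ArrayList<>();
  // Word counts of the strings that contain "of".
  public List<Integer> counts = new ArrayList<>();

  // Accumulator: the two checks are independent, as in the question's two
  // pipelines, so a string containing both words lands in both lists.
  public void accept(String s) {
    if (s.contains("of")) {
      counts.add(s.split(" ").length);
    }
    if (s.contains("for")) {
      words.add(s.substring(s.indexOf("for")));
    }
  }

  // Combiner: merges partial results when the stream runs in parallel.
  public Splitter merge(Splitter other) {
    words.addAll(other.words);
    counts.addAll(other.counts);
    return this;
  }
}

Splitter collect = strs.stream().collect(
  Collector.of(Splitter::new, Splitter::accept, Splitter::merge)
);
System.out.println(collect.counts);
System.out.println(collect.words);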
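
For comparison, on Java 12 or later the same single-pass idea can probably be expressed without a hand-written accumulator, using the JDK's Collectors.teeing together with Collectors.filtering. This is only a sketch and not part of the answer above; the TeeingSketch and Result names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TeeingSketch {

    // Hypothetical holder for the two result lists (not part of the answer above).
    static final class Result {
        final List<Integer> counts;
        final List<String> tails;

        Result(List<Integer> counts, List<String> tails) {
            this.counts = counts;
            this.tails = tails;
        }
    }

    // Feeds every element to both downstream collectors during one traversal.
    // Collectors.teeing needs Java 12+, Collectors.filtering needs Java 9+.
    static Result splitOnce(List<String> strs) {
        return strs.stream().collect(Collectors.teeing(
            // Word counts of the strings that contain "of".
            Collectors.filtering((String s) -> s.contains("of"),
                Collectors.mapping((String s) -> s.split(" ").length, Collectors.toList())),
            // Tail of each string, starting at the first "for".
            Collectors.filtering((String s) -> s.contains("for"),
                Collectors.mapping((String s) -> s.substring(s.indexOf("for")), Collectors.toList())),
            // Combine both downstream results into one value.
            Result::new));
    }

    public static void main(String[] args) {
        Result r = splitOnce(Arrays.asList("a of b", "x for y", "of for"));
        System.out.println(r.counts); // [3, 2]
        System.out.println(r.tails);  // [for y, for]
    }
}
```

The string "of for" appears in both lists because the two filters are evaluated independently, just like the two separate pipelines in the question.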


Answer 2:

This answer addresses the question from a different angle. First, let's look at how fast or slow it actually is to iterate over a list or collection. Here are the results on my machine from the performance test below:

List length = 100, threads = 1, loops = 1000, unit = milliseconds:

  • OP: 0.013
  • Accepted answer: 0.020
  • By the counter function: 0.010

List length = 1,000,000, threads = 1, loops = 100, unit = milliseconds:

  • OP: 99.387
  • Accepted answer: 89.848
  • By the counter function: 59.183


Conclusion: The improvement from the single-pass collector is small (about 10% at 1,000,000 elements), and for a short list it is even slower (0.020 ms vs. 0.013 ms at 100 elements). The two-pass variant that only replaces split with a simple counter beats both, which suggests that the per-element work matters more than the number of traversals. In general, it is a mistake to introduce a more complicated collector just to reduce the number of iterations over a list or collection that is already in memory; you won't gain much. If there is a performance problem, look elsewhere.

Here is my performance test code, which uses a Profiler tool. (I'm not going to discuss how to do a performance test here. If you doubt the results, you can rerun the test with any tool you trust.)

@Test
public void test_46539786() {
    final int strsLength = 1000_000;
    final int threadNum = 1;
    final int loops = 100;
    final int rounds = 3;

    final List<String> strs = IntStream.range(0, strsLength).mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i).toList();

    Profiler.run(threadNum, loops, rounds, "OP", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());

        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();

    Profiler.run(threadNum, loops, rounds, "Accepted answer", () -> {
        Splitter collect = strs.stream().collect(Collector.of(Splitter::new, Splitter::accept, Splitter::merge));
        assertTrue(collect.counts.size() == collect.words.size());
    }).printResult();

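    // Counts words by counting spaces instead of calling split;
    // assumes the words are separated by single spaces.
    final Function<String, Integer> counter = s -> {
        int count = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count + 1; // n spaces separate n + 1 words
    };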

    Profiler.run(threadNum, loops, rounds, "By the counter function", () -> {
        List<Integer> wordsInStr = strs.stream().filter(t -> t.contains("of")).map(counter).collect(Collectors.toList());
        List<String> linePortionAfterFor = strs.stream().filter(t -> t.contains("for")).map(t -> t.substring(t.indexOf("for")))
                .collect(Collectors.toList());

        assertTrue(wordsInStr.size() == linePortionAfterFor.size());
    }).printResult();
}
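
Profiler in the test above is not a JDK class, so the test cannot be run as-is without that tool. As a rough stand-in, the comparison can be sketched with System.nanoTime only. The class and method names below are hypothetical, the plain loop is an added variant that is not in the original test, and a hand-rolled timer like this is far less reliable than a real benchmark harness, so treat any numbers it prints as indicative at best.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RoughTiming {

    // Average wall-clock milliseconds per call, after some untimed warm-up runs.
    static double averageMillis(int warmups, int loops, Runnable task) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / loops;
    }

    // Same data shape as the test above: alternating "of" and "for" lines.
    static List<String> testData(int n) {
        return IntStream.range(0, n)
            .mapToObj(i -> i % 2 == 0 ? i + " of " + i : i + " for " + i)
            .collect(Collectors.toList());
    }

    // The question's approach: two independent stream pipelines.
    static int twoPasses(List<String> strs) {
        List<Integer> counts = strs.stream().filter(t -> t.contains("of"))
            .map(t -> t.split(" ").length).collect(Collectors.toList());
        List<String> tails = strs.stream().filter(t -> t.contains("for"))
            .map(t -> t.substring(t.indexOf("for"))).collect(Collectors.toList());
        return counts.size() + tails.size();
    }

    // A single traversal with a plain loop (an added variant for comparison).
    static int onePass(List<String> strs) {
        List<Integer> counts = new ArrayList<>();
        List<String> tails = new ArrayList<>();
        for (String s : strs) {
            if (s.contains("of")) {
                counts.add(s.split(" ").length);
            }
            if (s.contains("for")) {
                tails.add(s.substring(s.indexOf("for")));
            }
        }
        return counts.size() + tails.size();
    }

    public static void main(String[] args) {
        List<String> strs = testData(1_000_000);
        System.out.printf("two passes: %.3f ms%n", averageMillis(10, 100, () -> twoPasses(strs)));
        System.out.printf("one pass:   %.3f ms%n", averageMillis(10, 100, () -> onePass(strs)));
    }
}
```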


Answer 3:

You could use a custom collector for that and iterate only once:

// Pair is not part of java.util; it is assumed to come from a utility library.
private static Collector<String, ?, Pair<List<String>, List<Long>>> multiple() {

    // Mutable accumulator that holds both result lists.
    class Acc {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();

        // Applies both checks to one element.
        void add(String elem) {
            if (elem.contains("of")) {
                long howMany = elem.split(" ").length;
                longs.add(howMany);
            }
            if (elem.contains("for")) {
                String result = elem.substring(elem.indexOf("for"));
                strings.add(result);
            }
        }

        // Merges partial results when the stream runs in parallel.
        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        public Pair<List<String>, List<Long>> finisher() {
            return Pair.of(strings, longs);
        }
    }

    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}

Usage would be:

Pair<List<String>, List<Long>> pair = Stream.of("t of r m", "t of r m", "nice for nice nice again")
            .collect(multiple());
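
Since Pair is not a JDK type, the collector above does not compile on its own. The following self-contained sketch substitutes java.util.Map.Entry (via AbstractMap.SimpleEntry) for Pair, which is an assumption made purely for illustration, and also runs the collector on a parallel stream to exercise the merge step. The MultipleSketch class name is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

class MultipleSketch {

    // Mutable accumulator holding both result lists.
    static class Acc {
        final List<String> strings = new ArrayList<>();
        final List<Long> longs = new ArrayList<>();

        void add(String elem) {
            if (elem.contains("of")) {
                longs.add((long) elem.split(" ").length);
            }
            if (elem.contains("for")) {
                strings.add(elem.substring(elem.indexOf("for")));
            }
        }

        // Called only when partial results have to be combined (parallel streams).
        Acc merge(Acc right) {
            longs.addAll(right.longs);
            strings.addAll(right.strings);
            return this;
        }

        // SimpleEntry stands in for the answer's Pair type (an assumption).
        Map.Entry<List<String>, List<Long>> finish() {
            return new SimpleEntry<>(strings, longs);
        }
    }

    static Collector<String, ?, Map.Entry<List<String>, List<Long>>> multiple() {
        return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finish);
    }

    public static void main(String[] args) {
        Map.Entry<List<String>, List<Long>> pair =
            Stream.of("t of r m", "t of r m", "nice for nice nice again").collect(multiple());
        System.out.println(pair.getKey());   // [for nice nice again]
        System.out.println(pair.getValue()); // [4, 4]

        // The same collector on a parallel stream; encounter order is preserved.
        Map.Entry<List<String>, List<Long>> parallel =
            Arrays.asList("t of r m", "t of r m", "nice for nice nice again")
                .parallelStream().collect(multiple());
        System.out.println(parallel.equals(pair)); // true
    }
}
```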


Answer 4:

If you want a single pass over the list, you need a way to manage two different pieces of state. You can do this by implementing Consumer in a new class for each of them.

    class WordsInStr implements Consumer<String> {

      ArrayList<Integer> list = new ArrayList<>();

      @Override
      public void accept(String s) {
        Stream.of(s).filter(t -> t.contains("of")) //probably would be faster without stream here
            .map(t -> t.split(" ").length)
            .forEach(list::add);
      }
    }

    class LinePortionAfterFor implements Consumer<String> {

      ArrayList<String> list = new ArrayList<>();

      @Override
      public void accept(String s) {
        Stream.of(s) //probably would be faster without stream here
            .filter(t -> t.contains("for"))
            .map(t -> t.substring(t.indexOf("for")))
            .forEach(list::add);
      }
    }

    WordsInStr w = new WordsInStr();
    LinePortionAfterFor l = new LinePortionAfterFor();

    strs.stream()//stream not needed here
        .forEach(w.andThen(l));
    System.out.println(w.list);
    System.out.println(l.list);
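
As the comments in the code above hint, neither the inner Stream.of calls nor the outer stream is strictly needed. A minimal sketch of the same Consumer.andThen idea with two lambdas and Iterable.forEach might look like this; the ConsumerChain class and its process method are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ConsumerChain {

    // Runs both actions on every element during a single traversal of strs.
    static void process(List<String> strs, List<Integer> wordCounts, List<String> tails) {
        // Records the word count of each string that contains "of".
        Consumer<String> countWords = s -> {
            if (s.contains("of")) {
                wordCounts.add(s.split(" ").length);
            }
        };
        // Records the tail of each string, starting at the first "for".
        Consumer<String> takeTail = s -> {
            if (s.contains("for")) {
                tails.add(s.substring(s.indexOf("for")));
            }
        };
        // andThen chains the two consumers, so each element visits both in turn.
        strs.forEach(countWords.andThen(takeTail));
    }

    public static void main(String[] args) {
        List<Integer> wordCounts = new ArrayList<>();
        List<String> tails = new ArrayList<>();
        process(Arrays.asList("a list of words", "wait for it", "nothing here"), wordCounts, tails);
        System.out.println(wordCounts); // [4]
        System.out.println(tails);      // [for it]
    }
}
```

Unlike the collector-based answers, this variant mutates lists from inside forEach, so it is only suitable for sequential iteration.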