Producing histogram Map for IntStream raises compi

Posted 2019-04-14 02:37

Question:

I'm interested in building a Huffman Coding prototype. To that end, I want to begin by producing a histogram of the characters that make up an input Java String. I've seen many solutions on SO and elsewhere (e.g. here) that depend on using the collect() methods of Streams, together with static imports of Function.identity() and Collectors.counting(), in a very specific and intuitive way.

However, when using a piece of code eerily similar to the one I linked to above:

private List<HuffmanTrieNode> getCharsAndFreqs(String s){
        Map<Character, Long> freqs = s.chars().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return null;
}

I receive a compile-time error from IntelliJ which essentially tells me that there is no argument to collect that conforms to a Supplier type, as required by its signature.

Unfortunately, I'm new to the Java 8 Stream hierarchy and I'm not entirely sure what the best course of action for me should be. In fact, going the Map way might be too much boilerplate for what I'm trying to do; please advise if so.

Answer 1:

The problem is that s.chars() returns an IntStream, a primitive specialization of Stream, which does not have a collect that takes a single argument; its collect takes three arguments (a supplier, an accumulator and a combiner). You can call boxed() to transform that IntStream into a Stream<Integer>:

Map<Integer, Long> map = yourString.codePoints()
          .boxed()
          .collect(Collectors.groupingBy(
                      Function.identity(), 
                      Collectors.counting()));
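For comparison, here is a minimal sketch of the three-argument collect that IntStream does offer, used directly without boxing the stream first. The class and method names (ThreeArgCollect, countCodePoints) are hypothetical and only serve the illustration; like the snippet above, it counts code points.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreeArgCollect {

    // Hypothetical helper: builds the histogram with
    // IntStream.collect(supplier, accumulator, combiner).
    static Map<Integer, Long> countCodePoints(String s) {
        return s.codePoints().collect(
                HashMap::new,                               // supplier: a fresh result map
                (map, cp) -> map.merge(cp, 1L, Long::sum),  // accumulator: bump the count for cp
                (left, right) ->                            // combiner: merge partial maps
                        right.forEach((k, v) -> left.merge(k, v, Long::sum)));
    }

    public static void main(String[] args) {
        // Keys are code points (ints), values are occurrence counts.
        System.out.println(countCodePoints("hello"));
    }
}
```

The boxed() version is usually easier to read; this form mainly shows why a single Collector argument does not compile against an IntStream.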

But now the problem is that you have counted code points and not chars. If you know for certain that your String consists only of characters in the BMP, you can safely cast to char as shown in the other answer. If you do not, things get trickier.
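A minimal sketch of the cast-to-char approach, assuming the input contains only BMP characters. The names CharHistogram and count are hypothetical; the cast is done in mapToObj so the stream becomes a Stream<Character> before collecting.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CharHistogram {

    // Hypothetical helper. Assumes every character of s is in the BMP;
    // a supplementary character would be counted as two separate surrogates.
    static Map<Character, Long> count(String s) {
        return s.chars()
                .mapToObj(c -> (char) c) // int -> char, boxed to Character
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("hello"));
    }
}
```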

In that case you need to get each Unicode code point as a character, but it might not fit into a Java char: a char is 2 bytes (one UTF-16 code unit), while a code point outside the BMP needs 4 bytes (a surrogate pair).

In that case your map should be Map<String, Long> and not Map<Character, Long>.
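One way to sketch such a Map<String, Long> on Java 8 is to turn every code point into its own String via Character.toChars. The names CodePointHistogram and count are hypothetical. Note that this groups by code point only, so a base letter followed by a combining mark would still be counted as two separate keys.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CodePointHistogram {

    // Hypothetical helper: one String key per code point, so characters
    // outside the BMP (surrogate pairs) stay intact as a single key.
    static Map<String, Long> count(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // "\uD835\uDD0A" is a single code point outside the BMP.
        String sample = "A" + "\uD835\uDD0A" + "B" + "C";
        System.out.println(count(sample));
    }
}
```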

In Java 9, with the introduction of support for \X in regular expressions (and of Scanner#findAll), this is fairly easy to do:

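String sample = "A" + "\uD835\uDD0A" + "B" + "C";
Scanner scan = new Scanner(sample); // java.util.Scanner; findAll requires Java 9+
Map<String, Long> map = scan.findAll("\\X") // \X matches one extended grapheme cluster
        .map(MatchResult::group)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));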


 System.out.println(map); // {A=1, B=1, C=1,