I'm interested in building a Huffman Coding prototype. To that end, I want to begin by producing a histogram of the characters that make up an input Java String. I've seen many solutions on SO and elsewhere (e.g. here) that depend on using the collect() methods for Streams as well as static imports of Function.identity() and Collectors.counting() in a very specific and intuitive way.
However, when using a piece of code eerily similar to the one I linked to above:
private List<HuffmanTrieNode> getCharsAndFreqs(String s) {
    Map<Character, Long> freqs = s.chars().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    return null;
}
I receive a compile-time error from IntelliJ, which essentially tells me that no argument to collect conforms to the Supplier type required by its signature.
Unfortunately, I'm new to the Java 8 Stream hierarchy and I'm not entirely sure what the best course of action for me should be. In fact, going the Map way might be too much boilerplate for what I'm trying to do; please advise if so.
The problem is that s.chars() returns an IntStream, a primitive specialization of Stream, and it does not have a collect that takes a single argument; its collect takes 3 arguments. You can use boxed(), which transforms that IntStream into a Stream<Integer>:
Map<Integer, Long> map = yourString.codePoints()
        .boxed()
        .collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.counting()));
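For comparison, the three-argument collect on IntStream mentioned above can also build the histogram directly, without boxing the stream first. The following is only a minimal sketch: the class name ThreeArgCollect and the helper histogram are hypothetical names chosen for illustration, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreeArgCollect {

    // Hypothetical helper: counts code points using IntStream's
    // collect(supplier, accumulator, combiner).
    static Map<Integer, Long> histogram(String s) {
        return s.codePoints().collect(
                HashMap::new,                           // supplier: creates the result container
                (m, cp) -> m.merge(cp, 1L, Long::sum),  // accumulator: add one occurrence of cp
                (left, right) ->                        // combiner: merge partial results (parallel streams)
                        right.forEach((k, v) -> left.merge(k, v, Long::sum)));
    }

    public static void main(String[] args) {
        // Keys are numeric code points, e.g. 97 for 'a'.
        System.out.println(histogram("abca"));
    }
}
```

This is more verbose than boxed() plus groupingBy, which is probably why the boxed variant is the one usually recommended.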
But now the problem is that you have counted code points and not chars. If you absolutely know that your String is made only of characters in the BMP (Basic Multilingual Plane), you can safely cast to char as shown in the other answer. If you are not sure, things get trickier.
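For the BMP-only case, the cast could look roughly like the sketch below. It assumes every character fits in a single char; the class name CharHistogram and the helper histogram are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CharHistogram {

    // Hypothetical helper: assumes the input contains only BMP characters,
    // so each int from chars() is one complete character.
    static Map<Character, Long> histogram(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)  // IntStream -> Stream<Character>
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(histogram("hello"));
    }
}
```

A string with characters outside the BMP breaks this assumption, because each such character shows up as two separate surrogate chars.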
In that case you need to get each Unicode code point as a single character, but it might not fit into a Java char, which is 2 bytes (one UTF-16 code unit); a code point outside the BMP takes two chars, i.e. 4 bytes. Your map should then be Map<String, Long> and not Map<Character, Long>.
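On Java 8, one way to get String keys is to turn each code point into its own String with Character.toChars. This is a sketch under the assumption that one code point per key is good enough; the class name CodePointHistogram and the helper histogram are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CodePointHistogram {

    // Hypothetical helper: one String key per Unicode code point.
    static Map<String, Long> histogram(String s) {
        return s.codePoints()
                // Character.toChars yields one or two chars for the code point
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(histogram("A" + "\uD835\uDD0A" + "BA"));
    }
}
```

Note that this keys on code points only; a sequence such as a base letter followed by a combining mark would still be counted as two separate entries.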
In Java 9, with the introduction of support for \X (the extended grapheme cluster matcher) and Scanner#findAll, this is fairly easy to do:
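String sample = "A" + "\uD835\uDD0A" + "B" + "C";
Scanner scan = new Scanner(sample); // assumed: the snippet as posted does not show how scan is created
Map<String, Long> map = scan.findAll("\\X")
        .map(MatchResult::group)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));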
System.out.println(map); // {A=1, B=1, C=1,