Let's say that I want to remove all the non-letters from my String.
String s = "abc-de3-2fg";
I can use an IntStream in order to do that:
s.chars().filter(ch -> Character.isLetter(ch)). // But then what?
What can I do in order to convert this stream back to a String instance?
On a different note, why can't I treat a String as a stream of objects of type Character?
String s = "abc-de3-2fg";
// Yields a Stream of char[], therefore doesn't compile
Stream<Character> stream = Stream.of(s.toCharArray());
// Yields a stream with one member - s, which is a String object. Doesn't compile
Stream<Character> stream = Stream.of(s);
According to the javadoc, the Stream's creation signature is as follows:
Stream.of(T... values)
The only (lousy) way that I could think of is:
String s = "abc-de3-2fg";
Stream<Character> stream = Stream.of(s.charAt(0), s.charAt(1), s.charAt(2), ...)
And of course, this isn't good enough... What am I missing?
Unfortunately, this scenario is poorly supported by the Java 8 Stream API. My StreamEx library adds a couple of helper methods to work with such streams: IntStreamEx.charsToString(), IntStreamEx.codePointsToString() and IntStreamEx.toCharArray(). I also introduced primitive collectors like IntCollector, which may help with collecting primitive streams in some non-trivial way.
Here's how your task can be solved using the StreamEx library:
Or with codepoints:
Here's an answer to the second part of the question. If you have an IntStream resulting from calling string.chars(), you can get a Stream<Character> by casting to char and then boxing the result by calling mapToObj. For example, you can turn a String into a Set<Character> this way. Note that casting to char is essential for the boxed result to be Character instead of Integer.
Now the big problem with dealing with char or Character data is that supplementary characters are represented as surrogate pairs of char values, so any algorithm that deals with individual char values will probably fail when presented with supplementary characters. (It may seem like supplementary characters are an obscure Unicode feature that we don't need to worry about, but as far as I know, most emoji are supplementary characters.)
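The String to Set<Character> conversion mentioned above could be sketched roughly as follows. This is only an illustration: the class name CharacterSetDemo and the method name toCharacterSet are made up for this example, and only standard JDK calls are used.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are illustrative, not from the original answer.
final class CharacterSetDemo {

    // Boxes every char of the input into a Character and collects the distinct values.
    static Set<Character> toCharacterSet(String s) {
        return s.chars()
                // The cast to char makes the lambda return a char, which is boxed
                // to Character; without it the elements would be boxed to Integer.
                .mapToObj(ch -> (char) ch)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Iteration order of the resulting set is unspecified.
        System.out.println(toCharacterSet("abc-de3-2fg"));
    }
}
```

For the sample string from the question, the set contains ten distinct characters, since the '-' occurs twice.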
Consider this example:
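A minimal sketch of the kind of code in question, assuming the goal is simply to count alphabetic characters. The class and method names are hypothetical, and the test string is my own choice.

```java
// Hypothetical demo class; the names are illustrative only.
final class CharsFilterDemo {

    // Counts alphabetic characters by looking at individual UTF-16 char values.
    static long countAlphabeticChars(String s) {
        return s.chars()
                .filter(Character::isAlphabetic)
                .count();
    }

    public static void main(String[] args) {
        // Only BMP characters: works as expected.
        System.out.println(countAlphabeticChars("abc-de3-2fg")); // prints 7

        // 'a', then U+1D400 encoded as the surrogate pair D835 DC00, then 'b'.
        String withSupplementary = "a\uD835\uDC00b";
        // Neither half of the surrogate pair is alphabetic, so U+1D400 is missed.
        System.out.println(countAlphabeticChars(withSupplementary)); // prints 2
    }
}
```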
This will fail if presented with a string that contains the code point U+1D400 (Mathematical Bold Capital A). That code point is represented as a surrogate pair in the string, and neither value of a surrogate pair is an alphabetic character. To get the correct result, you'd need to do this instead:
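A sketch of the code point based variant, again assuming a simple counting task. The class and method names are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
final class CodePointsFilterDemo {

    // Counts alphabetic characters by looking at whole code points,
    // so a surrogate pair is seen as the single character it encodes.
    static long countAlphabeticCodePoints(String s) {
        return s.codePoints()
                .filter(Character::isAlphabetic)
                .count();
    }

    public static void main(String[] args) {
        // 'a', then U+1D400 encoded as the surrogate pair D835 DC00, then 'b'.
        String withSupplementary = "a\uD835\uDC00b";
        System.out.println(countAlphabeticCodePoints(withSupplementary)); // prints 3
    }
}
```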
I recommend always using codePoints().
Now, given an IntStream of code points, how can one reassemble it into a String? Sleiman Jneidi's answer is a reasonable one (+1), using the three-arg collect() method of IntStream.
Here's an alternative:
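One way such an alternative could look is to append each code point to an existing StringBuilder with forEachOrdered. This is a sketch under that assumption; the class name, method name, and the "letters: " prefix are made up for illustration.

```java
// Hypothetical demo class; the names are illustrative only.
final class StringBuilderAppendDemo {

    // Appends the alphabetic code points of s, in order, to a builder the caller already owns.
    static void appendLetters(String s, StringBuilder sb) {
        s.codePoints()
         .filter(Character::isAlphabetic)
         .forEachOrdered(sb::appendCodePoint);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("letters: ");
        appendLetters("abc-de3-2fg", sb);
        System.out.println(sb); // prints "letters: abcdefg"
    }
}
```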
This might be a bit more flexible in cases where you already have a StringBuilder that you're using to accumulate string data. You don't have to create a new StringBuilder each time, nor do you have to convert it to a String afterwards.
The method chars returns an IntStream. You're just missing the collector.
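A sketch of what that collector step might look like, using the three-argument IntStream.collect with a StringBuilder as the mutable container. The class and method names are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
final class CollectToStringDemo {

    // Keeps only the letters of s and reassembles them into a String.
    static String lettersOnly(String s) {
        return s.chars()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,              // supplier: fresh container
                         StringBuilder::appendCodePoint,  // accumulator: add one int value
                         StringBuilder::append)           // combiner: merge partial results
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("abc-de3-2fg")); // prints "abcdefg"
    }
}
```

The combiner only matters for parallel streams, but the three-argument collect requires it either way.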