I have a similar problem to the one described here, but with two differences: first, I use the Stream API, and second, I already have equals() and hashCode() methods. However, within the stream, equality of Blogs is, in this context, not the same as defined in the Blog class.
Collection<Blog> elements = x.stream()
    ... // a lot of filter and map stuff
    .peek(p -> sysout(p)) // a stream of Blog
    .? // how to remove duplicates - .distinct() doesn't work
I have a class with an equality method, let's call it ContextBlogEqual, with the method

public boolean equal(Blog a, Blog b);

Is there any way to remove all duplicate entries with my current stream approach, based on the ContextBlogEqual#equal method?
I already thought about grouping, but that doesn't work either, because the reason blogA and blogB are equal isn't just one parameter. I also have no idea how I could use .reduce(..), because there is usually more than one element left.
In essence, you either have to define hashCode to make your data work with a hashtable, or a total order to make it work with a binary search tree.

For hashtables you'll need to declare a wrapper class which overrides equals and hashCode.

For binary trees you can define a Comparator<Blog> which respects your equality definition and adds an arbitrary, but consistent, ordering criterion. Then you can collect into a new TreeSet<Blog>(yourComparator).
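A minimal sketch of the binary-tree route (the equality criterion used here, name plus id, and the Blog fields are assumptions; substitute your own):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDistinct {
    // Stand-in for the asker's Blog class; the fields are assumptions.
    record Blog(String name, int id, long time) {}

    static Set<Blog> distinctByContext(List<Blog> blogs) {
        // Comparator encoding the contextual equality (name + id) while
        // ignoring time; the ordering itself is arbitrary but consistent.
        Comparator<Blog> byContext = Comparator
                .comparing(Blog::name)
                .thenComparingInt(Blog::id);
        return blogs.stream()
                .collect(Collectors.toCollection(() -> new TreeSet<>(byContext)));
    }

    public static void main(String[] args) {
        List<Blog> blogs = List.of(
                new Blog("foo", 1, 1234),
                new Blog("foo", 1, 1345), // duplicate under (name, id)
                new Blog("bar", 2, 1345));
        System.out.println(distinctByContext(blogs).size()); // 2
    }
}
```

Elements that compare as equal under the Comparator are treated as duplicates by the TreeSet, so only the first one encountered survives.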
First, please note that an equal(Blog, Blog) method is not enough for most scenarios, as you would need to pairwise compare all the entries, which is not efficient. It's better to define a function which extracts a new key from the blog entry. For example, let's consider the following Blog class:
static class Blog {
    final String name;
    final int id;
    final long time;

    public Blog(String name, int id, long time) {
        this.name = name;
        this.id = id;
        this.time = time;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id, time);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Blog other = (Blog) obj;
        return id == other.id && time == other.time
                && Objects.equals(name, other.name);
    }

    @Override
    public String toString() {
        return name + ":" + id + ":" + time;
    }
}
Let's have some test data:
List<Blog> blogs = Arrays.asList(
        new Blog("foo", 1, 1234),
        new Blog("bar", 2, 1345),
        new Blog("foo", 1, 1345),
        new Blog("bar", 2, 1345));
List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
System.out.println(distinctBlogs);
Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose that this is undesired, because we don't want to compare the time field. The simplest way to create a new key is to use Arrays.asList:
Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id);
The resulting keys already have proper equals and hashCode implementations.
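As a quick standalone check (not part of the original answer) that such list keys compare by value, two (name, id) keys built from blogs that differ only in the ignored time field are equal:

```java
import java.util.Arrays;
import java.util.List;

class KeyCheck {
    public static void main(String[] args) {
        // Arrays.asList compares element-wise, so equal (name, id) pairs
        // produce equal keys with matching hash codes.
        List<Object> k1 = Arrays.asList("foo", 1);
        List<Object> k2 = Arrays.asList("foo", 1);
        System.out.println(k1.equals(k2));                  // true
        System.out.println(k1.hashCode() == k2.hashCode()); // true
    }
}
```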
Now, if you are fine with a terminal operation, you may create a custom collector like this:
List<Blog> distinctByNameId = blogs.stream().collect(
Collectors.collectingAndThen(Collectors.toMap(
keyExtractor, Function.identity(),
(a, b) -> a, LinkedHashMap::new),
map -> new ArrayList<>(map.values())));
System.out.println(distinctByNameId);
Here we use keyExtractor to generate the keys, and the merge function is (a, b) -> a, which selects the previously added entry when a repeating key appears. We use LinkedHashMap to preserve the encounter order (omit this parameter if you don't care about order). Finally, we dump the map values into a new ArrayList. Now the output is [foo:1:1234, bar:2:1345]. You can move such collector creation into a separate method and generalize it:
public static <T> Collector<T, ?, List<T>> distinctBy(
Function<? super T, ?> keyExtractor) {
return Collectors.collectingAndThen(
Collectors.toMap(keyExtractor, Function.identity(), (a, b) -> a, LinkedHashMap::new),
map -> new ArrayList<>(map.values()));
}
This way the usage will be simpler:
List<Blog> distinctByNameId = blogs.stream()
.collect(distinctBy(b -> Arrays.asList(b.name, b.id)));
Essentially, you'll need a helper method like this one:
static <T, U> Stream<T> distinct(
        Stream<T> stream,
        Function<? super T, ? extends U> keyExtractor) {
    // putIfAbsent returns null only the first time a key is seen
    final Map<U, Boolean> seen = new ConcurrentHashMap<>();
    return stream.filter(t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null);
}
It takes a Stream, and returns a new Stream that contains only distinct values according to the keyExtractor. An example:
class O {
    final int i;

    O(int i) {
        this.i = i;
    }

    @Override
    public String toString() {
        return "O(" + i + ")";
    }
}
distinct(Stream.of(new O(1), new O(1), new O(2)), o -> o.i)
.forEach(System.out::println);
This yields
O(1)
O(2)
Disclaimer
As commented by Tagir Valeev here and in this similar answer by Stuart Marks, this approach has flaws. The operation as implemented here...
- is unstable for ordered parallel streams
- is not optimal for sequential streams
- violates the stateless predicate constraint on Stream.filter()
Wrapping the above in your own library
You can of course extend Stream with your own functionality and implement this new distinct() function in there, e.g. like jOOλ or Javaslang do:
Seq.of(new O(1), new O(1), new O(2))
.distinct(o -> o.i)
.forEach(System.out::println);