Get object with max frequency from Java 8 stream

Posted 2019-04-07 14:25

Question:

I have an object with city and zip fields, let's call it Record.

public class Record {
    private String zip;
    private String city;

    //getters and setters
}

Now, I have a collection of these objects, and I group them by zip using the following code:

final Collection<Record> records; //populated collection of records
final Map<String, List<Record>> recordsByZip = records.stream()
    .collect(Collectors.groupingBy(Record::getZip));

So, now I have a map where the key is the zip and the value is a list of Record objects with that zip.

What I want to get now is the most common city for each zip.

recordsByZip.forEach((zip, entries) -> {
    final String mostCommonCity = //get most common city for these records
});

I would like to do this using only stream operations. For example, I am able to get a map of the frequency for each city by doing this:

recordsByZip.forEach((zip, entries) -> {
    final Map<String, Long> frequencyMap = entries.stream()
        .map(Record::getCity)
        .filter(StringUtils::isNotBlank)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
});

But I would like to be able to do a single-line stream operation that will just return the most frequent city.

Are there any Java 8 stream gurus out there that can work some magic on this?

Here is an ideone sandbox if you'd like to play around with it.

Answer 1:

You could have the following:

final Map<String, String> mostFrequentCities =
  records.stream()
         .collect(Collectors.groupingBy(
            Record::getZip,
            Collectors.collectingAndThen(
              Collectors.groupingBy(Record::getCity, Collectors.counting()),
              map -> map.entrySet().stream().max(Map.Entry.comparingByValue()).get().getKey()
            )
         ));

This groups the records by zip and, within each zip, by city, counting how many records name each city. The per-zip map of city counts is then post-processed to keep only the city with the highest count. Calling get() on the Optional returned by max() is safe here because every group contains at least one record.
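For anyone who wants to try this outside the sandbox, here is a minimal self-contained sketch of the same collector. The class name MostFrequentCityDemo, the sample data, and the null/blank filter are assumptions added for illustration, not part of the question. The filter stands in for the StringUtils::isNotBlank check from the question and also keeps null keys away from groupingBy, which rejects them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MostFrequentCityDemo {

    // Minimal stand-in for the question's Record class (assumption: immutable, two fields).
    static final class Record {
        private final String zip;
        private final String city;

        Record(String zip, String city) {
            this.zip = zip;
            this.city = city;
        }

        String getZip() { return zip; }
        String getCity() { return city; }
    }

    // Maps each zip to the city that occurs most often among its records.
    static Map<String, String> mostFrequentCities(List<Record> records) {
        return records.stream()
            // groupingBy throws on null keys, so drop records without a usable zip or city
            .filter(r -> r.getZip() != null)
            .filter(r -> r.getCity() != null && !r.getCity().trim().isEmpty())
            .collect(Collectors.groupingBy(
                Record::getZip,
                Collectors.collectingAndThen(
                    Collectors.groupingBy(Record::getCity, Collectors.counting()),
                    // each group has at least one record, so max() is never empty here
                    counts -> counts.entrySet().stream()
                        .max(Map.Entry.comparingByValue())
                        .get()
                        .getKey())));
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record("10001", "New York"),
            new Record("10001", "New York"),
            new Record("10001", "NYC"),
            new Record("60601", "Chicago"),
            new Record("60601", " "));
        System.out.println(mostFrequentCities(records));
    }
}
```

With this sample data, zip 10001 maps to New York and zip 60601 maps to Chicago. A zip whose cities are all blank drops out of the result entirely, and when two cities tie, which one wins depends on the iteration order of the intermediate map.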



Answer 2:

I think a Multiset is a good fit for this kind of question. Here is the code using AbacusUtil:

Stream.of(records).map(e -> e.getCity()).filter(N::notNullOrEmpty).toMultiset().maxOccurrences().get().getKey();

Disclosure: I'm the developer of AbacusUtil.
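If pulling in a library is not an option, roughly the same "count occurrences, then take the maximum" idea can be sketched with the JDK alone. This is only an illustration, not AbacusUtil's implementation: the class name MostFrequentValue and the helper mostFrequent are hypothetical, and the blank check uses trim().isEmpty() as an assumed stand-in for the N::notNullOrEmpty filter above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class MostFrequentValue {

    // Returns the most frequent non-blank string, or Optional.empty() if none remain.
    static Optional<String> mostFrequent(Collection<String> values) {
        return values.stream()
            // skip nulls and blank strings before counting
            .filter(v -> v != null && !v.trim().isEmpty())
            // build value -> number of occurrences
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            // pick the entry with the highest count
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent(Arrays.asList("Paris", "Lyon", "Paris", ""))); // prints Optional[Paris]
        System.out.println(mostFrequent(Arrays.asList("", null)));                     // prints Optional.empty
    }
}
```

To use it per zip, map each record list to its cities first, for example entries.stream().map(Record::getCity).collect(Collectors.toList()), and pass the result in. Returning an Optional leaves the "no usable city" case to the caller instead of throwing.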