Time complexity of stream filter

Posted 2019-04-07 09:37

Question:

I have code like this:

List<Listing> listings = new ArrayList<>();
listings.add(listing1);
listings.add(listing2);
...
...
...

Optional<Listing> listing = listings.stream()
                .filter(l -> l.getVin() == 456)
                .findFirst(); // findFirst() returns an Optional, not a Listing

My question is: what is the time complexity of the filter process? If it is O(n), my intuition is to convert the list into a HashSet-like data structure so that the lookup becomes O(1). Is there an elegant way to do this with streams?

Answer 1:

It is O(n). The stream filtering uses iteration internally.

You could convert it to a map as follows:

// Assuming vin is unique per listing; toMap throws IllegalStateException on duplicate keys
Map<Integer, Listing> mapOfVinToListing = listings.stream()
                .collect(Collectors.toMap(Listing::getVin, Function.identity()));
mapOfVinToListing.get(456); // O(1) on average

But, that conversion process is also O(n). So, if you only need to do this once, use the filter. If you need to query the same list many times, then converting it to a map may make sense.
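
The trade-off can be sketched as follows. This is only an illustration under assumptions: the Listing class here is a hypothetical stand-in with just an int VIN, and findByScan and indexByVin are made-up helper names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class VinLookupSketch {
    // Hypothetical stand-in for the Listing class from the question.
    static final class Listing {
        private final int vin;
        Listing(int vin) { this.vin = vin; }
        int getVin() { return vin; }
    }

    // One-off lookup: a linear scan, O(n) in the worst case.
    static Optional<Listing> findByScan(List<Listing> listings, int vin) {
        return listings.stream()
                .filter(l -> l.getVin() == vin)
                .findFirst();
    }

    // Builds the index once in O(n); later lookups are O(1) on average.
    // Collectors.toMap throws IllegalStateException if two listings share a VIN.
    static Map<Integer, Listing> indexByVin(List<Listing> listings) {
        return listings.stream()
                .collect(Collectors.toMap(Listing::getVin, Function.identity()));
    }

    public static void main(String[] args) {
        List<Listing> listings = new ArrayList<>();
        for (int vin = 1; vin <= 1000; vin++) {
            listings.add(new Listing(vin));
        }

        // Single query: scanning is enough.
        System.out.println(findByScan(listings, 456).isPresent()); // true

        // Many queries: pay the O(n) conversion once, then reuse the map.
        Map<Integer, Listing> byVin = indexByVin(listings);
        System.out.println(byVin.get(456).getVin()); // 456
        System.out.println(byVin.get(-1));           // null (no such VIN)
    }
}
```

If duplicate VINs are possible, the three-argument Collectors.toMap overload, which takes a merge function, avoids the exception.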

You might also try using parallel streams. In some cases they may be more performant, but that depends a lot on the exact circumstances.
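
A minimal sketch of the parallel variant, assuming a plain list of Integer VINs instead of Listing objects (the class and method names are hypothetical). findAny() is used because it does not need to respect encounter order, which gives a parallel stream more freedom than findFirst(). Whether this is actually faster has to be measured for the data at hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ParallelFilterSketch {
    // Searches the list with a parallel stream; any matching element may be returned.
    static Optional<Integer> findAnyInParallel(List<Integer> vins, int target) {
        return vins.parallelStream()
                .filter(v -> v == target)
                .findAny();
    }

    public static void main(String[] args) {
        List<Integer> vins = new ArrayList<>();
        for (int vin = 1; vin <= 1_000_000; vin++) {
            vins.add(vin);
        }
        // VINs are unique here, so the only possible match is 456.
        System.out.println(findAnyInParallel(vins, 456)); // Optional[456]
    }
}
```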



Answer 2:

The worst case is O(n), but since streams are lazy and findFirst() short-circuits, the iteration stops as soon as a match is found. If you need constant-time lookup every time, converting to a Map is a good idea, at the cost of additional space; if the list is huge, you should take that into account. On the other hand, if the list is small, the difference between a Map and a List will be barely noticeable unless you're working in a time-critical system.
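
The early exit can be observed by counting predicate calls. The following is a small sketch, assuming a sequential stream over a list of Integer VINs; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitSketch {
    // Returns how many elements the predicate examined before findFirst() returned.
    static int elementsExamined(List<Integer> vins, int target) {
        AtomicInteger examined = new AtomicInteger();
        vins.stream()
                .filter(v -> {
                    examined.incrementAndGet(); // count every predicate call
                    return v == target;
                })
                .findFirst();
        return examined.get();
    }

    public static void main(String[] args) {
        List<Integer> vins = new ArrayList<>();
        for (int vin = 1; vin <= 1000; vin++) {
            vins.add(vin);
        }
        System.out.println(elementsExamined(vins, 3));  // 3: stops at the first match
        System.out.println(elementsExamined(vins, -1)); // 1000: no match, full scan
    }
}
```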



Answer 3:

filter by itself, without a terminal operation, has zero overhead because it does absolutely nothing; streams are driven by the terminal operation only. No terminal operation, nothing gets executed.
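
A short sketch of this behaviour, assuming a list of Integer VINs (the class and method names are hypothetical): the predicate is not called when the pipeline is built, only once a terminal operation runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterSketch {
    // Returns {predicate calls before the terminal operation, calls after it}.
    static int[] predicateCalls(List<Integer> vins) {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline only records the filter; nothing is evaluated yet.
        Stream<Integer> pending = vins.stream().filter(v -> {
            calls.incrementAndGet();
            return v == 456;
        });
        int before = calls.get();

        // The terminal operation pulls every element through the filter.
        pending.collect(Collectors.toList());
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] result = predicateCalls(Arrays.asList(123, 456, 789));
        System.out.println(result[0]); // 0
        System.out.println(result[1]); // 3
    }
}
```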

Then comes the case where filter has to iterate, lazily, over the elements of the source (potentially all of them). So the time complexity of filter depends on the source that you stream from; in your case it is a List, so it would be O(n).

But that would be the worst case. As far as I can see, you can't predict the average case for filter in general, because it depends on the underlying source.