Complexity of Set operations

2019-07-16 17:55发布

问题:

This is what I am doing:
String one = "some string"
String two = "some string"

I want to know all the characters that are in string one and two and they should come in order as they are in string one

I wrote a Java program which by using Collections performs set operations on both the collection.
What I would like to know that what is the complexity of performing set operations, is it polynomial time or linear time

My program is here

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

package careercup.google;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 *
 * @author learner
 */
public class CharaterStringsIntersection {
    private static final String one = "abcdefgabcfmnx";
    private static final String two = "xbcg";

    public static void main(String args[]){
        List<Character> l_one = new ArrayList<Character>();
        List<Character> l_two = new ArrayList<Character>();

        for(int i=0; i<one.length(); i++){
           l_one.add(one.charAt(i));
        }
        for(int j=0; j<two.length(); j++){
            l_two.add(two.charAt(j));
        }

        l_one.retainAll(l_two);

        Iterator iter = l_one.iterator();
        while(iter.hasNext()){
            System.out.println(" > " + iter.next());
        }
    }
}

Output :

run:
 > b
 > c
 > g
 > b
 > c
 > x

回答1:

Its complexity is O(N * (N + M)), where N = one.length(), M = two.length().

It works in the following way: for each character of l_one (there are N of them) it scans l_two (since l_two is an ArrayList, scan is linear, so it takes O(M) steps, see [1]) and removes item from l_one if necessary (removing from ArrayList takes O(N), see [1]), so you get O(N * (N + M)).

You can lower the complexity to O(N * M) by using LinkedList for l_one (since removing from LinkedList is O(1), see [2]), and further lower it to O(N * log(M)) by using TreeSet for l_two (since searching in TreeSet takes O(log(M)), see [3]).

References:

  1. All of the other operations run in linear time (ArrayList javadoc)
  2. All of the operations perform as could be expected for a doubly-linked list, (LinkedList javadoc)
  3. This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains) (TreeSet javadoc)


回答2:

I think the more important question you are trying to ask is what is the complexity of retainAll() function. Assuming there are no hashes or quick look up, I would assume the time complexity is O(n^2). One loop to iterate the list and one loop to compare each item.



回答3:

You are using the wrong collection for your purposes..

Since you are doing it with plain ArrayList the complexity wouldn't be good, I don't know the inner implementation but I would guess it's quadratic (since it has to scroll both lists), this unless Java internally converts two lists to sets before doing the intersection.

You should start from Set interface (here) and choose which collection is more suitable for your needs, I think that a HashSet (constant time) or a TreeSet (logarithmic but able to keep natural ordering) would be best solutions.



回答4:

Briefly:

  • ArrayList.add() is O(1).

  • ArrayList.retainAll(other) is O(N1 * N2) where N1 is the size of this and N2 is the size of other. (The actual cost depends on the ratio of elements retained to elements removed.)

  • iter.hasNext() and iter.next() are O(1) for an ArrayList iterator.

This is all as you would expect, based on mentally modelling an ArrayList as a Java array.