Intersection of Two Lists Of Strings

2019-04-13 07:21发布

问题:

I had an interview question along these lines:

Given two lists of unordered customers, return a list of the intersection of the two lists. That is, return a list of the customers that appear in both lists.

Some things I established:

  • Assume each customer has a unique name
  • If the name is the same in both lists, it's the same customer
  • The names are of the form first name last name
  • There's no trickery of II's, Jr's, weird characters, etc.

I think the point was to find an efficient algorithm/use of data structures to do this as efficiently as possible.

My progress went like this:

  • Read one list in to memory, then read the other list one item at a time to see if there is a match
  • Alphabetize both lists then start at the top of one list and see if each item appears in the other list
  • Put both lists into ordered lists, then use the shorter list to check item by item (that way, it one list has 2 items, you only check those 2 items)
  • Put one list into a hash, and check for the existence of keys from the other list

The interviewer kept asking, "What next?", so I assume I'm missing something else.

Any other tricks to do this efficiently?

Side note, this question was in python, and I just read about sets, which seem to do this as efficiently as possible. Any idea what the data structure/algorithm of sets is?

回答1:

  1. Put one list into a bloom filter and use that to filter the second list.
  2. Put the filtered second list into a bloom filter and use that to filter the first list.
  3. Sort the two lists and find the intersection by one of the methods above.

The benefit of this approach (besides letting you use a semi-obscure data structure correctly in an interview) is that it doesn't require any O(n) storage until after you have (with high probability) reduced the problem size.


The interviewer kept asking, "What next?", so I assume I'm missing something else.

Maybe they would just keep asking that until you run out of answers.


http://code.google.com/p/python-bloom-filter/ is a python implementation of bloom filters.



回答2:

It really doesnt matter how its implemented ... but I believe it is implemented in C so it is faster and better set([1,2,3,4,5,6]).intersection([1,2,5,9]) is likely what they wanted

In python readability counts for alot! and set operations in python are used extensively and well vetted...

that said another pythonic way of doing it would be

list_new = [itm for itm in listA if itm in listB]

or

list_new = filter(lambda itm:itm in listB,listA)

basically I believe they were testing if you were familliar with python, not if you could implement the algorithm. since they asked a question that is so well suited to python