I had an interview question along these lines:
Given two lists of unordered customers, return a list of the intersection of the two lists. That is, return a list of the customers that appear in both lists.
Some things I established:
- Assume each customer has a unique name
- If the name is the same in both lists, it's the same customer
- The names are of the form first name last name
- There's no trickery of II's, Jr's, weird characters, etc.
I think the point was to find an efficient algorithm/use of data structures to do this as efficiently as possible.
My progress went like this:
- Read one list in to memory, then read the other list one item at a time to see if there is a match
- Alphabetize both lists then start at the top of one list and see if each item appears in the other list
- Put both lists into ordered lists, then use the shorter list to check item by item (that way, it one list has 2 items, you only check those 2 items)
- Put one list into a hash, and check for the existence of keys from the other list
The interviewer kept asking, "What next?", so I assume I'm missing something else.
Any other tricks to do this efficiently?
Side note, this question was in python, and I just read about sets
, which seem to do this as efficiently as possible. Any idea what the data structure/algorithm of sets
is?
It really doesnt matter how its implemented ... but I believe it is implemented in C so it is faster and better
set([1,2,3,4,5,6]).intersection([1,2,5,9])
is likely what they wantedIn python readability counts for alot! and set operations in python are used extensively and well vetted...
that said another pythonic way of doing it would be
or
basically I believe they were testing if you were familliar with python, not if you could implement the algorithm. since they asked a question that is so well suited to python
The benefit of this approach (besides letting you use a semi-obscure data structure correctly in an interview) is that it doesn't require any O(n) storage until after you have (with high probability) reduced the problem size.
Maybe they would just keep asking that until you run out of answers.
http://code.google.com/p/python-bloom-filter/ is a python implementation of bloom filters.