I had an interview question along these lines:
Given two lists of unordered customers, return a list of the intersection of the two lists. That is, return a list of the customers that appear in both lists.
Some things I established:
- Assume each customer has a unique name
- If the name is the same in both lists, it's the same customer
- The names are of the form first name last name
- There's no trickery of II's, Jr's, weird characters, etc.
I think the point was to find an efficient algorithm/use of data structures to do this as efficiently as possible.
My progress went like this:
- Read one list in to memory, then read the other list one item at a time to see if there is a match
- Alphabetize both lists then start at the top of one list and see if each item appears in the other list
- Put both lists into ordered lists, then use the shorter list to check item by item (that way, it one list has 2 items, you only check those 2 items)
- Put one list into a hash, and check for the existence of keys from the other list
The interviewer kept asking, "What next?", so I assume I'm missing something else.
Any other tricks to do this efficiently?
Side note, this question was in python, and I just read about sets
, which seem to do this as efficiently as possible. Any idea what the data structure/algorithm of sets
is?