For example, given the list ['one', 'two', 'one'], the algorithm should return True, whereas given ['one', 'two', 'three'] it should return False.
Use set() to remove duplicates if all values are hashable:
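A minimal sketch of that length comparison (has_duplicates and your_list are placeholder names):

```python
def has_duplicates(your_list):
    # set() drops duplicates, so the lengths differ exactly when a value repeats
    return len(your_list) != len(set(your_list))

print(has_duplicates(['one', 'two', 'one']))    # True
print(has_duplicates(['one', 'two', 'three']))  # False
```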
If you are fond of functional programming style, here is a useful function: self-documented code, tested using doctest.
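A sketch in that spirit, assuming a decompose function that returns a pair of sets (all elements, duplicated elements); the name and the shape of the return value are illustrative, not necessarily the original:

```python
from functools import reduce

def decompose(a_list):
    """Split a list into (set of all elements, set of elements seen more than once).

    >>> uniques, dupes = decompose(['one', 'two', 'one'])
    >>> sorted(uniques), sorted(dupes)
    (['one', 'two'], ['one'])
    >>> uniques, dupes = decompose(['one', 'two', 'three'])
    >>> sorted(uniques), sorted(dupes)
    (['one', 'three', 'two'], [])
    """
    return reduce(
        lambda acc, x: (acc[0] | {x}, acc[1] | ({x} if x in acc[0] else set())),
        a_list,
        (set(), set()),
    )

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```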
From there you can test for uniqueness by checking whether the second element of the returned pair is empty:
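Continuing the sketch above, with decompose as defined there:

```python
def all_unique(a_list):
    # No duplicates exactly when the 'duplicates' half of the pair is empty
    return not decompose(a_list)[1]

print(all_unique(['one', 'two', 'one']))    # False
print(all_unique(['one', 'two', 'three']))  # True
```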
Note that this is not efficient, since you are explicitly constructing the full decomposition. But along the lines of using reduce, you can come up with something equivalent (but slightly less efficient) to answer 5:
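One possible shape for such a reduce, carrying a seen-set plus a found flag through the fold (an illustration only):

```python
from functools import reduce

def contains_duplicates(a_list):
    # Fold a (seen, found) pair over the list; 'found' flips to True at the
    # first repeated element, although reduce still walks the whole list.
    return reduce(
        lambda acc, x: (acc[0] | {x}, acc[1] or x in acc[0]),
        a_list,
        (set(), False),
    )[1]

print(contains_duplicates(['one', 'two', 'one']))    # True
print(contains_duplicates(['one', 'two', 'three']))  # False
```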
This is old, but the answers here led me to a slightly different solution. If you are up for abusing comprehensions, you can get short-circuiting this way.
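One way to read that trick: seen.add(x) returns None, so the or expression only becomes truthy on a repeat, and any() stops there. A sketch, with xs as a placeholder list:

```python
xs = ['one', 'two', 'one']
seen = set()
# seen.add(x) returns None (falsy), so the generator only yields True when x
# was already seen; any() short-circuits as soon as that happens.
print(any(x in seen or seen.add(x) for x in xs))   # True

ys = ['one', 'two', 'three']
seen = set()
print(any(y in seen or seen.add(y) for y in ys))   # False
```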
I recently answered a related question to establish all the duplicates in a list, using a generator. It has the advantage that, if used just to establish whether there is a duplicate, you only need to get the first item and the rest can be ignored, which is the ultimate shortcut.
This is an interesting set based approach I adapted straight from moooeeeep:
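A sketch of what such a set-based duplicate generator could look like (getDupes is the name the rest of this answer uses; the body here is an assumption):

```python
def getDupes(iterable):
    # Yield each value lazily, the moment it turns out to be a repeat
    # (a value seen n times is yielded n-1 times in this sketch).
    seen = set()
    for item in iterable:
        if item in seen:
            yield item
        else:
            seen.add(item)
```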
Accordingly, a full list of dupes would be list(getDupes(etc)). To simply test "if" there is a dupe, it should be wrapped as follows:
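A possible wrapper, assuming the getDupes sketch above (hasDupes is a placeholder name):

```python
def hasDupes(iterable):
    # Pull at most one item from the duplicate generator; a sentinel default
    # tells us whether anything was actually found.
    sentinel = object()
    return next(getDupes(iterable), sentinel) is not sentinel

print(hasDupes(['one', 'two', 'one']))      # True
print(hasDupes(['one', 'two', 'three']))    # False
```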
This scales well and provides consistent operating times wherever the dupe is in the list -- I tested with lists of up to 1m entries. If you know something about the data, specifically that dupes are likely to show up in the first half, or anything else that lets you skew your requirements (such as needing the actual dupes), then there are a couple of rather different dupe locators that might outperform it. The two I recommend are...
Simple dict based approach, very readable:
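A sketch of a dict-based duplicate generator in that style (getDupes_dict is a placeholder name):

```python
def getDupes_dict(c):
    # Map each value to True on first sighting; on the second sighting yield it
    # once and flip the flag so it is not reported again.
    d = {}
    for item in c:
        if item in d:
            if d[item]:
                yield item
                d[item] = False
        else:
            d[item] = True

print(list(getDupes_dict(['one', 'two', 'one', 'one', 'three'])))  # ['one']
```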
Leverage itertools (essentially an ifilter/izip/tee) on the sorted list, very efficient if you are getting all the dupes though not as quick to get just the first:
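A Python 3 sketch of the sorted/tee idea (izip and ifilter were the Python 2 spellings; plain zip is used here, and the function name is a placeholder):

```python
from itertools import tee

def getDupes_sorted(c):
    # Two iterators over the sorted values, offset by one; equal neighbours are
    # duplicates, and 'last' keeps each one from being reported more than once.
    a, b = tee(sorted(c))
    next(b, None)
    last = object()
    for x, y in zip(a, b):
        if x == y and x != last:
            yield x
            last = x

print(list(getDupes_sorted(['two', 'one', 'two', 'one', 'three'])))  # ['one', 'two']
```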
These were the top performers from the approaches I tried for the full dupe list, with the first dupe occurring anywhere in a 1m element list from the start to the middle. It was surprising how little overhead the sort step added. Your mileage may vary, but here are my specific timed results:
I found this to have the best performance because it short-circuits the operation as soon as the first duplicate is found; the algorithm has O(n) time and space complexity, where n is the length of the list:
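A sketch of that early-exit approach (any_duplicates is a placeholder name):

```python
def any_duplicates(lst):
    seen = set()
    for item in lst:
        if item in seen:
            return True      # stop at the first duplicate
        seen.add(item)
    return False

print(any_duplicates(['one', 'two', 'one']))    # True
print(any_duplicates(['one', 'two', 'three']))  # False
```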
Another way of doing this succinctly is with Counter.
To just determine if there are any duplicates in the original list:
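For example, assuming your_list is the input:

```python
from collections import Counter

your_list = ['one', 'two', 'one']
# At least one value repeats exactly when the highest count exceeds 1
print(max(Counter(your_list).values(), default=0) > 1)   # True
```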
Or to get a list of items that have duplicates:
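Continuing the same assumption:

```python
from collections import Counter

your_list = ['one', 'two', 'one', 'two', 'three']
dupes = [item for item, count in Counter(your_list).items() if count > 1]
print(dupes)   # ['one', 'two']
```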