Question:
I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) | x `elem` xs = rmdups xs
              | otherwise   = x : rmdups xs
However, I'd like to rework this without using elem. What would be the best method for this?
I'd like to do this using my own function and not nub or nubBy.
Answer 1:
I don't think you'll be able to do it without elem (or your own re-implementation of it).
However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.
*Main> rmdups "abacd"
"bacd"
The solution is to thread the 'seen' elements through as a state variable.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
  where rdHelper seen [] = seen
        rdHelper seen (x:xs)
          | x `elem` seen = rdHelper seen xs
          | otherwise     = rdHelper (seen ++ [x]) xs
This is more-or-less how nub is implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicates above is strict (it consumes the entire list before returning).
Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates can be implemented in one line with foldl:
removeDuplicates2 :: Eq a => [a] -> [a]
removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []
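To illustrate the strictness remark above, here is a minimal sketch of a non-strict variant. The names removeDuplicatesLazy and go are made up for this example, and this is not the library source of nub, only the same general idea: emit each new element before recursing, so the result can be consumed incrementally.

```haskell
-- Hypothetical name; a sketch of a non-strict variant, not the library's nub.
removeDuplicatesLazy :: Eq a => [a] -> [a]
removeDuplicatesLazy = go []
  where
    -- 'seen' holds the elements already emitted (most recent first).
    go _ [] = []
    go seen (x:xs)
      | x `elem` seen = go seen xs
      | otherwise     = x : go (x : seen) xs  -- x is produced before the rest is examined
```

Because the first occurrence is produced before the rest of the input is inspected, something like take 3 (removeDuplicatesLazy (cycle [1, 2, 3])) terminates, whereas the strict versions above would loop on an infinite list.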
Answer 2:
Both your code and nub have O(N^2) complexity.
You can improve the complexity to O(N log N) and avoid using elem by sorting, grouping, and taking only the first element of each group.
Conceptually,
import Data.List (group, sort)

rmdups :: (Ord a) => [a] -> [a]
rmdups = map head . group . sort
Suppose you start with the list [1, 2, 1, 3, 2, 4]. By sorting it, you get [1, 1, 2, 2, 3, 4]; by grouping that, you get [[1, 1], [2, 2], [3], [4]]; finally, by taking the head of each list, you get [1, 2, 3, 4].
The full implementation of the above just involves expanding each function.
Note that this requires the stronger Ord constraint on the elements of the list, and also changes their order in the returned list.
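If the change of order is a problem, one possible workaround (an assumption of mine, not part of the answer above) is to tag each element with its position before sorting and to restore the original order afterwards. The sketch below uses only Data.List and Data.Function; the name rmdupsStable is hypothetical. It relies on sortOn being a stable sort, so within each group the pair with the lowest index comes first.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical name; sort-and-group deduplication that keeps first occurrences
-- in their original order. Still O(N log N) and still needs Ord.
rmdupsStable :: Ord a => [a] -> [a]
rmdupsStable xs =
  map snd                          -- drop the indices again
    . sortOn fst                   -- restore the original order
    $ [ p | (p : _) <- groups ]    -- first (lowest-index) pair of each group
  where
    indexed = zip [0 :: Int ..] xs
    groups  = groupBy ((==) `on` snd) (sortOn snd indexed)
```

For the list [1, 2, 1, 3, 2, 4] used above this gives [1, 2, 3, 4], and for an input such as [3, 1, 3, 2, 1] it gives [3, 1, 2] rather than the sorted [1, 2, 3].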
Answer 3:
Even easier.
import Data.Set (fromList, toList)
mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList
Convert the set to a list of elements in O(n) time:
toList :: Set a -> [a]
Create a set from a list of elements in O(n log n) time:
fromList :: Ord a => [a] -> Set a
In Python it would be no different.
def mkUniq(x):
    return list(set(x))
Answer 4:
Like @scvalex's solution, the following has O(n * log n) complexity and an Ord dependency. Unlike it, it preserves the order, keeping the first occurrence of each item.
import qualified Data.Set as Set
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c
Benchmark results
In the benchmark, this solution was the most efficient of those tested.
You can find the source of this benchmark here.
Answer 5:
Using recursion-schemes:
import Data.Functor.Foldable
dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
  where pseudoalgebra Nil = []
        pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs
While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.
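For readers without the recursion-schemes package, the same idea can be sketched in plain Haskell. The helper paraList below is a hand-written, list-only stand-in for para (my own illustration, not the library's definition): at each step the function sees the current element, the untouched remainder of the input, and the result already computed for that remainder.

```haskell
-- Hypothetical helper: a list-only paramorphism written by hand.
paraList :: (a -> [a] -> b -> b) -> b -> [a] -> b
paraList _ z []     = z
paraList f z (x:xs) = f x xs (paraList f z xs)

-- Hypothetical name; mirrors dedup above without the library dependency.
dedupPara :: Eq a => [a] -> [a]
dedupPara = paraList step []
  where
    -- rest: the original tail; acc: the deduplicated tail
    step x rest acc = if x `elem` rest then acc else x : acc
```

Like the rmdups in the question, this keeps the last occurrence of each element, so dedupPara "abacd" yields "bacd".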
Answer 6:
It is late to answer this question, but I want to share my own solution, which uses neither elem nor an Ord constraint.
rmdups' :: (Eq a) => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k | k <- rmdups'(xs), k /=x ]
This solution keeps the first occurrence of each element and removes the later duplicates, while the implementation in the question removes the earlier ones and keeps the last. For example,
rmdups "maximum-minimum"
-- "ax-nium"
rmdups' "maximum-minimum"
-- "maxiu-n"
Also, the complexity of this code is O(N*K), where N is the length of the string and K is the number of unique characters in it. Since N >= K, the worst case is O(N^2), but that occurs only when the string has no repetition, which is unlikely if you are trying to remove duplicates from it.
Answer 7:
Graham Hutton has a rmdups function on p. 86 of Programming in Haskell. It preserves order. It is as follows.
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)
rmdups "maximum-minimum"
"maxiu-n"
This was bothering me until I saw Hutton's function. Then I tried again. There are two versions: one keeps the first occurrence of each element, the other keeps the last. The version below keeps the first.
rmdups ls = [d|(z,d)<- zip [0..] ls, notElem d $ take z ls]
rmdups "maximum-minimum"
"maxiu-n"
If you want to keep the last occurrence of each duplicated element instead, as your implementation does, just change take to drop in the function and change the enumeration zip [0..] to zip [1..].
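Written out, the variant with drop and zip [1..] might look like the sketch below. The name rmdupsLast is made up here to avoid clashing with rmdups. Each element is paired with the number of elements up to and including itself, so drop z ls is exactly the part of the list after it, and an element survives only if it does not occur again later.

```haskell
-- Hypothetical name; keeps the last occurrence of each element.
rmdupsLast :: Eq a => [a] -> [a]
rmdupsLast ls = [d | (z, d) <- zip [1 ..] ls, d `notElem` drop z ls]
```

On "maximum-minimum" this yields "ax-nium", the same result as the implementation in the question.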
Answer 8:
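You can use this compress function too. Note that it only removes consecutive duplicates, so the list has to be sorted first if every duplicate is to be removed.
cmprs :: Eq a => [a] -> [a]
cmprs [] = []
cmprs [a] = [a]
cmprs (a:as)
  | a == head as = cmprs as       -- drop a when the next element is equal
  | otherwise    = a : cmprs as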
Answer 9:
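...or by using the function union from Data.List. Note that union only removes duplicates from its second argument and keeps the first argument as it is, so union x x simply returns x; pass the empty list as the first argument instead:
import Data.List
unique x = union [] x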