Sets, Functors and Eq confusion

2019-01-16 08:32发布

问题:

A discussion came up at work recently about Sets, which in Scala support the zip method and how this can lead to bugs, e.g.

scala> val words = Set("one", "two", "three")
scala> words zip (words map (_.length))
res1: Set[(java.lang.String, Int)] = Set((one,3), (two,5))

I think it's pretty clear that Sets shouldn't support a zip operation, since the elements are not ordered. However, it was suggested that the problem is that Set isn't really a functor, and shouldn't have a map method. Certainly, you can get yourself into trouble by mapping over a set. Switching to Haskell now,

data AlwaysEqual a = Wrap { unWrap :: a }

instance Eq (AlwaysEqual a) where
    _ == _ = True

instance Ord (AlwaysEqual a) where
    compare _ _ = EQ

and now in ghci

ghci> import Data.Set as Set
ghci> let nums = Set.fromList [1, 2, 3]
ghci> Set.map unWrap $ Set.map Wrap $ nums
fromList [3]
ghci> Set.map (unWrap . Wrap) nums
fromList [1, 2, 3]

So Set fails to satisfy the functor law

    fmap f . fmap g = fmap (f . g)

It can be argued that this is not a failing of the map operation on Sets, but a failing of the Eq instance that we defined, because it doesn't respect the substitution law, namely that for two instances of Eq on A and B and a mapping f : A -> B then

    if x == y (on A) then f x == f y (on B)

which doesn't hold for AlwaysEqual (e.g. consider f = unWrap).

Is the substition law a sensible law for the Eq type that we should try to respect? Certainly, other equality laws are respected by our AlwaysEqual type (symmetry, transitivity and reflexivity are trivially satisfied) so substitution is the only place that we can get into trouble.

To me, substition seems like a very desirable property for the Eq class. On the other hand, some comments on a recent Reddit discussion include

"Substitution seems stronger than necessary, and is basically quotienting the type, putting requirements on every function using the type."

-- godofpumpkins

"I also really don't want substitution/congruence since there are many legitimate uses for values which we want to equate but are somehow distinguishable."

-- sclv

"Substitution only holds for structural equality, but nothing insists Eq is structural."

-- edwardkmett

These three are all pretty well known in the Haskell community, so I'd be hesitant to go against them and insist on substitability for my Eq types!

Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)

"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."

"Traversable is about preserving the structure exactly as-is."

and in Real World Haskell

"...[A] functor must preserve shape. The structure of a collection should not be affected by a functor; only the values that it contains should change."

Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.

But it seems as though Sets really should be functors (ignoring the Ord requirement for the moment - I see that as an artificial restriction imposed by our desire to work efficiently with sets, not an absolute requirement for any set. For example, sets of functions are a perfectly sensible thing to consider. In any case, Oleg has shown how to write efficient Functor and Monad instances for Set that don't require an Ord constraint). There are just too many nice uses for them (the same is true for the non-existant Monad instance).

Can anyone clear up this mess? Should Set be a Functor? If so, what does one do about the potential for breaking the Functor laws? What should the laws for Eq be, and how do they interact with the laws for Functor and the Set instance in particular?

回答1:

Another argument against Set being a Functor - it is widely accepted that being a Functor allows you to transform the "elements" of a "collection" while preserving the shape. [...] Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.

I'm afraid that this is a case of taking the "shape" analogy as a defining condition when it is not. Mathematically speaking, there is such a thing as the power set functor. From Wikipedia:

Power sets: The power set functor P : Set → Set maps each set to its power set and each function f : X → Y to the map which sends U ⊆ X to its image f(U) ⊆ Y.

The function P(f) (fmap f in the power set functor) does not preserve the size of its argument set, yet this is nonetheless a functor.

If you want an ill-considered intuitive analogy, we could say this: in a structure like a list, each element "cares" about its relationship to the other elements, and would be "offended" if a false functor were to break that relationship. But a set is the limiting case: a structure whose elements are indifferent to each other, so there is very little you can do to "offend" them; the only thing is if a false functor were to map a set that contains that element to a result that doesn't include its "voice."

(Ok, I'll shut up now...)


EDIT: I truncated the following bits when I quoted you at the top of my answer:

For example, this quote on the Haskell wiki (note that Traversable is a generalization of Functor)

"Where Foldable gives you the ability to go through the structure processing the elements but throwing away the shape, Traversable allows you to do that whilst preserving the shape and, e.g., putting new values in."

"Traversable is about preserving the structure exactly as-is."

Here's I'd remark that Traversable is a kind of specialized Functor, not a "generalization" of it. One of the key facts about any Traversable (or, actually, about Foldable, which Traversable extends) is that it requires that the elements of any structure have a linear order—you can turn any Traversable into a list of its elements (with Foldable.toList).

Another, less obvious fact about Traversable is that the following functions exist (adapted from Gibbons & Oliveira, "The Essence of the Iterator Pattern"):

-- | A "shape" is a Traversable structure with "no content," 
-- i.e., () at all locations.
type Shape t = t ()

-- | "Contents" without a shape are lists of elements.
type Contents a = [a]

shape :: Traversable t => t a -> Shape t
shape = fmap (const ())

contents :: Traversable t => t a -> Contents a
contents = Foldable.toList

-- | This function reconstructs any Traversable from its Shape and
-- Contents.  Law:
--
-- > reassemble (shape xs) (contents xs) == Just xs
--
-- See Gibbons & Oliveira for implementation.  Or do it as an exercise.
-- Hint: use the State monad...
--
reassemble :: Traversable t => Shape t -> Contents a -> Maybe (t a)

A Traversable instance for sets would violate the proposed law, because all non-empty sets would have the same Shape—the set whose Contents is [()]. From this it should be easy to prove that whenever you try to reassemble a set you would only ever get the empty set or a singleton back.

Lesson? Traversable "preserves shape" in a very specific, stronger sense than Functor does.



回答2:

Set is "just" a functor (not a Functor) from the subcategory of Hask where Eq is "nice" (i.e. the subcategory where congruence, substitution, holds). If constraint kinds were around from way back then perhaps set would be a Functor of some kind.



回答3:

Well, Set can be treated as a covariant functor, and as a contravariant functor; usually it's a covariant functor. And for it to behave regarding equality one has to make sure that whatever the implementation, it does.

Regarding Set.zip - it is nonsense. As well as Set.head (you have it in Scala). It should not exist.