Indexing into containers: the mathematical underpi

2020-05-25 18:35发布

问题:

When you want to pull an element out of a data structure, you have to give its index. But the meaning of index depends on the data structure itself.

class Indexed f where
    type Ix f
    (!) :: f a -> Ix f -> Maybe a  -- indices can be out of bounds

For example...

Elements in a list have numeric positions.

data Nat = Z | S Nat
instance Indexed [] where
    type Ix [] = Nat
    [] ! _ = Nothing
    (x:_) ! Z = Just x
    (_:xs) ! (S n) = xs ! n

Elements in a binary tree are identified by a sequence of directions.

data Tree a = Leaf | Node (Tree a) a (Tree a)
data TreeIx = Stop | GoL TreeIx | GoR TreeIx  -- equivalently [Bool]
instance Indexed Tree where
    type Ix Tree = TreeIx
    Leaf ! _ = Nothing
    Node l x r ! Stop = Just x
    Node l x r ! GoL i = l ! i
    Node l x r ! GoR j = r ! j

Looking for something in a rose tree entails stepping down the levels one at a time by selecting a tree from the forest at each level.

data Rose a = Rose a [Rose a]  -- I don't even like rosé
data RoseIx = Top | Down Nat RoseIx  -- equivalently [Nat]
instance Indexed Rose where
    type Ix Rose = RoseIx
    Rose x ts ! Top = Just x
    Rose x ts ! Down i j = ts ! i >>= (! j)

It seems that the index of a product type is a sum (telling you which arm of the product to look at), the index of an element is the unit type, and the index of a nested type is a product (telling you where to look in the nested type). Sums seem to be the only one which aren't somehow linked to the derivative. The index of a sum is also a sum - it tells you which part of the sum the user is hoping to find, and if that expectation is violated you're left with a handful of Nothing.

In fact I had some success implementing ! generically for functors defined as the fixed point of a polynomial bifunctor. I won't go into detail, but Fix f can be made an instance of Indexed when f is an instance of Indexed2...

class Indexed2 f where
    type IxA f
    type IxB f
    ixA :: f a b -> IxA f -> Maybe a
    ixB :: f a b -> IxB f -> Maybe b

... and it turns out you can define an instance of Indexed2 for each of the bifunctor building blocks.

But what's really going on? What is the underlying relationship between a functor and its index? How does it relate to the functor's derivative? Does one need to understand the theory of containers (which I don't, really) to answer this question?

回答1:

It seems like the index into the type is an index into the set of constructors, following by an index into the product representing that constructor. This can be implemented quite naturally with e.g. generics-sop.

First you need a datatype to represent possible indices into a single element of the product. This could be an index pointing to an element of type a, or an index pointing to something of type g b - which requires an index pointing into g and an index pointing to an element of type a in b. This is encoded with the following type:

import Generics.SOP

data ArgIx f x x' where 
  Here :: ArgIx f x x 
  There :: (Generic (g x')) => Ix g -> ArgIx f x x' -> ArgIx f x (g x') 

newtype Ix f = ...

The index itself is just a sum (implemented by NS for n-ary sum) of sums over the generic representation of the type (choice of constructor, choice of constructor element):

newtype Ix f = MkIx (forall x . NS (NS (ArgIx f x)) (Code (f x)))

You can write smart constructors for various indices:

listIx :: Natural -> Ix [] 
listIx 0 = MkIx $ S $ Z $ Z Here 
listIx k = MkIx $ S $ Z $ S $ Z $ There (listIx (k-1)) Here  

treeIx :: [Bool] -> Ix Tree 
treeIx [] = MkIx $ S $ Z $ S $ Z Here 
treeIx (b:bs) = 
  case b of 
    True -> MkIx $ S $ Z $ Z $ There (treeIx bs) Here 
    False -> MkIx $ S $ Z $ S $ S $ Z $ There (treeIx bs) Here 

roseIx :: [Natural] -> Ix Rose 
roseIx [] = MkIx $ Z $ Z Here  
roseIx (k:ks) = MkIx $ Z $ S $ Z $ There (listIx k) (There (roseIx ks) Here)

Note that e.g. in the list case, you cannot construct an (non-bottom) index pointing to the [] constructor - likewise for Tree and Empty, or constructors containing values whose type is not a or something containing some values of type a. The quantification in MkIx prevents the construction bad things like an index pointing to the first Int in data X x = X Int x where x is instantiated to Int.

The implementation of the index function is fairly straightforward, even if the types are scary:

(!) :: (Generic (f x)) => f x -> Ix f -> Maybe x 
(!) arg (MkIx ix) = go (unSOP $ from arg) ix where 

  atIx :: a -> ArgIx f x a -> Maybe x 
  atIx a Here = Just a 
  atIx a (There ix0 ix1) = a ! ix0 >>= flip atIx ix1 

  go :: (All SListI xss) => NS (NP I) xss -> NS (NS (ArgIx f x)) xss -> Maybe x 
  go (Z a) (Z b) = hcollapse $ hzipWith (\(I x) -> K . atIx x) a b 
  go (S x) (S x') = go x x' 
  go Z{} S{} = Nothing 
  go S{} Z{} = Nothing 

The go function compares the constructor pointed to by the index and the actual constructor used by the type. If the constructors don't match, the indexing returns Nothing. If they do, the actual indexing is done - which is trivial in the case that the index points exactly Here, and in the case of some substructure, both indexing operations must succeed one after the other, which is handled by >>=.

And a simple test:

>map (("hello" !) . listIx) [0..5]
[Just 'h',Just 'e',Just 'l',Just 'l',Just 'o',Nothing]