Haskell: shuffling data without functional depende

2019-06-17 07:59发布

问题:

I'm trying to implement a Fisher-Yates shuffle of some data. This algorithm is easy to implement for one-dimensional arrays. However, I need to be able to shuffle data in a two-dimensional matrix.

An approach which I think could generalize nicely to higher dimensional arrays is to convert my arbitrarily dimensioned matrix to a single-dimensional array of indices, shuffle those, and then reorganize the matrix by swapping the element at each index of this index array with the element at the index of the element of the index array. In other words, to take a 2x2 matrix such as:

1  2
3  4

I would convert this into this "array":

[(0, (0,0)),  (1, (0,1)),  (2, ((1,0)),  (3, (1,1))]

This I would then scramble per normal into, say,

[(0, (1,0)),  (1, (0,1)),  (2, ((1,1)),  (3, (0,0))]

Once reorganized, the original matrix would become:

2  3
4  1

My basic approach here is that I want to have a type class that looks something like this:

class Shufflable a where
  indices    :: a -> Array Int b
  reorganize :: a -> Array Int b -> a

Then I'll have a function to perform the shuffle which looks like this:

fisherYates :: (RandomGen g) => g -> Array Int b -> (Array Int b, g)

The thinking is that (minus the RandomGen plumbing) I should be able to shuffle a shuffleable thing like so:

shuffle :: (Shufflable a, RandomGen g) => a -> g -> (a, g)
shuffle array = reorganize array (fisherYates (indices array))

Here's what I have so far:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances  #-}

module Shuffle where

import Data.Array hiding (indices)
import System.Random         

fisherYates :: (RandomGen g) => Array Int e -> g -> (Array Int e, g)
fisherYates arr gen = go max gen arr
    where
      (_, max) = bounds arr

      go 0 g arr = (arr, g)
      go i g arr = go (i-1) g' (swap arr i j)
          where
            (j, g') = randomR (0, i) g

class Shuffle a b | a -> b where
  indices    :: a -> Array Int b
  reorganize :: a -> Array Int b -> a

shuffle :: (Shuffle a b, RandomGen g) => a -> g -> (a, g)
shuffle a gen = (reorganize a indexes, gen')
  where
    (indexes, gen') = fisherYates (indices a) gen

instance (Ix ix) => Shuffle (Array ix e) ix where
  reorganize a = undefined
  indices a    = array (0, maxIdx) (zip [0..maxIdx] (range bound))
      where
        bound = bounds a
        maxIdx = rangeSize bound - 1

swap :: Ix i => Array i e -> i -> i -> Array i e
swap arr i j = arr // [ (i, i'), (j, j') ]
    where
      i' = arr!j
      j' = arr!i

My problems:

  1. I feel like this is a lot of language extensions for solving a simple problem. Would it be easier to understand or write another way?
  2. I feel like the community is moving towards type families over functional dependencies. Is there a way to use that instead to solve this problem?
  3. A part of me wonders if the fisherYates function can be moved into the Shuffle typeclass somehow. Would it be possible and/or worth doing to set this up so that you either implement shuffle or implement both indices and reorganize?

Thanks!

回答1:

You might want to look into repa, which offers n-dimensional arrays which encode their shape (dimensions) into the type; you can code generic operations that work on arrays of any shape with it.

I think you could avoid a typeclass entirely by constructing the array with backpermute or fromFunction and translating the indices (it's more efficient than it looks, since it gets turned into an unboxed array when you force it; in fact, backpermute is implemented in terms of fromFunction under the hood).

repa itself uses quite a few language extensions, but you might find it preferable to the standard library's arrays for reasons of both performance (repa's arrays are unboxed, and the standard operations offered do nice things like automatic parallelisation) and convenience (IMO repa has a nicer API than the standard arrays).

Here's a good introduction to repa.

Admittedly, none of this directly simplifies your code. But if repa's arrays are a good fit for you, then the code you end up with will probably avoid many of the complexities of your current solution.


That said, turning your use of functional dependencies into a type family is really simple; the Shuffle class becomes

class Shuffle a where
  type Elt a
  indices    :: a -> Array Int (Elt a)
  reorganize :: a -> Array Int (Elt a) -> a

the instance becomes

instance (Ix ix) => Shuffle (Array ix e) where
  type Elt (Array ix e) = ix
  ...

and the Shuffle a b constraint becomes Shuffle a.