I have a function with type below:
union :: a -> a -> a
And a
has additivity property. So we can regard union
as a version of (+)
Say, we have [a]
, and want to perform a parallel "folding"
, for non-parallel foldling we can do only:
foldl1' union [a]
But how to perform it in parallel?
I can demonstrate problem on Num
values and (+)
function.
For example, we have a list [1,2,3,4,5,6]
and (+)
In parallel we should split
[1,2,3] (+) [4,5,6]
[1,2] (+) [3] (+) [4,5] (+) [6]
([1] (+) [2]) (+) ([3] (+) [4]) (+) ([5] (+) [6])
then each (+)
operation we want to perform in parallel, and combine to answer
[3] (+) [7] (+) [11] = 21
Note, that we split list, or perform operations in any order, because of a
additivity.
Is there any ways to do that using any standard library?
You need to generalize your
union
to any associative binary operator ⊕ such that (a ⊕ b) ⊕ c == a ⊕ (b ⊕ c). If at the same time you even have a unit element that is neutral with respect to ⊕, you have a monoid.The important aspect of associativity is that you can arbitrarily group chunks of consecutive elements in a list, and ⊕ them in any order, since a ⊕ (b ⊕ (c ⊕ d)) == (a ⊕ b) ⊕ (c ⊕ d) - each bracket can be computed in parallel; then you'd need to "reduce" the "sums" of all brackets, and you've got your map-reduce sorted.
In order for this parallellisation to make sense, you need the chunking operation to be faster than ⊕ - otherwise, doing ⊕ sequentially is better than chunking. One such case is when you have a random access "list" - say, an array. Data.Array.Repa has plenty of parallellized folding functions.
If you are thinking of practicising to implement one yourself, you need to pick a good complex function ⊕ such that the benefit will show.
For example:
Here I deliberately use a associative operation, which is called
mappend
only locally, to show it can work for a weaker notion than a monoid - only associativity matters for parallelism; since parallelism makes sense only for non-empty lists anyway, no need formempty
.Gives 8.78 seconds total run time, whereas
Gives 5.89 seconds total run time on my dual core laptop. Obviously, no point trying more than -N2 on this machine.
What you've described is essentially a monoid. In GHCI:
As you can see a monoid has three associated functions:
The
mempty
function is sort of like the identity function of the monoid. For example aNum
can behave as a monoid apropos two operations: sum and product. For a summempty
is defined as0
. For a productmempty
is defined as1
.The
mappend
function is similar to yourunion
function. For exampe for a sum ofNum
smappend
is defined as(+)
and for a product ofNum
smappend
is defined as(*)
.The
mconcat
function is similar to a fold. However because of the properties of a monoid it doesn't matter whether we fold from the left, fold from the right or fold from an arbitrary position. This is becausemappend
is supposed to be associative:Note however that Haskell doesn't enforce the monoid laws. Hence if you make a type an instance of the
Monoid
typeclass then you're responsible to ensure that it satisfies the monoid laws.In your case it's difficult to understand how
union
behaves from its type signature:a -> a -> a
. Surely you can't make a type variable an instance of a typeclass. That's not allowed. You need to be more specific. What doesunion
actually do?To give you an example of how to make a type an instance of the monoid typeclass:
That's it. We don't need to define the
mconcat
function because that has a default implementation that depends uponmempty
andmappend
. Hence when we definemempty
andmappend
we getmconcat
for free.Now you can use
Sum
as follows:This is what's happening:
Sum
constructor over[1..6]
to produce[Sum 1, Sum 2, Sum 3, Sum 4, Sum 5, Sum 6]
.mconcat
which folds it toSum 21
.getSum
to extract theNum
fromSum 21
.Note however that the default implementation of
mconcat
isfoldr mappend mempty
(i.e. it's a right fold). For most cases the default implementation is sufficient. However in your case you might want to override the default implementation:Now we can create a new instance of
Monoid
as follows:We use it as follows:
Note that I defined
mempty
asunionEmpty
. I don't know what type of data theunion
function acts on. Hence I don't know whatmempty
should be defined as. Thus I'm simply calling itunionEmpty
. Define it as you see fit.