-->

How to create lazy combinations

2019-01-29 02:00发布

问题:

My question is very simple, how do I make this code lazy:

/*
input: [
    [1, 2],
    [3, 4],
    [5, 6]
]

output: [
    [1, 3, 5],
    [1, 3, 6],
    [1, 4, 5],
    [1, 4, 6],
    [2, 3, 5],
    [2, 3, 6],
    [2, 4, 5],
    [2, 4, 6],
]
*/

func combinations<T>(options: [[T]]) -> [[T]] {
    guard let head = options.first else {
        return [].map({ [$0] })
    }

    if options.count == 1 {
        return head.map({ [$0] })
    }

    let tailCombinations = combinations(options: Array(options.dropFirst()))

    return head.flatMap({ option in
        return tailCombinations.map({ combination -> [T] in
            return [option] + combination
        })
    })
}

The code above works to calculate the combinations, but it does so creating the entire array of arrays in memory. What I need is to have it return something like LazySequence<Array<T>>, except the Swift type system doesn't let me do something that generic.

Any ideas how to achieve this and keep the functional style?

Ps.: I did think of another way to solve this problem with generators and keeping track of indexes, but I don't wanna keep track of any state, I want a pure functional (as in FP) solution. Haskell does it by default, btw, and I'm looking for the same thing.

EDIT: I've managed to solve part of the problem, the type system, with AnyCollection

func combinations<T>(options: [[T]]) -> LazyCollection<AnyCollection<[T]>> {
    guard let head = options.first else {
        return AnyCollection([].lazy.map({ [$0] })).lazy
    }

    if options.count == 1 {
        return AnyCollection(head.lazy.map({ [$0] })).lazy
    }

    let tailCombinations = combinations(options: Array(options.dropFirst()))

    return AnyCollection(head.lazy.flatMap({ option in
        return tailCombinations.lazy.map({ [option] + $0 })
    })).lazy
}

But when I use the function, it loads the entire collection in memory, i.e., not lazy.

EDIT 2: Doing some more investigation, turns out the problem is with AnyCollection

// stays lazy
let x1 = head.lazy.flatMap({ option in
    return tailCombinations.lazy.map({ [option] + $0 })
})

// forces to load in memory
let x2 = AnyCollection(head.lazy.flatMap({ option in
    return tailCombinations.lazy.map({ [option] + $0 })
}))

Not sure how to solve this yet.

回答1:

Here is what I came up with:

func combinations<T>(options: [[T]]) -> AnySequence<[T]> {
    guard let lastOption = options.last else {
        return AnySequence(CollectionOfOne([]))
    }
    let headCombinations = combinations(options: Array(options.dropLast()))
    return AnySequence(headCombinations.lazy.flatMap { head in
        lastOption.lazy.map { head + [$0] }
    })
}

The main difference to this solution is that the recursive call creates a sequence of the first N-1 options, and then combines each element of that sequence with each element of the last option. This is more efficient because the sequence returned from the recursive call is enumerated only once, and not once for each element that it is combined with.

Other differences are:

  • There is no need to call .lazy on the AnySequence if that sequence is already lazy. The return type is therefore "simplified" to AnySequence<[T]>.
  • I have used CollectionOfOne to create a single-element sequence for the empty array.
  • Treating the case options.count == 1 separately is not necessary for the algorithm to work (but might be a possible performance improvement).

A completely different approach is to define a custom collection type which computes each combination as a function of the index, using simple modulo arithmetic:

struct Combinations<T> : RandomAccessCollection {
    let options: [[T]]
    let startIndex = 0
    let endIndex: Int

    init(options: [[T]]) {
        self.options = options.reversed()
        self.endIndex = options.reduce(1) { $0 * $1.count }
    }

    subscript(index: Int) -> [T] {
        var i = index
        var combination: [T] = []
        combination.reserveCapacity(options.count)
        options.forEach { option in
            combination.append(option[i % option.count])
            i /= option.count
        }
        return combination.reversed()
    }
}

No extra storage is needed and no recursion. Example usage:

let all = Combinations(options: [[1, 2], [3, 4], [5, 6]])
print(all.count)
for c in all { print(c) }

Output:

8
[1, 3, 5]
[1, 3, 6]
[1, 4, 5]
[1, 4, 6]
[2, 3, 5]
[2, 3, 6]
[2, 4, 5]
[2, 4, 6]

Testing with

let options = Array(repeating: [1, 2, 3, 4, 5], count: 5)

this collection-based method turned out to be faster then the my above sequence-based method by a factor of 2.



回答2:

I found one possible solution, but I'll leave this answer not accepted for a while to see if someone knows a better one.

func combinations<T>(options: [[T]]) -> LazySequence<AnySequence<[T]>> {
    guard let head = options.first else {
        return AnySequence([].lazy.map({ [$0] })).lazy
    }

    if options.count == 1 {
        return AnySequence(head.lazy.map({ [$0] })).lazy
    }

    let tailCombinations = combinations(options: Array(options.dropFirst()))

    return AnySequence(head.lazy.flatMap({ option in
        return tailCombinations.lazy.map({ [option] + $0 })
    })).lazy
}

The solution was to use AnySequence instead of AnyCollection. I'm not sure why though, I'd still like to have the AnyCollection interface rather than AnySequence, since it provides me with a few more methods, like count.