Kotlin: Convert large List to sublist of set parti

2019-02-03 02:44发布

问题:

I'm looking for a function equivalent to Groovy's collate which would partition a large List into batches for processing. I did see subList which could be adapted into a similar function but wanted to check and make sure I wasn't missing an in-built or crazy simple alternative to rolling my own.

回答1:

NOTE: For Kotlin 1.2 and newer, please see the chunked and windowed functions that are now in the standard library. There is no need for a custom solution.


Here is an implementation of a lazy batching extension function which will take a collection, or anything that can become a Sequence and return a Sequence of List each of that size, with the last one being that size or smaller.

Example usage to iterate a list as batches:

myList.asSequence().batch(5).forEach { group ->
   // receive a Sequence of size 5 (or less for final)
}

Example to convert batches of List to Set:

myList.asSequence().batch(5).map { it.toSet() }

See the first test case below for showing the output given specific input.

Code for the function Sequence<T>.batch(groupSize):

public fun <T> Sequence<T>.batch(n: Int): Sequence<List<T>> {
    return BatchingSequence(this, n)
}

private class BatchingSequence<T>(val source: Sequence<T>, val batchSize: Int) : Sequence<List<T>> {
    override fun iterator(): Iterator<List<T>> = object : AbstractIterator<List<T>>() {
        val iterate = if (batchSize > 0) source.iterator() else emptyList<T>().iterator()
        override fun computeNext() {
            if (iterate.hasNext()) setNext(iterate.asSequence().take(batchSize).toList())
            else done() 
        }
    }
}

Unit tests proving it works:

class TestGroupingStream {

    @Test fun testConvertToListOfGroupsWithoutConsumingGroup() {
        val listOfGroups = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).asSequence().batch(2).toList()
        assertEquals(5, listOfGroups.size)
        assertEquals(listOf(1,2), listOfGroups[0].toList())
        assertEquals(listOf(3,4), listOfGroups[1].toList())
        assertEquals(listOf(5,6), listOfGroups[2].toList())
        assertEquals(listOf(7,8), listOfGroups[3].toList())
        assertEquals(listOf(9,10), listOfGroups[4].toList())
    }

    @Test fun testSpecificCase() {
        val originalStream = listOf(1,2,3,4,5,6,7,8,9,10)

        val results = originalStream.asSequence().batch(3).map { group ->
            group.toList()
        }.toList()

        assertEquals(listOf(1,2,3), results[0])
        assertEquals(listOf(4,5,6), results[1])
        assertEquals(listOf(7,8,9), results[2])
        assertEquals(listOf(10), results[3])
    }


    fun testStream(testList: List<Int>, batchSize: Int, expectedGroups: Int) {
        var groupSeenCount = 0
        var itemsSeen = ArrayList<Int>()

        testList.asSequence().batch(batchSize).forEach { groupStream ->
            groupSeenCount++
            groupStream.forEach { item ->
                itemsSeen.add(item)
            }
        }

        assertEquals(testList, itemsSeen)
        assertEquals(groupSeenCount, expectedGroups)
    }

    @Test fun groupsOfExactSize() {
        testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 5, 3)
    }

    @Test fun groupsOfOddSize() {
        testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), 5, 4)
        testStream(listOf(1,2,3,4), 3, 2)
    }

    @Test fun groupsOfLessThanBatchSize() {
        testStream(listOf(1,2,3), 5, 1)
        testStream(listOf(1), 5, 1)
    }

    @Test fun groupsOfSize1() {
        testStream(listOf(1,2,3), 1, 3)
    }

    @Test fun groupsOfSize0() {
        val testList = listOf(1,2,3)

        val groupCountZero =   testList.asSequence().batch(0).toList().size
        assertEquals(0, groupCountZero)

        val groupCountNeg =  testList.asSequence().batch(-1).toList().size
        assertEquals(0, groupCountNeg)

    }

    @Test fun emptySource() {
        listOf<Int>().asSequence().batch(1).forEach { groupStream ->
            fail()
        }

    }
}


回答2:

With Kotlin 1.2-M1, according to your needs, you may choose one of the following ways to solve your problem.


#1. Using chunked(size: Int)

fun main(args: Array<String>) {
    val list = listOf(2, 4, 3, 10, 8, 7)
    val newList = list.chunked(2)
    //val newList = list.chunked(size = 2) // also works
    print(newList)
}

/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/

#2. Using windowed(size: Int, step: Int)

fun main(args: Array<String>) {
    val list = listOf(2, 4, 3, 10, 8, 7, 9)
    val newList = list.windowed(2, 2)
    //val newList = list.windowed(size = 2, step = 2) // also works
    println(newList)
}

/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/


回答3:

A more simplistic/functional-style solution would be

val items = (1..100).map { "foo_${it}" }

fun <T> Iterable<T>.batch(chunkSize: Int) =
   withIndex().                        // create index value pairs
   groupBy { it.index / chunkSize }.   // create grouping index
   map { it.value.map { it.value } }   // split into different partitions


items.batch(3)

Note 1: Personally I'd prefer partition as a method name here, but it's already present in Kotlin's stdlib to separate a lists into 2 parts given a predicate.

Note 2: The the iterator solution from Jayson may scale better than this solution for large collections.



回答4:

In Kotlin 1.2 M2 and later you can use chunked and windowed (see Kotlin 1.2 M2 is out | Kotlin Blog). Note that there are Sequence variances too (see kotlin.sequences - Kotlin Programming Language).

For versions of Kotlin prior to 1.2 M2 I recommend using Lists.partition(List, int) from google-guava (it uses java.util.List.subList(int, int)):

If you are unfamiliar with Guava see CollectionUtilitiesExplained · google/guava Wiki for more details.

You can create your own Kotlin extension function for it if you want:

fun <T> List<T>.collate(size: Int): List<List<T>> = Lists.partition(this, size)

If you want an extension function for mutable lists then in a separate Kotlin file (to avoid platform declaration clashes):

fun <T> MutableList<T>.collate(size: Int): List<MutableList<T>> = Lists.partition(this, size)

If you want something lazy loaded like in Jayson Minard's answer you can use Iterables.partition(Iterable, int). You might also be interested in Iterables.paddedPartition(Iterable, int) if you want to pad the last sublist if it is smaller than the specified size. These return Iterable<List<T>> (I don't see much point in making it Iterable<Iterable<T>> as subList returns an efficient view).

If for some reason you don't want to depend on Guava you can roll your own pretty easily using the subList function you mentioned:

fun <T> List<T>.collate(size: Int): List<List<T>> {
    require(size > 0)
    return if (isEmpty()) {
        emptyList()
    } else {
        (0..lastIndex / size).map {
            val fromIndex = it * size
            val toIndex = Math.min(fromIndex + size, this.size)
            subList(fromIndex, toIndex)
        }
    }
}

or

fun <T> List<T>.collate(size: Int): Sequence<List<T>> {
    require(size > 0)
    return if (isEmpty()) {
        emptySequence()
    } else {
        (0..lastIndex / size).asSequence().map {
            val fromIndex = it * size
            val toIndex = Math.min(fromIndex + size, this.size)
            subList(fromIndex, toIndex)
        }
    }
}


回答5:

There is unfortunately no built-in function for that yet and while functional and Sequence-based implementations from other answers look nice, if you just need is List of Lists, I'd suggest writing a little bit of ugly, imperative, but performant code.

This is my final result:

fun <T> List<T>.batch(chunkSize: Int): List<List<T>> {
    if (chunkSize <= 0) {
        throw IllegalArgumentException("chunkSize must be greater than 0")
    }
    val capacity = (this.size + chunkSize - 1) / chunkSize
    val list = ArrayList<ArrayList<T>>(capacity)
    for (i in 0 until this.size) {
        if (i % chunkSize == 0) {
            list.add(ArrayList(chunkSize))
        }
        list.last().add(this.get(i))
    }
    return list
}


标签: kotlin