I'm looking for a function equivalent to Groovy's collate which would partition a large List into batches for processing. I did see subList
which could be adapted into a similar function but wanted to check and make sure I wasn't missing an in-built or crazy simple alternative to rolling my own.
问题:
回答1:
NOTE: For Kotlin 1.2 and newer, please see the chunked
and windowed
functions that are now in the standard library. There is no need for a custom solution.
Here is an implementation of a lazy batching extension function which will take a collection, or anything that can become a Sequence
and return a Sequence
of List
each of that size, with the last one being that size or smaller.
Example usage to iterate a list as batches:
myList.asSequence().batch(5).forEach { group ->
// receive a Sequence of size 5 (or less for final)
}
Example to convert batches of List
to Set
:
myList.asSequence().batch(5).map { it.toSet() }
See the first test case below for showing the output given specific input.
Code for the function Sequence<T>.batch(groupSize)
:
public fun <T> Sequence<T>.batch(n: Int): Sequence<List<T>> {
return BatchingSequence(this, n)
}
private class BatchingSequence<T>(val source: Sequence<T>, val batchSize: Int) : Sequence<List<T>> {
override fun iterator(): Iterator<List<T>> = object : AbstractIterator<List<T>>() {
val iterate = if (batchSize > 0) source.iterator() else emptyList<T>().iterator()
override fun computeNext() {
if (iterate.hasNext()) setNext(iterate.asSequence().take(batchSize).toList())
else done()
}
}
}
Unit tests proving it works:
class TestGroupingStream {
@Test fun testConvertToListOfGroupsWithoutConsumingGroup() {
val listOfGroups = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).asSequence().batch(2).toList()
assertEquals(5, listOfGroups.size)
assertEquals(listOf(1,2), listOfGroups[0].toList())
assertEquals(listOf(3,4), listOfGroups[1].toList())
assertEquals(listOf(5,6), listOfGroups[2].toList())
assertEquals(listOf(7,8), listOfGroups[3].toList())
assertEquals(listOf(9,10), listOfGroups[4].toList())
}
@Test fun testSpecificCase() {
val originalStream = listOf(1,2,3,4,5,6,7,8,9,10)
val results = originalStream.asSequence().batch(3).map { group ->
group.toList()
}.toList()
assertEquals(listOf(1,2,3), results[0])
assertEquals(listOf(4,5,6), results[1])
assertEquals(listOf(7,8,9), results[2])
assertEquals(listOf(10), results[3])
}
fun testStream(testList: List<Int>, batchSize: Int, expectedGroups: Int) {
var groupSeenCount = 0
var itemsSeen = ArrayList<Int>()
testList.asSequence().batch(batchSize).forEach { groupStream ->
groupSeenCount++
groupStream.forEach { item ->
itemsSeen.add(item)
}
}
assertEquals(testList, itemsSeen)
assertEquals(groupSeenCount, expectedGroups)
}
@Test fun groupsOfExactSize() {
testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 5, 3)
}
@Test fun groupsOfOddSize() {
testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), 5, 4)
testStream(listOf(1,2,3,4), 3, 2)
}
@Test fun groupsOfLessThanBatchSize() {
testStream(listOf(1,2,3), 5, 1)
testStream(listOf(1), 5, 1)
}
@Test fun groupsOfSize1() {
testStream(listOf(1,2,3), 1, 3)
}
@Test fun groupsOfSize0() {
val testList = listOf(1,2,3)
val groupCountZero = testList.asSequence().batch(0).toList().size
assertEquals(0, groupCountZero)
val groupCountNeg = testList.asSequence().batch(-1).toList().size
assertEquals(0, groupCountNeg)
}
@Test fun emptySource() {
listOf<Int>().asSequence().batch(1).forEach { groupStream ->
fail()
}
}
}
回答2:
With Kotlin 1.2-M1, according to your needs, you may choose one of the following ways to solve your problem.
#1. Using chunked(size: Int)
fun main(args: Array<String>) {
val list = listOf(2, 4, 3, 10, 8, 7)
val newList = list.chunked(2)
//val newList = list.chunked(size = 2) // also works
print(newList)
}
/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/
#2. Using windowed(size: Int, step: Int)
fun main(args: Array<String>) {
val list = listOf(2, 4, 3, 10, 8, 7, 9)
val newList = list.windowed(2, 2)
//val newList = list.windowed(size = 2, step = 2) // also works
println(newList)
}
/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/
回答3:
A more simplistic/functional-style solution would be
val items = (1..100).map { "foo_${it}" }
fun <T> Iterable<T>.batch(chunkSize: Int) =
withIndex(). // create index value pairs
groupBy { it.index / chunkSize }. // create grouping index
map { it.value.map { it.value } } // split into different partitions
items.batch(3)
Note 1: Personally I'd prefer partition
as a method name here, but it's already present in Kotlin's stdlib to separate a lists into 2 parts given a predicate.
Note 2: The the iterator solution from Jayson may scale better than this solution for large collections.
回答4:
In Kotlin 1.2 M2 and later you can use chunked
and windowed
(see Kotlin 1.2 M2 is out | Kotlin Blog). Note that there are Sequence
variances too (see kotlin.sequences - Kotlin Programming Language).
For versions of Kotlin prior to 1.2 M2 I recommend using Lists.partition(List, int) from google-guava
(it uses java.util.List.subList(int, int)
):
If you are unfamiliar with Guava see CollectionUtilitiesExplained · google/guava Wiki for more details.
You can create your own Kotlin extension function for it if you want:
fun <T> List<T>.collate(size: Int): List<List<T>> = Lists.partition(this, size)
If you want an extension function for mutable lists then in a separate Kotlin file (to avoid platform declaration clashes):
fun <T> MutableList<T>.collate(size: Int): List<MutableList<T>> = Lists.partition(this, size)
If you want something lazy loaded like in Jayson Minard's answer you can use Iterables.partition(Iterable, int). You might also be interested in Iterables.paddedPartition(Iterable, int) if you want to pad the last sublist if it is smaller than the specified size
. These return Iterable<List<T>>
(I don't see much point in making it Iterable<Iterable<T>>
as subList
returns an efficient view).
If for some reason you don't want to depend on Guava you can roll your own pretty easily using the subList
function you mentioned:
fun <T> List<T>.collate(size: Int): List<List<T>> {
require(size > 0)
return if (isEmpty()) {
emptyList()
} else {
(0..lastIndex / size).map {
val fromIndex = it * size
val toIndex = Math.min(fromIndex + size, this.size)
subList(fromIndex, toIndex)
}
}
}
or
fun <T> List<T>.collate(size: Int): Sequence<List<T>> {
require(size > 0)
return if (isEmpty()) {
emptySequence()
} else {
(0..lastIndex / size).asSequence().map {
val fromIndex = it * size
val toIndex = Math.min(fromIndex + size, this.size)
subList(fromIndex, toIndex)
}
}
}
回答5:
There is unfortunately no built-in function for that yet and while functional and Sequence
-based implementations from other answers look nice, if you just need is List
of List
s, I'd suggest writing a little bit of ugly, imperative, but performant code.
This is my final result:
fun <T> List<T>.batch(chunkSize: Int): List<List<T>> {
if (chunkSize <= 0) {
throw IllegalArgumentException("chunkSize must be greater than 0")
}
val capacity = (this.size + chunkSize - 1) / chunkSize
val list = ArrayList<ArrayList<T>>(capacity)
for (i in 0 until this.size) {
if (i % chunkSize == 0) {
list.add(ArrayList(chunkSize))
}
list.last().add(this.get(i))
}
return list
}