Split a string into 64 KB chunks using Swift

Posted 2019-09-10 11:15

Question:

I would like to split an extremely large string (up to 8 MB) into 64 KB chunks. At the moment I am using the following code:

//1
self.regData = "string up to 8mb"   // regData is a String? property of the enclosing class
var arr = [String]()
var count = countElements(self.regData!) / 65536

//2
for var index = 0; index < count; ++index {
    // Copy the first 65536 characters into the array...
    arr.append(self.regData!.substringWithRange(Range<String.Index>(start: self.regData!.startIndex, end: advance(self.regData!.startIndex, 65536))))
    // ...then remove them, so the next chunk again starts at startIndex.
    self.regData!.removeRange(Range<String.Index>(start: self.regData!.startIndex, end: advance(self.regData!.startIndex, 65536)))
    println(index)
}
//3
println("exit loop")
arr.append(self.regData!)

  1. I calculate how many complete 64 KB chunks there are.
  2. In the for loop I take the first 64 KB and append it to an array. Then I remove those first 64 KB from the string, so that only the remainder is left for step 3.
  3. If less than 64 KB is left, taking a full chunk fails inside the loop, so the last (partial) chunk is appended outside the loop.

The code works, but it is extremely slow. Do you have any idea how to speed it up?

Thanks to all.

Answer 1:

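It might be more efficient not to modify the original string at all, and instead use two indices (from and to) to traverse it. Removing the first chunk on every iteration likely copies the rest of the string each time, so the total work grows roughly quadratically with the string length: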

let regData = "string up to 8mb"
let chunkSize = 65536

var array = [String]()
var from = regData.startIndex // start of current chunk
let end = regData.endIndex    // end of string
while from != end {
    // advance "from" by "chunkSize", but not beyond "end":
    let to = from.advancedBy(chunkSize, limit: end)
    array.append(regData.substringWithRange(from ..< to))
    from = to
}
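advancedBy(_:limit:) and substringWithRange were renamed or removed in Swift 3 and later. A minimal sketch of the same two-index loop in current Swift (assuming Swift 4 or later) could use index(_:offsetBy:limitedBy:), which returns nil when the offset would pass the limit. The helper name chunks(of:chunkSize:) is hypothetical:

```swift
/// Hypothetical helper: splits `string` into consecutive pieces of at most
/// `chunkSize` characters (grapheme clusters) without mutating the string.
func chunks(of string: String, chunkSize: Int) -> [String] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var result = [String]()
    var from = string.startIndex   // start of current chunk
    let end = string.endIndex      // end of string
    while from != end {
        // Advance `from` by `chunkSize`, but not beyond `end`
        // (nil means the offset would go past `end`, so clamp to `end`).
        let to = string.index(from, offsetBy: chunkSize, limitedBy: end) ?? end
        result.append(String(string[from..<to]))
        from = to
    }
    return result
}

let regData = String(repeating: "a", count: 200_000)
let array = chunks(of: regData, chunkSize: 65536)
print(array.map { $0.count })  // [65536, 65536, 65536, 3392]
```

Each chunk is copied into its own String here; if the chunks are only needed briefly, keeping the Substring values instead avoids those copies, since a Substring shares storage with the original string.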

Note that this gives substrings of 65536 characters. Since a Swift Character represents a "Unicode grapheme cluster", which can occupy several bytes, this will in general not correspond to 64 KB of data. If you need chunks of a fixed byte size, convert the string to NSData and split that into chunks instead.
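If the limit really is 64 KB of data, a minimal sketch of that byte-based approach in current Swift could encode the string as UTF-8 (an assumption; the receiver might expect a different encoding) and slice the resulting Data. The helper name byteChunks(of:chunkSize:) is hypothetical:

```swift
import Foundation

/// Hypothetical helper: splits the UTF-8 encoding of `string` into Data chunks
/// of at most `chunkSize` bytes. A boundary may fall inside a multi-byte
/// character, so a single chunk is not necessarily valid UTF-8 on its own.
func byteChunks(of string: String, chunkSize: Int = 65536) -> [Data] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    let data = Data(string.utf8)   // assumes UTF-8 is the wanted encoding
    var result = [Data]()
    var offset = 0
    while offset < data.count {
        let upper = min(offset + chunkSize, data.count)
        // Copy the slice into a fresh Data so each chunk's indices start at 0.
        result.append(data.subdata(in: offset..<upper))
        offset = upper
    }
    return result
}

let payload = String(repeating: "\u{E9}", count: 40_000)  // "é" is 2 bytes in UTF-8
let parts = byteChunks(of: payload)
print(parts.map { $0.count })  // [65536, 14464]
```

Because chunks can split a character, the receiving side should concatenate all chunks before decoding them back into a String.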

(Updated for Swift 2.)