I've recently been learning about Core Data and specifically how to do inserts with a large number of objects. After learning how to do this and solving a memory leak problem that I met, I wrote the Q&A Memory leak with large Core Data batch insert in Swift.
After changing NSManagedObjectContext
from a class property to a local variable and also saving inserts in batches rather than one at a time, it worked a lot better. The memory problem cleared up and the speed improved.
The code I posted in my answer was
let batchSize = 1000
// do some sort of loop for each batch of data to insert
while (thereAreStillMoreObjectsToAdd) {
// get the Managed Object Context
let managedObjectContext = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
managedObjectContext.undoManager = nil // if you don't need to undo anything
// get the next 1000 or so data items that you want to insert
let array = nextBatch(batchSize) // your own implementation
// insert everything in this batch
for item in array {
// parse the array item or do whatever you need to get the entity attributes for the new object you are going to insert
// ...
// insert the new object
let newObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity", inManagedObjectContext: managedObjectContext) as! MyManagedObject
newObject.attribute1 = item.whatever
newObject.attribute2 = item.whoever
newObject.attribute3 = item.whenever
}
// save the context
do {
try managedObjectContext.save()
} catch {
print(error)
}
}
This method seems to be working well for me. The reason I am asking a question here, though, is two people (who know a lot more about iOS than I do) made comments that I don't understand.
@Mundi said:
It seems in your code you are using the same managed object context,
not a new one.
@MartinR also said:
... the "usual" implementation is a lazy property which creates the
context once for the lifetime of the app. In that case you are reusing
the same context as Mundi said.
Now I don't understand. Are they saying I am using the same managed object context or I should use the same managed object context? If I am using the same one, how is it that I create a new one on each while
loop? Or if I should be using just one global context, how do I do it without causing memory leaks?
Previously, I had declared the context in my View Controller, initialized it in viewDidLoad
, passed it as a parameter to my utility class doing the inserts, and just used it for everything. After discovering the big memory leak is when I started just creating the context locally.
One of the other reasons I started creating the contexts locally is because the documentation said:
First, you should typically create a separate managed object context
for the import, and set its undo manager to nil. (Contexts are not
particularly expensive to create, so if you cache your persistent
store coordinator you can use different contexts for different working
sets or distinct operations.)
What is the standard way to use NSManagedObjectContext
?
Now I don't understand. Are they saying I am using the same managed
object context or I should use the same managed object context? If I
am using the same one, how is it that I create a new one on each while
loop? Or if I should be using just one global context, how do I do it
without causing memory leaks?
Let's look at the first part of your code...
while (thereAreStillMoreObjectsToAdd) {
let managedObjectContext = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
managedObjectContext.undoManager = nil
Now, since it appears you are keeping your MOC in the App Delegate, it's likely that you are using the template-generated Core Data access code. Even if you are not, it is highly unlikely that your managedObjectContext
access method is returning a new MOC each time it is called.
Your managedObjectContext
variable is merely a reference to the MOC that is living in the App Delegate. Thus, each time through the loop, you are merely making a copy of the reference. The object being referenced is the exact same object each time through the loop.
Thus, I think they are saying that you are not using separate contexts, and I think they are right. Instead, you are using a new reference to the same context each time through the loop.
Now, your next set of questions have to do with performance. Your other post references some good content. Go back and look at it again.
What they are saying is that if you want to do a big import, you should create a separate context, specifically for the import (Objective C since I have not yet made time to learn Swift).
NSManagedObjectContext moc = [[NSManagedObjectContext alloc]
initWithConcurrencyType:NSPrivateQueueConcurrencyType];
You would then attach that MOC to the Persistent Store Coordinator. Using performBlock
you would then, in a separate thread, import your objects.
The batching concept is correct. You should keep that. However, you should wrap each batch in an auto release pool. I know you can do it in swift... I'm just not sure if this is the exact syntax, but I think it's close...
autoreleasepool {
for item in array {
let newObject = NSEntityDescription.insertNewObjectForEntityForName ...
newObject.attribute1 = item.whatever
newObject.attribute2 = item.whoever
newObject.attribute3 = item.whenever
}
}
In pseudo-code, it would all look something like this...
moc = createNewMOCWithPrivateQueueConcurrencyAndAttachDirectlyToPSC()
moc.performBlock {
while(true) {
autoreleasepool {
objects = getNextBatchOfObjects()
if (!objects) { break }
foreach (obj : objects) {
insertObjectIntoMoc(obj, moc)
}
}
moc.save()
moc.reset()
}
}
If someone wants to turn that pseudo-code into swift, it's fine by me.
The autorelease pool ensures that any objects autoreleased as a result of creating your new objects are released at the end of each batch. Once the objects are released, the MOC should have the only reference to objects in the MOC, and once the save happens, the MOC should be empty.
The trick is to make sure that all object created as part of the batch (including those representing the imported data and the managed objects themselves) are all created inside the autorelease pool.
If you do other stuff, like fetching to check for duplicates, or have complex relationships, then it is possible that the MOC may not be entirely empty.
Thus, you may want to add the swift equivalent of [moc reset]
after the save to ensure that the MOC is indeed empty.
This is a supplemental answer to @JodyHagins' answer. I am providing a Swift implementation of the pseudocode that was provided there.
let managedObjectContext = NSManagedObjectContext(concurrencyType: NSManagedObjectContextConcurrencyType.PrivateQueueConcurrencyType)
managedObjectContext.persistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator // or wherever your coordinator is
managedObjectContext.performBlock { // runs asynchronously
while(true) { // loop through each batch of inserts
autoreleasepool {
let array: Array<MyManagedObject>? = getNextBatchOfObjects()
if array == nil { break }
for item in array! {
let newEntityObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity", inManagedObjectContext: managedObjectContext) as! MyManagedObject
newObject.attribute1 = item.whatever
newObject.attribute2 = item.whoever
newObject.attribute3 = item.whenever
}
}
// only save once per batch insert
do {
try managedObjectContext.save()
} catch {
print(error)
}
managedObjectContext.reset()
}
}
These are some more resources that helped me to further understand how the Core Data stack works:
- Core Data Stack in Swift – Demystified
- My Core Data Stack