Core Data: iterate over fetch request in chunks with

Posted 2019-09-18 14:04

Question:

I am trying to process a lot of objects in chunks of a certain size (batchSize). The loop below seems to work, but it processes only half of the records. The relevant piece of code is:

{
//Prepare fetching products without images in the database
NSFetchRequest * productFetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"Product"];

//Sort by last changed photo first
NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"photoModificationDate" ascending:NO];
[productFetchRequest setSortDescriptors:@[sortDescriptor]];

NSPredicate *predicate = [NSPredicate predicateWithFormat: predicateString];
[productFetchRequest setPredicate:predicate];

//First get the total count
NSUInteger numberOfProducts = [self.backgroundMOC countForFetchRequest: productFetchRequest error: &error];
NSLog(@"Getting images for: %lu products", (unsigned long)numberOfProducts);

//Then set the batchsize to get chunks of data
NSUInteger batchSize = 25;
[productFetchRequest setFetchBatchSize: batchSize];
[productFetchRequest setFetchLimit:batchSize];

//Fetch the products in batches
for (NSUInteger offset = 0; offset < numberOfProducts; offset += batchSize) {
    @autoreleasepool {
        [productFetchRequest setFetchOffset: offset];
        NSArray * products = [self.backgroundMOC executeFetchRequest:productFetchRequest error:&error];
        NSLog(@"Offset: %lu, number of products: %lu", (unsigned long)offset, (unsigned long)[products count]);
        if (!products) {
            return NO;
        }

        for (Product * product in products) {
            NSLog(@"Downloading photo for product: %@", product.number);
            [self downLoadAndStoreImageForProduct:product];
        }
        [self saveAndResetBackgroundMOC];
    }
}

return YES;

}

The log shows that for the first half of the count (numberOfProducts) it works as expected: chunks of 25 products are processed. After that first half, the fetch request in the loop returns 0 records. If I run the same code again, again only half of the remaining records are processed, so 3/4 in total. What am I doing wrong? Note that the managedObjectContext is not only saved but also reset after the save, to save memory. If I do not process the records in chunks, the program crashes consistently after downloading about 3000 pictures.

Answer 1:

First point: maybe there is some basic misunderstanding about what fetchLimit and fetchBatchSize do.

fetchLimit and fetchOffset determine which and how many records are fetched.

fetchBatchSize indicates how many records should be retrieved during one trip to the persistent store. Thus if (with or without fetchBatchSize) the number of records that would be retrieved is 100, a fetchBatchSize of 25 would result in 4 trips to the store. (In other words, 4 executed SQL statements for the typical SQLite store. However, this all happens behind the scenes.)

Thus, the code snippet

request.fetchLimit      = x; 
request.fetchBatchSize  = x;

is redundant. The number of trips to the store will always be one anyway.

Second point: I am not sure your setup with the second MOC makes a lot of sense. I suppose you are already on a background thread. As far as I know, resetting the MOC is quite expensive, and it is not really necessary if you disable the MOC's undo manager. As for the looping, I believe you can just fetch all records and let fetchBatchSize take care of the discrete "chunking". Because of Core Data's faulting behavior, the @autoreleasepool in your loop will probably bring only a limited advantage.

Where the @autoreleasepool is useful is when you download the images. Perhaps it is enough to batch this part of the process.
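The suggestions in this answer could look roughly like the sketch below. It is only an illustration under stated assumptions: ProcessAllProducts, its handler block, and the save interval of 25 are hypothetical and not part of the original post, and the predicate is passed in because the original predicate string is not shown. The request sets only fetchBatchSize (no fetchLimit or fetchOffset), the undo manager is disabled, and the autorelease pool wraps the per-product work.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper, not from the original post. Call it on the context's
// own queue (for example inside -performBlockAndWait:).
// Returns the number of processed products, or -1 if a fetch or save failed.
static NSInteger ProcessAllProducts(NSManagedObjectContext *context,
                                    NSPredicate *predicate,
                                    void (^handler)(NSManagedObject *product),
                                    NSError **error) {
    // No undo bookkeeping is needed for bulk background work.
    context.undoManager = nil;

    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Product"];
    request.predicate = predicate;
    request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"photoModificationDate"
                                                              ascending:NO]];
    // Only the batch size is set: no fetchLimit, no fetchOffset.
    request.fetchBatchSize = 25;

    NSArray *products = [context executeFetchRequest:request error:error];
    if (!products) {
        return -1;
    }

    NSInteger processed = 0;
    for (NSManagedObject *product in products) {
        @autoreleasepool {
            // The expensive part (e.g. downloading and storing the image).
            handler(product);
        }
        processed++;
        // Assumed save interval: once per 25 products.
        if (processed % 25 == 0 && ![context save:error]) {
            return -1;
        }
    }
    if ([context hasChanges] && ![context save:error]) {
        return -1;
    }
    return processed;
}
```

Because the fetch runs only once, records that stop matching the predicate while the loop is running do not shift any offsets. Whether memory stays low enough without resetting the context would still have to be measured for the real data set.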

That being said, you might not want to change something that is (sort of) working.

Third point: you calculate the number of records based on a predicate string that is unknown to us. Is it dynamic? This might also be part of the issue. After all, it is surprising that the number of records changes, and we cannot tell why without knowing the predicate.

Finally: check if you can do without resetting your MOC.



Answer 2:

The problem is in the predicate. It fetches all products without an image. If I download images, the result set for the predicate changes on a subsequent fetch and gets smaller every time. The solution is to process the result set in reverse order. So change:

for (NSUInteger offset = 0; offset < numberOfProducts; offset += batchSize)

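into:

for (NSInteger offset = ((NSInteger)numberOfProducts - 1) / (NSInteger)batchSize * (NSInteger)batchSize; offset >= 0; offset -= (NSInteger)batchSize)

The loop has to start at the beginning of the last batch and include offset 0, otherwise the first batch is skipped. The arithmetic is done on signed values because numberOfProducts - batchSize on NSUInteger wraps around when there are fewer products than one batch.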
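To see why forward paging handles only half of the records, the effect can be reproduced without Core Data. The sketch below is a simulation, not the poster's code: an NSMutableArray stands in for the records that still match the predicate, and all function names (MakePending, FetchPage, Consume, ProcessForward, ProcessBackward, ProcessFromFront) are made up for illustration. It compares the forward loop from the question, a reverse loop that starts at the last batch boundary and includes offset 0, and a third option that always fetches at offset 0 until nothing is left.

```objc
#import <Foundation/Foundation.h>

// Simulated result set: ids 0..total-1 are the products still without an image.
static NSMutableArray<NSNumber *> *MakePending(NSUInteger total) {
    NSMutableArray<NSNumber *> *pending = [NSMutableArray arrayWithCapacity:total];
    for (NSUInteger i = 0; i < total; i++) {
        [pending addObject:@(i)];
    }
    return pending;
}

// Stand-in for executing the fetch request with fetchOffset/fetchLimit applied
// to the records that currently match the predicate.
static NSArray<NSNumber *> *FetchPage(NSArray<NSNumber *> *pending, NSUInteger offset, NSUInteger limit) {
    if (offset >= pending.count) {
        return @[];
    }
    NSUInteger length = MIN(limit, pending.count - offset);
    return [pending subarrayWithRange:NSMakeRange(offset, length)];
}

// Removing a page models "image stored, product no longer matches the predicate".
static NSUInteger Consume(NSMutableArray<NSNumber *> *pending, NSArray<NSNumber *> *page) {
    [pending removeObjectsInArray:page];
    return page.count;
}

// Forward paging as in the question: the offset grows while the result set shrinks.
static NSUInteger ProcessForward(NSUInteger total, NSUInteger batchSize) {
    NSMutableArray<NSNumber *> *pending = MakePending(total);
    NSUInteger processed = 0;
    for (NSUInteger offset = 0; offset < total; offset += batchSize) {
        processed += Consume(pending, FetchPage(pending, offset, batchSize));
    }
    return processed;
}

// Reverse paging: start at the last batch boundary and include offset 0.
// batchSize must be greater than 0.
static NSUInteger ProcessBackward(NSUInteger total, NSUInteger batchSize) {
    NSMutableArray<NSNumber *> *pending = MakePending(total);
    NSUInteger processed = 0;
    NSInteger step = (NSInteger)batchSize;
    NSInteger start = ((NSInteger)total - 1) / step * step;
    for (NSInteger offset = start; offset >= 0; offset -= step) {
        processed += Consume(pending, FetchPage(pending, (NSUInteger)offset, batchSize));
    }
    return processed;
}

// Always fetch at offset 0 until the result set is empty. This keeps the
// original sort order but assumes every processed record leaves the result set.
static NSUInteger ProcessFromFront(NSUInteger total, NSUInteger batchSize) {
    NSMutableArray<NSNumber *> *pending = MakePending(total);
    NSUInteger processed = 0;
    NSArray<NSNumber *> *page;
    while ((page = FetchPage(pending, 0, batchSize)).count > 0) {
        processed += Consume(pending, page);
    }
    return processed;
}
```

With 100 simulated products and a batch size of 25, ProcessForward handles 50 of them, while ProcessBackward and ProcessFromFront handle all 100. In real code the offset-0 variant would need a guard, since a product whose download fails keeps matching the predicate and would be fetched again on every pass.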