How to speed up updating relationships among tables

Posted 2019-01-28 00:47

Question:

How can I quickly update and save relationships between tables that hold a lot of data, after one or both of the tables have already been saved?

I have five tables, TvGenres, TvSubgenre, TvProgram, Channels, and TvSchedules, with the relationships between them as shown in the image below.

The problem is that all the data is downloaded in sequence, each step depending on the previous one, and, unlike with SQLite, I have to set the relationships between the objects myself. To do that I have to search the tables again and again, which is time-consuming. How can I do this faster?

I have tried two different approaches, but neither works as expected.

First, let me explain how the downloading works.

First I fetch all the channel details based on the user's languages. From the channels, I fetch all the schedules for the next week (that's a lot of data, around 30k+). And from the schedule data, I fetch all the program data (again a lot of data).

Approach 1

Download all the data, build lists of objects from it, and store everything at once after all downloading is done. Setting the relationships among them still takes time, and, worse, I now loop twice: once to build the object lists and again to store them in the tables. This still doesn't solve the time-consuming relationship issue.

Approach 2

Download and store one step at a time: download the channels and store them, then download the schedules and store them, then download the programs and store them in Core Data. This all works, but channels have a relationship with schedules, and schedules have a relationship with programs. To set the relationship while storing schedules, I also fetch the channel related to each schedule and then set the relationship, and I do the same for programs and schedules. That is what takes the time. The code is below. How can I fix this, or how should I download and store the data so that it is as fast as possible?

Code for storing schedules only:

func saveScheduleDataToCoreData(withScheduleList scheduleList: [[String : Any]], completionBlock: @escaping (_ programIds: [String]?) -> Void) {
    let start = DispatchTime.now()
    let context = coreDataStack.managedObjectContext

    var progIds = [String]()
    context.performAndWait {
        var scheduleTable: TvSchedule!

        for (index,response) in scheduleList.enumerated() {
            let schedule: TvScheduleInformation = TvScheduleInformation(json: response )
            scheduleTable = TvSchedule(context: context)
            scheduleTable.channelId = schedule.channelId
            scheduleTable.programId = schedule.programId
            scheduleTable.startTime = schedule.startTime
            scheduleTable.endTime = schedule.endTime
            scheduleTable.day = schedule.day
            scheduleTable.languageId = schedule.languageId
            scheduleTable.isReminderSet = false

            // If I comment out the fetch below, the time drops significantly, from 5 min to 34.74 s
            let tvChannelRequest: NSFetchRequest<Channels> = Channels.fetchRequest()
            tvChannelRequest.predicate = NSPredicate(format: "channelId == %d", schedule.channelId)
            tvChannelRequest.fetchLimit = 1
            do {
                let channelResult = try context.fetch(tvChannelRequest)
                if channelResult.count == 1 {
                    let channelTable = channelResult[0]
                    scheduleTable.channel = channelTable
                }
            }
            catch {
                print("Error: \(error)")
            }
            progIds.append(String(schedule.programId))
            // Save after every 1000 schedules
            if index % 1000 == 0 {
                print(index)
                do {
                    try context.save()
                } catch let error as NSError {
                    print("Error saving schedules object context! \(error)")
                }

            }
        }
    }
    let end = DispatchTime.now()
    let nanoTime = end.uptimeNanoseconds - start.uptimeNanoseconds
    print("Saving \(scheduleList.count) Schedules takes \(nanoTime) nano time")
    coreDataStack.saveContext()
    completionBlock(progIds)
}

Also, how do I do a proper batch save using an autorelease pool?

PS: All the material I found on Core Data is expensive, costing more than 3k, and the free material doesn't have much information, just basic stuff. Even the Apple docs don't have much code related to performance tuning, batch updates, and handling relationships. Thanks in advance for any kind of help.

Answer 1:

I've had projects like this before. There isn't a single solution that solves everything, but these are some things that help a lot:

Queues and Batching

It seems like you attempted to insert everything at once, and then tried doing it one by one. In my apps I found around 300 to be the best batch size, but you have to tweak it to see what works in your application; it could be as much as 5000 or as little as 100. Start with 300 and adjust to see what gets better results.
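As a minimal sketch of the batching step, the helper below splits any array into consecutive batches. The name chunked and its signature are made up for this illustration; the batch size of 300 is just the starting point suggested above.

```swift
// Hypothetical helper: split an array into consecutive batches
// of at most `size` elements each.
func chunked<T>(_ items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

// 1000 fake rows split into batches of 300; the last batch is smaller.
let batches = chunked(Array(1...1000), size: 300)
let batchSizes = batches.map { $0.count }
```

Making the size a parameter keeps it easy to time a few different values against real data.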

You have a few processes going on. You mentioned downloading and saving to the database, but I wouldn't be surprised if there are more that you haven't mentioned. Queues (NSOperationQueue) are an amazing tool for this. You might think that adding a queue will slow things down, but that is not true. Things get slow when you try to do too much at once.

So you have one queue that downloads the information (I suggest limiting it to 4 concurrent requests), and one that saves the data to Core Data (limit its concurrency to 1 to avoid write conflicts). As each download task finishes, it chunks the results into more manageable sizes and queues them to be written to the database. Don't worry if the last batch is a little smaller than the rest.

Each insert into Core Data creates its own context, does its own fetches, saves, and then discards the objects. Don't access these objects from anywhere else or you will get crashes: Core Data is not thread-safe. Also, only write to Core Data using this queue or you will get write conflicts. (See NSPersistentContainer concurrency for saving to core data for more information about this setup.)
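The two-queue setup described above could be wired up roughly like this. It is only a sketch: fakeDownload, BatchLog, and runPipeline are hypothetical names, the download is simulated, and the save step just records batch sizes where a real app would create a background context, insert, and save.

```swift
import Foundation

// Collects results. It is only mutated from the serial save queue,
// which is why @unchecked Sendable is acceptable in this sketch.
final class BatchLog: @unchecked Sendable {
    var sizes: [Int] = []
}

// Hypothetical stand-in for a network call: returns 700 fake schedule ids.
func fakeDownload(channel: Int) -> [Int] {
    return (0..<700).map { channel * 1000 + $0 }
}

func runPipeline(channels: [Int], batchSize: Int) -> [Int] {
    let downloadQueue = OperationQueue()
    downloadQueue.maxConcurrentOperationCount = 4   // at most 4 downloads at once

    let saveQueue = OperationQueue()
    saveQueue.maxConcurrentOperationCount = 1       // one writer at a time

    let log = BatchLog()
    for channel in channels {
        downloadQueue.addOperation {
            let rows = fakeDownload(channel: channel)
            // Hand the rows to the save queue in batches of at most batchSize.
            for start in stride(from: 0, to: rows.count, by: batchSize) {
                let batch = Array(rows[start..<min(start + batchSize, rows.count)])
                saveQueue.addOperation {
                    // A real app would create a background context here,
                    // insert the batch, save, and let the context go.
                    log.sizes.append(batch.count)
                }
            }
        }
    }
    // All save operations are enqueued once the downloads have finished.
    downloadQueue.waitUntilAllOperationsAreFinished()
    saveQueue.waitUntilAllOperationsAreFinished()
    return log.sizes
}

let savedBatchSizes = runPipeline(channels: [1, 2, 3], batchSize: 300)
print(savedBatchSizes.count)
```

Blocking with waitUntilAllOperationsAreFinished is only there to keep the sketch self-contained; in an app you would more likely use a completion callback and never block the main thread.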

Lookup Maps

Now you are trying to insert 300-ish entities, and each has to find or create related entities. You might have a few functions scattered around that accomplish this. If you program this without considering performance, you will easily do 300 or even 600 fetch requests. Instead, do a single fetch with fetchRequest.predicate = NSPredicate(format: "channelId IN %@", objectIdsIamDealingWithNow). After the fetch, convert the array to a dictionary with the id as the key:

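  // Map each fetched channel by its id so later lookups need no further fetches.
  // (Adjust the key type to match how channelId is declared in your model.)
  var lookup: [String: Channels] = [:]
  if let results = try? context.fetch(fetchRequest) {
      results.forEach { if let channelId = $0.channelId { lookup[channelId] = $0 } }
  }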

Once you have this lookup map do not lose it. Pass it to every function that needs it. If you create objects then consider inserting them into the dictionary afterwards. Inside the core data operation this lookup dictionary is your best friend. Be careful though. This object contains managedObjects which are not thread safe. You create this object at the beginning of your database block and must discard it at the end.
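To show the idea in isolation, here is a small sketch that uses plain classes as stand-ins for the managed objects. Channel, Schedule, makeLookup, and link are hypothetical names, and the Int key is an assumption based on the question's code treating channelId as a number.

```swift
// Hypothetical stand-ins for the Channels and TvSchedule entities.
final class Channel {
    let channelId: Int
    init(channelId: Int) { self.channelId = channelId }
}

final class Schedule {
    let channelId: Int
    var channel: Channel?   // the relationship to fill in
    init(channelId: Int) { self.channelId = channelId }
}

// Build the lookup once: a single pass over the fetched channels.
func makeLookup(_ channels: [Channel]) -> [Int: Channel] {
    var lookup: [Int: Channel] = [:]
    for channel in channels { lookup[channel.channelId] = channel }
    return lookup
}

// Link each schedule with a dictionary lookup instead of a search per row.
func link(_ schedules: [Schedule], using lookup: [Int: Channel]) {
    for schedule in schedules {
        schedule.channel = lookup[schedule.channelId]
    }
}

let channels = (1...3).map { Channel(channelId: $0) }
let schedules = [1, 3, 3, 99].map { Schedule(channelId: $0) }
link(schedules, using: makeLookup(channels))
```

In Core Data, the channels array would come from the single "channelId IN %@" fetch for the current batch, and both the fetch and the linking would run inside the same context block. A schedule whose channel is missing (id 99 here) simply stays unlinked.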

Prefer filtering relationships over fetches

You don't have any code that explicitly deals with this, but I wouldn't be surprised if you run into it. Let's say you have a particular TvSchedule and you want to find all of the programs in that schedule in a particular language. The natural way to do this would be to create a predicate that looks something like "TvSchedule == %@ AND langId == %@". But it is actually much faster to do mySchedule.programs.filter { $0.langId == myLangId }.
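A small sketch of that in-memory filter, using plain structs as hypothetical stand-ins for the entities (the programs relationship and the langId attribute are taken from the example above, not from the question's model):

```swift
// Hypothetical stand-ins; in Core Data, `programs` would be the
// to-many relationship on the schedule object.
struct Program: Hashable {
    let programId: Int
    let langId: Int
}

struct Schedule {
    var programs: Set<Program>
}

let mySchedule = Schedule(programs: [
    Program(programId: 1, langId: 10),
    Program(programId: 2, langId: 20),
    Program(programId: 3, langId: 10),
])
let myLangId = 10

// Filter the relationship that is already in memory instead of
// issuing a new fetch request.
let matching = mySchedule.programs.filter { $0.langId == myLangId }
```

With generated Core Data classes, a to-many relationship is usually typed as NSSet?, so it may need a cast such as `as? Set<TvProgram>` before filter can be called this way.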

Analyze and tweak

I see you are already adding logs to the code to see how long things take, which is really good. I would also recommend using the profiling tools in Xcode (Instruments). They can be really good for finding the functions that take up most of the time.
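For the logging side, a tiny timing helper can keep the measurements consistent while you try different batch sizes. This is a sketch; measureSeconds is a hypothetical name, and it reports seconds rather than raw nanoseconds.

```swift
import Dispatch

// Hypothetical helper: run `work` and return the elapsed time in seconds.
func measureSeconds(_ work: () -> Void) -> Double {
    let start = DispatchTime.now()
    work()
    let end = DispatchTime.now()
    return Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
}

// Example: time a placeholder workload.
let elapsed = measureSeconds {
    var total = 0
    for i in 0..<100_000 { total &+= i }
    _ = total
}
print("Took \(elapsed) s")
```

Wrapping each stage (download, lookup fetch, insert, save) separately shows which one dominates before you reach for the profiler.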