Update an array of hashes with records from the database

Posted 2019-04-15 23:27

I have an array called records with thousands of hashes (see the first array shown below). Each hash currently contains two fields, id and parent_id. I want to add a new field called updated_at, whose value is stored in the database (see the second array below).

records = [{"id"=>3, "parent_id"=>2}, 
           {"id"=>4, "parent_id"=>2}]

records = [{"id"=>3, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:35 UTC"}, 
           {"id"=>4, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:34 UTC"}] 

My first approach is the following one, but it executes one database query per hash, so an array of 1K hashes triggers 1K queries, which is poor from a performance point of view.

records.each do |record|
  record['updated_at'] = Record.find(record['id']).updated_at.utc.to_s
end

Can you suggest a better solution?

3 answers
淡お忘
Answer 2 · 2019-04-16 00:02

How about this?

plucked_records = Record.where(id: records.map { |a| a.fetch("id") }).pluck(:id, :updated_at)

records.map! do |record|
  plucked_records.each do |plucked_record|
    record["updated_at"] = plucked_record.last.utc.to_s if plucked_record.first == record["id"]
  end
  record
end

Maybe someone can improve it further. :)
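One possible improvement, sketched here with hard-coded data standing in for the database (the `plucked_records` literal below is an assumption, not real query output): turn the plucked `[id, updated_at]` pairs into a Hash once, so each record is annotated with a single O(1) lookup instead of rescanning `plucked_records` for every element.

```ruby
# Simulated output of the pluck call; in a real app these pairs
# would come from the database.
plucked_records = [
  [3, Time.utc(2014, 3, 21, 20, 44, 35)],
  [4, Time.utc(2014, 3, 21, 20, 44, 34)]
]

records = [{ "id" => 3, "parent_id" => 2 },
           { "id" => 4, "parent_id" => 2 }]

# Build the lookup table once, then annotate each record in one pass.
lookup = plucked_records.to_h
records.each do |record|
  time = lookup[record["id"]]
  record["updated_at"] = time.utc.to_s if time
end
```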

Lonely孤独者°
Answer 3 · 2019-04-16 00:08

After running a lot of benchmarks and trying different algorithms, I have come up with a solution that performs very fast and seems to be the most efficient one so far.

The idea is to convert the resulting array of DB records into a Hash, since looking up items in a Hash is much faster than searching an Array.

The timings below come from benchmarks run against an array of about 4.5K hashes.

# My last approach
# Converting the returning records Array into a Hash (thus faster searchs)
# Benchmarks average results: 0.5 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
hash_records = Hash[db_records.map { |r| [r.id, r.updated_at.utc.to_s] }]
records.each do |rec|
  rec["updated_at"] = hash_records[rec["id"]]
end

# Original approach
# Doing a SQL query for each pair (4.5K queries against MySQL)
# Benchmarks average results: ~10 seconds
records.each do |rec|
  db_rec = Record.find(rec['id'])
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end

# Kirti's approach (slightly improved). Thanks Kirti!
# Unfortunately searching in a large Array is slow.
# Doing a single SQL query for all the pairs (then find in the array)
# Benchmarks average results: ~18 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
records.each do |rec|
  db_rec = db_records.find { |f| f.id == rec["id"] }
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end  

# Nick's approach. Thanks Nick! very good solution.
# Mixed solution levering in SQL and Ruby using each_slice.
# Very interesting results:
# [slice, seconds]:
# 5000, 18.0 
# 1000, 4.3
#  500, 2.6
#  250, 1.5
#  100, 1.0
#   50, 0.9 <- :)
#   25, 1.0
#   10, 1.8
#    5, 2.3
#    1, 10.0
# Optimal slice value is 50 elements! (for this scenario)
# A scenario with a costlier SQL query might require a larger slice size
slice = 50
records.each_slice(slice) do |recs|
  ids = recs.map { |pair| pair['id'] }
  db_records = Record.select([:id, :updated_at]).find(ids)
  recs.each do |rec|
    db_rec = db_records.find { |f| f.id == rec["id"] }
    rec["updated_at"] = db_rec.updated_at.utc.to_s
  end
end 
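To see how `each_slice` batches the work, here is a minimal, database-free illustration (the 12-element array is made up for the example): twelve records with a slice size of 5 produce three batches of ids, i.e. three queries instead of twelve.

```ruby
records = (1..12).map { |i| { "id" => i } }

batches = []
records.each_slice(5) do |slice|
  # In the real code above, each slice becomes one SQL query.
  batches << slice.map { |rec| rec["id"] }
end

batches  # => [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
```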
Summer. ? 凉城
Answer 4 · 2019-04-16 00:12

How about something like this? Bulk up the queries by aggregating the ids a slice at a time. Adjust each_slice amount to something that performs well...

records.each_slice(250) do |recs|
  ids = recs.map { |r| r['id'] }
  results = Record.select([:id, :updated_at]).find(ids)
  recs.each do |rec|
    result = results.find { |res| res.id == rec['id'] }
    rec['updated_at'] = result.updated_at.utc.to_s
  end
end
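The per-slice `Array#find` above still scans the result set once per record; combining the batching with the Hash-lookup idea from the benchmark answer removes that scan. A self-contained sketch with made-up data (the `FAKE_DB` hash is a stand-in for the per-slice database query, not a real API):

```ruby
# Stand-in for the database: id => updated_at.
FAKE_DB = {
  3 => Time.utc(2014, 3, 21, 20, 44, 35),
  4 => Time.utc(2014, 3, 21, 20, 44, 34)
}

records = [{ "id" => 3, "parent_id" => 2 },
           { "id" => 4, "parent_id" => 2 }]

records.each_slice(50) do |slice|
  ids = slice.map { |rec| rec["id"] }
  # In the real code this would be one SQL query per slice, e.g.
  # Record.where(id: ids).pluck(:id, :updated_at).to_h
  lookup = ids.each_with_object({}) { |id, h| h[id] = FAKE_DB[id] }
  slice.each do |rec|
    rec["updated_at"] = lookup[rec["id"]].utc.to_s
  end
end
```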