I have an array called records with thousands of hashes (see the first array shown below). Every hash currently contains two fields, id and parent_id. I want to add a new field called updated_at, whose value is stored in the database (see the second array below).
records = [{"id"=>3, "parent_id"=>2},
{"id"=>4, "parent_id"=>2}]
records = [{"id"=>3, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:35 UTC"},
{"id"=>4, "parent_id"=>2, "updated_at"=>"2014-03-21 20:44:34 UTC"}]
My first approach is the following, but it executes one database query per hash: with 1K hashes in the array it runs 1K queries, which is not good from a performance point of view.
records.each do |record|
  record['updated_at'] = Record.find(record['id']).updated_at.utc.to_s
end
Can you suggest a better solution?
How about something like this? Bulk up the queries by aggregating the ids a slice at a time. Adjust the each_slice amount to something that performs well...
records.each_slice(250) do |slice|
  ids = slice.map { |r| r['id'] }
  results = Record.select([:id, :updated_at]).find(ids)
  slice.each do |rec|
    result = results.find { |res| res.id == rec['id'] }
    rec['updated_at'] = result.updated_at.utc.to_s
  end
end
How about this?
ids = records.map { |record| record.fetch("id") }
plucked_records = Record.where(id: ids).pluck(:id, :updated_at)

records.map! do |record|
  plucked_records.each do |plucked_record|
    record["updated_at"] = plucked_record.last.utc.to_s if plucked_record.first == record["id"]
  end
  record
end
Maybe someone can improve it further. :)
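One possible improvement: the nested loop above rescans plucked_records once per record. Converting the id/timestamp pairs to a hash first makes each lookup O(1) instead. A standalone sketch with stand-in data (no database; the rows array plays the role of the pluck result):

```ruby
# Stand-in for what a pluck-style query would return: [id, updated_at] pairs.
# (No database here; the values mirror the arrays from the question.)
rows = [[3, "2014-03-21 20:44:35 UTC"],
        [4, "2014-03-21 20:44:34 UTC"]]

records = [{ "id" => 3, "parent_id" => 2 },
           { "id" => 4, "parent_id" => 2 }]

# Build the lookup table once...
updated_at_by_id = rows.to_h

# ...then merge with one O(1) hash access per record.
records.each do |rec|
  rec["updated_at"] = updated_at_by_id[rec["id"]]
end
```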
After running a lot of benchmarks and trying different algorithms, I have come up with a solution that performs very fast and seems to be the most efficient one so far. The idea is to convert the resulting array of DB records into a hash, since finding items in a hash is much faster than searching an array. The timings below come from benchmarks run on an array of about 4.5K hashes.
# My last approach
# Converting the returned records Array into a Hash (thus faster lookups)
# Benchmarks average results: 0.5 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)
hash_records = Hash[db_records.map { |r| [r.id, r.updated_at.utc.to_s] }]

records.each do |rec|
  rec["updated_at"] = hash_records[rec["id"]]
end
# Original approach
# Doing a SQL query for each record (4.5K queries against MySQL)
# Benchmarks average results: ~10 seconds
records.each do |rec|
  db_rec = Record.find(rec['id'])
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end
# Kirti's approach (slightly improved). Thanks Kirti!
# Unfortunately searching in a large array is slow
# Doing a single SQL query for all the records (then searching the array)
# Benchmarks average results: ~18 seconds
ids = records.map { |rec| rec['id'] }
db_records = Record.select([:id, :updated_at]).find(ids)

records.each do |rec|
  db_rec = db_records.find { |f| f.id == rec["id"] }
  rec["updated_at"] = db_rec.updated_at.utc.to_s
end
# Nick's approach. Thanks Nick! Very good solution.
# Mixed solution leveraging SQL and Ruby using each_slice.
# Very interesting results:
# [slice, seconds]:
# 5000, 18.0
# 1000, 4.3
# 500, 2.6
# 250, 1.5
# 100, 1.0
# 50, 0.9 <- :)
# 25, 1.0
# 10, 1.8
# 5, 2.3
# 1, 10.0
# Optimal slice value is 50 elements! (for this scenario)
# A scenario with a more costly SQL query might require a larger slice number
slice = 50
records.each_slice(slice) do |recs|
  ids = recs.map { |rec| rec['id'] }
  db_records = Record.select([:id, :updated_at]).find(ids)
  recs.each do |rec|
    db_rec = db_records.find { |f| f.id == rec["id"] }
    rec["updated_at"] = db_rec.updated_at.utc.to_s
  end
end
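If the numbers above hold, the two winning ideas combine naturally: slice the ids so each query stays small, and build a hash per slice so the in-memory lookup stays O(1). A sketch, not benchmarked; `fetch_rows` is a hypothetical injected helper standing in for the database call, so the code runs without a database:

```ruby
# Merge "updated_at" into records, querying in slices and using a hash
# lookup per slice. `fetch_rows` (hypothetical helper) must return
# [id, updated_at_string] pairs for the given ids.
def merge_updated_at(records, slice_size, fetch_rows)
  records.each_slice(slice_size) do |recs|
    ids = recs.map { |rec| rec["id"] }
    lookup = fetch_rows.call(ids).to_h # id => updated_at string, O(1) reads
    recs.each { |rec| rec["updated_at"] = lookup[rec["id"]] }
  end
  records
end

# With ActiveRecord the helper could be something like:
#   fetch_rows = ->(ids) do
#     Record.where(id: ids).pluck(:id, :updated_at)
#           .map { |id, t| [id, t.utc.to_s] }
#   end

# Stubbed run (no database):
stub = ->(ids) { ids.map { |id| [id, "2014-03-21 20:44:35 UTC"] } }
recs = [{ "id" => 3 }, { "id" => 4 }]
merge_updated_at(recs, 50, stub)
```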