It seems that Mongoid (the Ruby ODM for MongoDB) has two nearly equivalent methods,
#pluck and #distinct, which both return only the given field from a collection; #distinct additionally drops duplicate values.
So both
User.pluck(:name)
User.distinct(:name)
would return an array of all names from the User collection in the db:
> ['john', 'maria', 'tony', 'filip']
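The two calls are not strictly interchangeable: #pluck returns one value per document, while #distinct returns each value only once. A minimal plain-Ruby sketch of that difference, with no database involved (the `USERS` array and the `pluck_like` / `distinct_like` helpers are hypothetical stand-ins, not how Mongoid implements these methods):

```ruby
# Hypothetical in-memory stand-in for the User collection.
USERS = [
  { name: 'john' },
  { name: 'maria' },
  { name: 'john' },
  { name: 'tony' }
].freeze

# Emulates #pluck: one value per document, duplicates kept.
def pluck_like(docs, field)
  docs.map { |doc| doc[field] }
end

# Emulates #distinct: each value reported only once.
def distinct_like(docs, field)
  docs.map { |doc| doc[field] }.uniq
end

puts pluck_like(USERS, :name).inspect
puts distinct_like(USERS, :name).inspect
```

With a duplicated 'john' in the data, the first call keeps both entries and the second keeps one.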
I don't mind duplicates. Which method is faster?
Let's run a benchmark!
require 'benchmark'

1_200.times { FactoryGirl.create(:user) }

Benchmark.bmbm(7) do |bm|
  bm.report('pluck') do
    User.pluck(:email)
  end
  bm.report('pluck.uniq') do
    User.pluck(:email).uniq
  end
  bm.report('only.pluck') do
    User.only(:email).pluck(:email)
  end
  bm.report('only.pluck.uniq') do
    User.only(:email).pluck(:email).uniq
  end
  bm.report('distinct') do
    User.distinct(:email)
  end
  bm.report('only.distnct') do
    User.only(:email).distinct(:email)
  end
end
which outputs:
Rehearsal ------------------------------------------------
pluck          0.010000   0.000000   0.010000 (  0.009913)
pluck.uniq     0.010000   0.000000   0.010000 (  0.012156)
only.pluck     0.000000   0.000000   0.000000 (  0.008731)
distinct       0.000000   0.000000   0.000000 (  0.004830)
only.distnct   0.000000   0.000000   0.000000 (  0.005048)
--------------------------------------- total: 0.020000sec

                   user     system      total        real
pluck          0.000000   0.000000   0.000000 (  0.007904)
pluck.uniq     0.000000   0.000000   0.000000 (  0.008440)
only.pluck     0.000000   0.000000   0.000000 (  0.008243)
distinct       0.000000   0.000000   0.000000 (  0.004604)
only.distnct   0.000000   0.000000   0.000000 (  0.004510)
In this run, #distinct is roughly 1.7 times faster than #pluck
(about 4.6 ms versus 7.9 ms of real time, after creating 1,200 users).
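To put a number on the difference, the real times from the second pass can be divided by the #distinct time. This is only a sketch of that arithmetic: the `REAL_TIMES` hash and the `speedup` helper are names introduced here for illustration, and the values are copied from the output above.

```ruby
# Real (wall-clock) times in seconds, copied from the second benchmark pass.
REAL_TIMES = {
  'pluck'        => 0.007904,
  'pluck.uniq'   => 0.008440,
  'only.pluck'   => 0.008243,
  'distinct'     => 0.004604,
  'only.distnct' => 0.004510
}.freeze

# How many times longer `slow` took compared with `fast`.
def speedup(times, slow, fast)
  times.fetch(slow) / times.fetch(fast)
end

REAL_TIMES.each_key do |label|
  ratio = speedup(REAL_TIMES, label, 'distinct')
  puts format('%-13s %.2fx the time of distinct', label, ratio)
end
```

For 'pluck' this gives a ratio of about 1.7. Since each query takes only a few milliseconds and was timed once per pass, the exact ratio will likely vary between runs.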