Efficient way to store data in MongoDB: embedded d

2019-05-23 09:54发布

问题:

I store user activity data: when user visited current article, topic or personal message to show him how many new comments and messages were added while he was offline.

class SiteActivity
  include Mongoid::Document
  include Mongoid::Timestamps
  belongs_to :user
  belons_to :activity, polymorphic: true
end

In this case I store one record per document.

Another option is to use embedded documents, so all user activities will be stored in one document:

class SiteActivity
  include Mongoid::Document
  belongs_to :user
  embeds_many :user_activities
  validates :user_id, uniqueness: true
end

class UserActivity
  include Mongoid::Document
  include Mongoid::Timestamps
  embedded_in :site_activity
  belongs_to :activity, polymorphic: true
end

So now I don't need to search through all SiteActivities (many many records) but I can fetch one user_activity for current_user and find activity I need through it embedded documents.

Which way is more efficient to store and search data?

My ordinary use case is:

I have got a user and a post so I am fetching for site_activity with this data to see the date when this user visited post last time.

with my first option:

activity = SiteActivity.where(user_id: current_user.id, activity_id: post.id, activity_type: post.class)

with second

user_activity = SiteActivity.where(user_id: current_user.id)
activity = user_activity.user_activities.where(activity_id: post.id, activity_type: post.class)

回答1:

It would be better to use the first approach(individual documents) and use a capped collection if possible, as you don't want to have rapidly growing collection(mongoid will have support for capped collections in 2.2, which would be out this weekend I guess).

The second approach(embedded documents), you will need to first fetch the root document for the user and then traverse the array in application to find the activity related with post you are looking for. Mongoid may make it look like everything is done in db due to similarity of syntax in finding an embedded document, but its really iterating the array.

As you already have the user_id, activity_id and activity_type before making a query, and you would not want the whole list of activities for the user to be retrieved from db when you are looking for a particular activity, I will prefer first case. There would be much less calculations(searching) in application and there will be much less network traffic.

With individual documents approach, it would be great if you also create a unique index on user_id, activity_id, activity_type. It will help you contain the number of documents. You can have the uniqueness validation(extra query), but that would be mostly unnecessary if you have the unique index. The only benefit of validation will be an validation error if there are duplicates, but index will ignore duplicate entries silently unless you persist in safe mode.

In case you also want the historical site activity to be persisted, you can have the structure like:

class SiteActivity
  include Mongoid::Document
  include Mongoid::Timestamps
  belongs_to :user
  belongs_to :activity, polymorphic: true

  index [:user_id, :activity_id, :activity_type], :background => true, :unique => true

  field :last_access_time, :type => Time
  # last_access_times just here for history, not used
  field :last_access_times, :type => Array, :default => []
end

activity = SiteActivity.find_or_initialize_by(:user_id => current_user.id,
               :activity_id => post.id, :activity_type => post.class)
time = Time.now.utc
activity.last_access_time = time
activity.last_access_times << time
activity.save


回答2:

It seems that similar topic was discuss yesterday. Look at best possible schema design for log analysis database in mongodb Maybe it will help.