I store user activity data: when user visited current article, topic or personal message to show him how many new comments and messages were added while he was offline.
class SiteActivity
include Mongoid::Document
include Mongoid::Timestamps
belongs_to :user
belons_to :activity, polymorphic: true
end
In this case I store one record per document.
Another option is to use embedded documents, so all user activities will be stored in one document:
class SiteActivity
include Mongoid::Document
belongs_to :user
embeds_many :user_activities
validates :user_id, uniqueness: true
end
class UserActivity
include Mongoid::Document
include Mongoid::Timestamps
embedded_in :site_activity
belongs_to :activity, polymorphic: true
end
So now I don't need to search through all SiteActivities (many many records) but I can fetch one user_activity
for current_user and find activity I need through it embedded documents.
Which way is more efficient to store and search data?
My ordinary use case is:
I have got a user and a post so I am fetching for site_activity with this data to see the date when this user visited post last time.
with my first option:
activity = SiteActivity.where(user_id: current_user.id, activity_id: post.id, activity_type: post.class)
with second
user_activity = SiteActivity.where(user_id: current_user.id)
activity = user_activity.user_activities.where(activity_id: post.id, activity_type: post.class)
It would be better to use the first approach(individual documents) and use a capped collection if possible, as you don't want to have rapidly growing collection(mongoid will have support for capped collections in 2.2, which would be out this weekend I guess).
The second approach(embedded documents), you will need to first fetch the root document for the user and then traverse the array in application to find the activity related with post you are looking for. Mongoid may make it look like everything is done in db due to similarity of syntax in finding an embedded document, but its really iterating the array.
As you already have the user_id, activity_id and activity_type before making a query, and you would not want the whole list of activities for the user to be retrieved from db when you are looking for a particular activity, I will prefer first case. There would be much less calculations(searching) in application and there will be much less network traffic.
With individual documents approach, it would be great if you also create a unique index on user_id, activity_id, activity_type. It will help you contain the number of documents. You can have the uniqueness validation(extra query), but that would be mostly unnecessary if you have the unique index. The only benefit of validation will be an validation error if there are duplicates, but index will ignore duplicate entries silently unless you persist in safe mode.
In case you also want the historical site activity to be persisted, you can have the structure like:
class SiteActivity
include Mongoid::Document
include Mongoid::Timestamps
belongs_to :user
belongs_to :activity, polymorphic: true
index [:user_id, :activity_id, :activity_type], :background => true, :unique => true
field :last_access_time, :type => Time
# last_access_times just here for history, not used
field :last_access_times, :type => Array, :default => []
end
activity = SiteActivity.find_or_initialize_by(:user_id => current_user.id,
:activity_id => post.id, :activity_type => post.class)
time = Time.now.utc
activity.last_access_time = time
activity.last_access_times << time
activity.save
It seems that similar topic was discuss yesterday.
Look at best possible schema design for log analysis database in mongodb Maybe it will help.