I'm interested in hearing your opinions on the best way to implement a social activity stream (Facebook is the most famous example). The problems/challenges involved are:
- Different types of activities (posting, commenting ..)
- Different types of objects (post, comment, photo ..)
- 1-n users involved in different roles ("User x replied to User y's comment on User z's post")
- Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)
.. and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").
Any thoughts or pointers on patterns, papers, etc. on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.
Although most of the issues are platform-agnostic, chances are I'll end up implementing such a system in Ruby on Rails.
I solved this a few months ago, but I think my implementation is too basic.
I created the following models:
Example
I have built such a system, and I took this approach:
Database table with the following columns: id, userId, type, data, time.
This limits the searches/lookups you can do in the feeds to users, time and activity types, but in a Facebook-type activity feed this isn't really limiting. And with the correct indices on the table, the lookups are fast.
With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:
You can see that, although the name of the photo is most certainly stored in some other table containing the photos, and I could retrieve the name from there, I duplicate the name in the metadata field (the data column), because you don't want to do any joins on other database tables if you want speed. And in order to display, say, 200 different events from 50 different users, you need speed.
Then I have classes that extend a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built into the rendering code as well, to keep complexity out of the database.
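To make the approach above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Only the column names (id, userId, type, data, time) and the FeedActivity base class come from the description; the table name, the JSON encoding of the data column, the metadata keys, the subclasses and the grouping rule are all assumptions made for illustration.

```python
import json
import sqlite3
from itertools import groupby

# Hypothetical schema using the columns named above; "data" holds JSON metadata.
SCHEMA = """
CREATE TABLE feed_activity (
    id     INTEGER PRIMARY KEY,
    userId INTEGER NOT NULL,
    type   TEXT    NOT NULL,
    data   TEXT    NOT NULL,
    time   INTEGER NOT NULL
);
CREATE INDEX idx_feed_user_time ON feed_activity (userId, time);
"""


class FeedActivity:
    """Base renderer; subclasses turn one row's metadata into display text."""

    def __init__(self, user_id, data):
        self.user_id = user_id
        self.data = data

    def render(self):
        raise NotImplementedError


class PhotoActivity(FeedActivity):
    def render(self):
        # The photo name is read from the duplicated metadata, not from a join.
        return f"User {self.user_id} added the photo '{self.data['name']}'"


class CommentActivity(FeedActivity):
    def render(self):
        return f"User {self.user_id} commented on '{self.data['target']}'"


RENDERERS = {"photo": PhotoActivity, "comment": CommentActivity}  # assumed type names


def record(conn, user_id, type_, data, time):
    """Store one activity with its metadata serialized as JSON."""
    conn.execute(
        "INSERT INTO feed_activity (userId, type, data, time) VALUES (?, ?, ?, ?)",
        (user_id, type_, json.dumps(data), time),
    )


def load_feed(conn, user_ids, limit=200):
    """Fetch recent activities for the given users with one query and no joins."""
    marks = ",".join("?" for _ in user_ids)
    rows = conn.execute(
        f"SELECT userId, type, data FROM feed_activity "
        f"WHERE userId IN ({marks}) ORDER BY time DESC, id DESC LIMIT ?",
        (*user_ids, limit),
    ).fetchall()
    return [RENDERERS[t](uid, json.loads(d)) for uid, t, d in rows]


def render_feed(activities):
    """Render activities, merging consecutive comments on the same target."""
    lines = []
    for (cls, target), group in groupby(
        activities, key=lambda a: (type(a), a.data.get("target"))
    ):
        group = list(group)
        if cls is CommentActivity and len(group) > 1:
            users = ", ".join(str(a.user_id) for a in group)
            lines.append(f"Users {users} commented on '{target}'")
        else:
            lines.extend(a.render() for a in group)
    return lines


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record(conn, 1, "photo", {"photo_id": 7, "name": "Sunset"}, 100)
record(conn, 2, "comment", {"target": "Sunset"}, 101)
record(conn, 3, "comment", {"target": "Sunset"}, 102)
feed_lines = render_feed(load_feed(conn, [1, 2, 3]))
```

Here the grouping happens entirely in render_feed, so the table stays a flat, append-only log and the query never touches the photos table.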
This is a very good presentation outlining how Etsy.com architected their activity streams. It's the best example I've found on the topic, though it's not Rails-specific.
http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
If you do decide that you're going to implement in Rails, perhaps you will find the following plugin useful:
ActivityStreams: http://github.com/face/activity_streams/tree/master
If nothing else, you'll get to look at an implementation, both in terms of the data model, as well as the API provided for pushing and pulling activities.
I had a similar approach to that of heyman - a denormalized table containing all of the data that would be displayed in a given activity stream. It works fine for a small site with limited activity.
As mentioned above, it is likely to face scalability issues as the site grows. Personally, I am not worried about the scaling issues right now. I'll worry about that at a later time.
Facebook has obviously done a great job of scaling, so I would recommend that you read their engineering blog, as it has a ton of great content: http://www.facebook.com/notes.php?id=9445547199
I have been looking into better solutions than the denormalized table I mentioned above. Another way I have found of accomplishing this is to condense all the content that would be in a given activity stream into a single row. It could be stored in XML, JSON, or some serialized format that could be read by your application. The update process would be simple too. Upon activity, place the new activity into a queue (perhaps using Amazon SQS or something else) and then continually poll the queue for the next item. Grab that item, parse it, and place its contents in the appropriate feed object stored in the database.
The good thing about this method is that you only need to read a single database table whenever that particular feed is requested, rather than grabbing a series of tables. Also, it allows you to maintain a finite list of activities as you may pop off the oldest activity item whenever you update the list.
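A rough sketch of that single-row idea, in Python, might look like the following. The standard-library Queue stands in for an external queue such as Amazon SQS, and a dict stands in for the database table holding one serialized feed per owner; the function names, the JSON format and the cap of three items are assumptions chosen to keep the example small.

```python
import json
from queue import Empty, Queue

MAX_ITEMS = 3  # hypothetical cap on how many activities a feed keeps

feed_rows = {}     # stand-in for a table: owner id -> one serialized JSON feed
pending = Queue()  # stand-in for an external queue such as Amazon SQS


def publish(owner_id, activity):
    """Called when something happens; the feed itself is updated later."""
    pending.put((owner_id, activity))


def drain_queue():
    """Move queued activities into each owner's single serialized feed row."""
    processed = 0
    while True:
        try:
            owner_id, activity = pending.get_nowait()
        except Empty:
            return processed
        items = json.loads(feed_rows.get(owner_id, "[]"))
        items.insert(0, activity)  # newest first
        del items[MAX_ITEMS:]      # pop off the oldest entries beyond the cap
        feed_rows[owner_id] = json.dumps(items)
        processed += 1


def read_feed(owner_id):
    """Reading a feed is one lookup plus one deserialization."""
    return json.loads(feed_rows.get(owner_id, "[]"))


for n in range(1, 5):
    publish(42, {"type": "comment", "n": n})
processed = drain_queue()
feed = read_feed(42)
```

In a real deployment the worker calling drain_queue would run continuously, and the write to feed_rows would be an update of one database row.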
Hope this helps! :)
I started to implement a system like this yesterday, here's where I've got to...
I created a StreamEvent class with the properties Id, ActorId, TypeId, Date, ObjectId and a hashtable of additional Details key/value pairs. This is represented in the database by a StreamEvent table (Id, ActorId, TypeId, Date, ObjectId) and a StreamEventDetails table (StreamEventId, DetailKey, DetailValue).
The ActorId, TypeId and ObjectId allow for a Subject-Verb-Object event to be captured (and later queried). Each action may result in several StreamEvent instances being created.
I've then created a subclass of StreamEvent for each type of event, e.g. LoginEvent, PictureCommentEvent. Each of these subclasses has more context-specific properties such as PictureId, ThumbNail, CommentText, etc. (whatever is required for the event), which are actually stored as key/value pairs in the hashtable/StreamEventDetails table.
When pulling these events back from the database I use a factory method (based on the TypeId) to create the correct StreamEvent class.
Each subclass of StreamEvent has a Render(context As StreamContext) method which outputs the event to the screen based on the passed StreamContext class. The StreamContext class allows options to be set based on the context of the view. If you look at Facebook, for example, the news feed on your homepage lists the full names (with links to their profiles) of everyone involved in each action, whereas when looking at a friend's feed you only see their first name (but the full names of other actors).
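The factory and context-dependent rendering described above could be sketched roughly as follows, here in Python rather than the original language. The class and column names follow the description; the TypeId values, the shape of the row dictionaries, the users lookup, and the "You" rule for the viewer's own actions are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamContext:
    viewer_id: int                          # who is looking at the feed
    profile_owner_id: Optional[int] = None  # set when viewing one user's own feed


@dataclass
class StreamEvent:
    id: int
    actor_id: int
    type_id: int
    date: str
    object_id: int
    details: dict = field(default_factory=dict)  # rows from StreamEventDetails

    def actor_label(self, context, users):
        """Pick how to name the actor for this particular view."""
        if self.actor_id == context.viewer_id:
            return "You"
        first, last = users[self.actor_id]
        if self.actor_id == context.profile_owner_id:
            return first  # on a profile feed, the owner is shown by first name
        return f"{first} {last}"

    def render(self, context, users):
        raise NotImplementedError


class LoginEvent(StreamEvent):
    def render(self, context, users):
        return f"{self.actor_label(context, users)} logged in"


class PictureCommentEvent(StreamEvent):
    def render(self, context, users):
        who = self.actor_label(context, users)
        picture = self.details["PictureId"]
        return f"{who} commented on picture {picture}: {self.details['CommentText']}"


EVENT_TYPES = {1: LoginEvent, 2: PictureCommentEvent}  # hypothetical TypeId values


def event_from_rows(event_row, detail_rows):
    """Factory: choose the subclass from TypeId and attach the key/value details."""
    cls = EVENT_TYPES[event_row["TypeId"]]
    details = {r["DetailKey"]: r["DetailValue"] for r in detail_rows}
    return cls(
        event_row["Id"], event_row["ActorId"], event_row["TypeId"],
        event_row["Date"], event_row["ObjectId"], details,
    )


users = {1: ("Ada", "Lovelace"), 2: ("Alan", "Turing")}
event = event_from_rows(
    {"Id": 10, "ActorId": 1, "TypeId": 2, "Date": "2009-07-01", "ObjectId": 99},
    [
        {"DetailKey": "PictureId", "DetailValue": "99"},
        {"DetailKey": "CommentText", "DetailValue": "Nice shot"},
    ],
)
home_view = event.render(StreamContext(viewer_id=2), users)
profile_view = event.render(StreamContext(viewer_id=2, profile_owner_id=1), users)
own_view = event.render(StreamContext(viewer_id=1), users)
```

The same stored event produces three different strings depending only on the StreamContext passed in, which keeps the view-specific wording out of the database.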
I haven't implemented an aggregate feed (Facebook home) yet, but I imagine I'll create an AggregateFeed table with the fields UserId and StreamEventId, populated based on some kind of 'Hmmm, you might find this interesting' algorithm.
Any comments would be massively appreciated.