I'm interested in hearing your opinions on the best way to implement a social activity stream (Facebook is the most famous example). The problems/challenges involved are:
- Different types of activities (posting, commenting ..)
- Different types of objects (post, comment, photo ..)
- 1-n users involved in different roles ("User x replied to User y's comment on User z's post")
- Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)
.. and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").
Any thoughts or pointers on patterns, papers, etc. on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.
Although most of the issues are platform-agnostic, chances are I'll end up implementing such a system in Ruby on Rails.
If you are willing to use separate software, I suggest the Graphity server, which solves exactly this problem for activity streams (it is built on top of the Neo4j graph database).
The algorithms have been implemented as a standalone REST server so that you can host your own server to deliver activity streams: http://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/
In the paper and benchmark I showed that the cost of retrieving news streams depends only linearly on the number of items you want to retrieve, without any of the redundancy you would get from denormalizing the data:
http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
At the above link you will find screencasts and a benchmark of this approach (showing that Graphity is able to retrieve more than 10k streams per second).
When the event is created, decide which feeds it appears in and add those to events_feeds. To get a feed, select from events_feeds, join in events, order by timestamp. Filtering and aggregation can then be done on the results of that query. With this model, you can change the event properties after creation with no extra work.
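The model described above can be sketched roughly as follows. This is only a minimal illustration using Python's built-in `sqlite3` module; apart from `events` and `events_feeds`, every name here (the column names, `publish`, `get_feed`, `aggregate`) is an assumption made for the example, not part of the original answer.

```python
import sqlite3

def create_schema(conn):
    # One row per event, plus a join table saying which feeds show it.
    # Column names are hypothetical.
    conn.executescript("""
        CREATE TABLE events (
            id INTEGER PRIMARY KEY,
            actor TEXT NOT NULL,
            verb TEXT NOT NULL,
            object TEXT NOT NULL,
            created_at INTEGER NOT NULL
        );
        CREATE TABLE events_feeds (
            event_id INTEGER NOT NULL REFERENCES events(id),
            feed_id TEXT NOT NULL,
            PRIMARY KEY (feed_id, event_id)
        );
    """)

def publish(conn, actor, verb, obj, created_at, feed_ids):
    # Store the event once, then record every feed it should appear in.
    cur = conn.execute(
        "INSERT INTO events (actor, verb, object, created_at) "
        "VALUES (?, ?, ?, ?)",
        (actor, verb, obj, created_at),
    )
    event_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO events_feeds (event_id, feed_id) VALUES (?, ?)",
        [(event_id, feed_id) for feed_id in feed_ids],
    )
    return event_id

def get_feed(conn, feed_id, limit=20):
    # Select from events_feeds, join in events, newest first.
    return conn.execute(
        """SELECT e.id, e.actor, e.verb, e.object, e.created_at
           FROM events_feeds ef
           JOIN events e ON e.id = ef.event_id
           WHERE ef.feed_id = ?
           ORDER BY e.created_at DESC, e.id DESC
           LIMIT ?""",
        (feed_id, limit),
    ).fetchall()

def aggregate(rows):
    # Group feed rows by (verb, object) so several actors collapse into
    # one item, e.g. "x, y and z commented on that photo".
    groups = {}
    for _id, actor, verb, obj, _created_at in rows:
        actors = groups.setdefault((verb, obj), [])
        if actor not in actors:
            actors.append(actor)
    return [(actors, verb, obj) for (verb, obj), actors in groups.items()]
```

Because each event is stored once and only referenced from `events_feeds`, an `UPDATE` on `events` is visible in every feed on the next read, which is the "no extra work" property mentioned above.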
We've open sourced our approach: https://github.com/tschellenbach/Stream-Framework. It's currently the largest open source library aimed at solving this problem.
The same team that built Stream Framework also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients available for Node, Python, Rails and PHP.
In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:
Though Stream Framework is Python based, it wouldn't be too hard to use from a Ruby app. You could simply run it as a service and stick a small HTTP API in front of it. We are considering adding an API to access Feedly from other languages. At the moment you'll have to roll your own, though.
There are two Railscasts about such an activity stream:
Those solutions don't cover all your requirements, but they should give you some ideas.
The biggest issues with event streams are visibility and performance; you need to restrict the events displayed to be only the interesting ones for that particular user, and you need to keep the amount of time it takes to sort through and identify those events manageable. I've built a smallish social network; I found that at small scales, keeping an "events" table in a database works, but that it gets to be a performance problem under moderate load.
With a larger stream of messages and users, it's probably best to go with a messaging system, where events are sent as messages to individual profiles. The trade-off is that you can't easily subscribe to someone's event stream and see their previous events, but rendering the stream for a particular user only requires reading a small group of messages.
I believe this was Twitter's original design flaw: I remember reading that they were hitting the database to pull in and filter their events. This had everything to do with architecture and nothing to do with Rails, which (unfortunately) gave birth to the "Ruby doesn't scale" meme. I recently saw a presentation where the developer used Amazon's Simple Queue Service as the messaging backend for a Twitter-like application that would have far higher scaling capabilities; it may be worth looking into SQS as part of your system if your loads are high enough.
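As a rough illustration of the messaging idea, here is a small in-memory sketch in Python. It is only a stand-in for a real messaging backend, and the class and method names (`InboxFeeds`, `follow`, `publish`, `stream`) are hypothetical.

```python
from collections import defaultdict, deque

class InboxFeeds:
    """Each user has a bounded inbox; events are pushed to followers."""

    def __init__(self, max_items=100):
        # followee -> set of users following them
        self.followers = defaultdict(set)
        # user -> newest-first inbox that drops the oldest item when full
        self.inboxes = defaultdict(lambda: deque(maxlen=max_items))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def publish(self, actor, message):
        # Deliver a copy of the event to every current follower.
        for user in self.followers[actor]:
            self.inboxes[user].appendleft((actor, message))

    def stream(self, user, limit=20):
        # Rendering a stream is just reading that user's own inbox.
        return list(self.inboxes[user])[:limit]
```

In this sketch a user who follows someone later only receives events published after that point, since earlier events were never delivered to their inbox.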
I think Plurk's approach is interesting: they supply your entire timeline in a format that looks a lot like Google Finance's stock charts.
It may be worth looking at Ning to see how a social networking platform works. The developer pages look especially helpful.