Trello shows a historical log of everything any user has done since the board was created. Likewise, if you click on a specific card, it shows the history of everything anyone has done to that card.
Keeping every change, addition, and deletion indefinitely must accumulate a huge amount of data, and writing to the history log could become a bottleneck (assuming each entry is written immediately to some kind of data store). It isn't as if they store everything in log files spread across thousands of servers that are only collected and parsed when someone needs to find something; they display all of this information all the time.
I know this isn't the only service that provides something like this, but how would you go about architecting such a system?
I'm on the Trello team. We use an Actions collection in our MongoDB instance, with a compound index on the ids of the models to which it refers (a Card is a model, and so is a Member) and the date when the action was performed. No fancy caching or anything, except inasmuch as the index and recently used documents are kept in memory by the DB. Actions is by far our biggest collection.
It is worth mentioning that most of the data needed to display an action is stored denormalized in the action document, so that speeds things up considerably.
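To make that access pattern concrete, here is a minimal in-memory sketch in Python. It is not Trello's code. Each action is appended once, its document carries the denormalized display data, and a sorted list of (model id, negated timestamp, negated sequence) tuples stands in for MongoDB's compound index on model ids and date. The `ActionLog` class and the field names (`model_ids`, `date`, `data`, `memberName`, `cardName`) are hypothetical.

```python
from bisect import bisect_left, insort
from datetime import datetime, timedelta, timezone


class ActionLog:
    """In-memory stand-in for an append-only Actions collection.

    Field names are hypothetical, not Trello's schema. Each action document
    lists every model it refers to and carries the display data it needs
    (denormalized), so rendering it needs no joins. A sorted list of
    (model_id, -timestamp, -seq) tuples plays the role of the compound index
    on (model id, date), with the newest entries sorting first.
    """

    def __init__(self):
        self.actions = []  # append-only list of action documents
        self.index = []    # sorted (model_id, -timestamp, -seq) keys

    def record(self, action_type, model_ids, data, when):
        """Append one action and index it under every model it refers to."""
        seq = len(self.actions)
        doc = {"type": action_type, "model_ids": list(model_ids),
               "date": when, "data": dict(data)}
        self.actions.append(doc)
        for model_id in model_ids:
            # insort is O(n) on a Python list; a database B-tree makes this cheap.
            insort(self.index, (model_id, -when.timestamp(), -seq))
        return doc

    def history(self, model_id, limit=50):
        """Newest-first actions for one model: one seek, then a forward scan."""
        i = bisect_left(self.index, (model_id,))  # first key for model_id
        out = []
        while i < len(self.index) and len(out) < limit:
            key_model, _, neg_seq = self.index[i]
            if key_model != model_id:
                break
            out.append(self.actions[-neg_seq])
            i += 1
        return out


if __name__ == "__main__":
    log = ActionLog()
    t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    log.record("createCard", ["card1", "member1"],
               {"memberName": "Ann", "cardName": "Fix login"}, t0)
    log.record("commentCard", ["card1", "member2"],
               {"memberName": "Bob", "cardName": "Fix login", "text": "On it"},
               t0 + timedelta(hours=1))
    for action in log.history("card1"):
        d = action["data"]
        print(f'{d["memberName"]} {action["type"]} on "{d["cardName"]}"')
```

Reading a card's or a member's history is then one binary search plus a short forward scan, which is roughly what the real index gives the database. Against an actual MongoDB deployment, the index would be declared rather than maintained by hand, for example with PyMongo's `create_index([("model_ids", ASCENDING), ("date", DESCENDING)])`. Because `model_ids` is an array field, MongoDB would build it as a multikey index.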
The easiest way that comes to mind is to have a table like:
create table HistoryItems (
    ID int not null primary key nonclustered,
    UserID int not null,
    DateTime datetime not null,
    Data varbinary(max) null  -- or varchar(max), etc., depending on the payload
);
Indexing this on UserID allows fast retrieval. A covering index (keyed on UserID and including every column the query reads) would let you fetch a user's entire history with a single seek followed by a sequential read of that range, no matter how long it is.
Alternatively, the table could be clustered on (UserID asc, DateTime desc, ID), so the rows themselves are stored in that order. Then you don't need any secondary index at all and still get optimal performance.
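Here is a minimal sketch of the clustered layout. It uses Python's built-in sqlite3 module as a stand-in for the SQL Server table above, an assumption made only so the example runs anywhere. In SQLite, a `WITHOUT ROWID` table stores its rows in primary-key order, which plays the role of the clustered index on (UserID asc, DateTime desc, ID). `DateTime` is stored as ISO-8601 text, which sorts chronologically. The helper names `open_history`, `add_item`, and `user_history` are hypothetical.

```python
import sqlite3

# WITHOUT ROWID makes the primary key the table's storage order, i.e. the
# equivalent of a clustered index on (UserID, DateTime DESC, ID).
SCHEMA = """
CREATE TABLE IF NOT EXISTS HistoryItems (
    ID       INTEGER NOT NULL,
    UserID   INTEGER NOT NULL,
    DateTime TEXT    NOT NULL,  -- ISO-8601 text sorts chronologically
    Data     TEXT,              -- payload (varbinary/varchar(max) in the original)
    PRIMARY KEY (UserID, DateTime DESC, ID)
) WITHOUT ROWID
"""


def open_history(path=":memory:"):
    """Open (or create) a database containing the HistoryItems table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_item(conn, item_id, user_id, when, data):
    """Insert one history row; SQLite places it in primary-key order."""
    conn.execute(
        "INSERT INTO HistoryItems (ID, UserID, DateTime, Data) VALUES (?, ?, ?, ?)",
        (item_id, user_id, when, data),
    )


def user_history(conn, user_id, limit=100):
    """Newest-first history for one user."""
    # The ORDER BY matches the stored key order, so the planner should be able
    # to seek to the user's first row and read forward without a separate sort.
    return conn.execute(
        "SELECT ID, DateTime, Data FROM HistoryItems "
        "WHERE UserID = ? ORDER BY DateTime DESC, ID LIMIT ?",
        (user_id, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = open_history()
    add_item(conn, 1, 7, "2024-01-01T09:00:00", "created card")
    add_item(conn, 2, 7, "2024-01-02T10:30:00", "moved card")
    add_item(conn, 3, 8, "2024-01-02T11:00:00", "joined board")
    for row in user_history(conn, 7):
        print(row)
```

Running `EXPLAIN QUERY PLAN` on the `user_history` query should report a search on the primary key with no temporary sort, meaning the rows come back in the order they are stored.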
An easy problem for a relational database.
I use something very similar to what @Brett from Trello described above in my PHP + MySQL app, to track user activity in the order and production management app for our online web store.
I have an activities table that holds:
user_id
: the user who performed the action
action_id
: the action that was performed (e.g. create, update, delete)
resource
: an ENUM of the resource types (models) the action was performed on (e.g. orders, invoices, products)
resource_id
: the PK of the resource the action was performed on
description
: a text description of the action (can be null)
It's a large table indeed, but with the right indexes it performs very well. It serves its purpose, and it is simple and fast. It currently holds 200k records and grows by about 1,000 new entries per day.
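Here is a minimal sketch of such an activities table. It uses Python's sqlite3 module in place of PHP and MySQL so it runs without a server. SQLite has no ENUM type, so a CHECK constraint stands in for it. The `ACTIONS` and `RESOURCES` values, the index names, and the helper functions are hypothetical. Because the column list above has no timestamp, history is ordered by the auto-assigned id.

```python
import sqlite3

# Hypothetical lookup values: the answer stores an action_id and a resource
# ENUM but doesn't list the exact values.
ACTIONS = {"create": 1, "update": 2, "delete": 3}
RESOURCES = ("orders", "invoices", "products")

# SQLite has no ENUM type, so a CHECK constraint restricts `resource` instead.
RESOURCE_LIST = ", ".join(f"'{r}'" for r in RESOURCES)
SCHEMA = f"""
CREATE TABLE IF NOT EXISTS activities (
    id          INTEGER PRIMARY KEY,  -- normally one more than the largest existing id
    user_id     INTEGER NOT NULL,
    action_id   INTEGER NOT NULL,
    resource    TEXT    NOT NULL CHECK (resource IN ({RESOURCE_LIST})),
    resource_id INTEGER NOT NULL,
    description TEXT                  -- may be NULL
);
-- history of one order/invoice/product
CREATE INDEX IF NOT EXISTS idx_activities_resource ON activities (resource, resource_id);
-- everything one user has done
CREATE INDEX IF NOT EXISTS idx_activities_user ON activities (user_id);
"""


def open_activities(path=":memory:"):
    """Open (or create) a database containing the activities table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_activity(conn, user_id, action, resource, resource_id, description=None):
    """Append one activity row; sqlite3.IntegrityError for an unknown resource."""
    cur = conn.execute(
        "INSERT INTO activities (user_id, action_id, resource, resource_id, description) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, ACTIONS[action], resource, resource_id, description),
    )
    return cur.lastrowid


def resource_history(conn, resource, resource_id):
    """All activity for one resource, newest first (ordered by id)."""
    return conn.execute(
        "SELECT id, user_id, action_id, description FROM activities "
        "WHERE resource = ? AND resource_id = ? ORDER BY id DESC",
        (resource, resource_id),
    ).fetchall()
```

The two indexes are an assumption about what "the right indexes" might be. One serves the history of a single record, and the other serves a single user's activity.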