So basically I want to implement the same functionality as stackoverflow's:
viewed 59344 times
So here is some background information:
- I want to count only unique visits. Assumption that registered users will read article many times (it is evolving)
- I use MongoDB as a store
- I would like it to be close to real time
- My system will have a registration, but I want to count the views of anonymous users as well
I understand that the best way to count unique visits is through registration, but the thing is that a big chunk of users will be just passive readers who do not need to create an account to read the information from the application. As far as I understand, the most convenient way is to save IP address of every user, who read the post. I also understand that IP address will not provide uniqueness (some different users will have the same IP, because they are behind the same ISP and one user can have different IPs, by using proxies, tor etc)
The use of Mongo is not absolutely essential, just the thing is that everything is written in Mongo right now, so I will switch only if it will be much faster/convenient.
Background
Are you certain you need to track "unique" views?
I actually wouldn't expect popular sites to try to keep the view counts unique - bigger is better and re-visits for new comments are still additional "views" in the the sense of showing new content/comments/ads. There are other possible subtleties to "correctness" that may or may not be important for your use case, such as excluding crawlers or your own company's users/IPs.
Instead of spending time tracking unique views (which isn't overly meaningful), I would look at counting unique user interactions such as voting/liking/commenting on the page. You can then determine "popularity" of a page with some formula based on those metrics. There is an interesting example of this approach in the Radioactivity module for Drupal, where a "hotness" metric is calculated based on activity based on recency of user interactions.
Approaches to consider
1) For a simple view counter in MongoDB, I would just use
$inc
to bump up the view count when the page is loaded. You can exclude logging users by role as needed (for example admin users).2) For a more accurate view counter I would pass off the problem to a web analytics platform (which you should be using with your site for more detailed analysis anyway). For example, you can use Google Analytics API or an open source application like Piwik. Web analytics systems already have solutions in place for determining unique users/views, and the API calls for these can be asynchronous via JavaScript.
3) If implementing your own unique view tracking a definite requirement, I would use a separate collection for tracking views and upsert based on your uniqueness criteria (unique view per
user,article
pair for registered users orsession_id,article
pair for anon users). I would combine this with approach #1 (incrementing a view counter for the article views) by incrementing a counter of article views if the upsert results in an insert.One of the way that you can solve the problem is using the cookies , once a user has visited the page , you can have one cookie added saying that he is already visited the page and you do not need to count him again. You can keep on appending some key to know what all pages he had visited. I know cookies can be deleted but in any solution there will be tradeoff.
From the mongoDB prospective , if you want very fast insert and read , i would suggest couple of things you can do.
1) As you create a article , create a document like this in your may be log collection
Why i am not suggesting to add IP address or any other information because , as you will add IP addresses , the size of the document going to change mongoDB need to find new allocated space. Which is bad from performance angle. As you are only incrementing the counter it will not increase the size of the document and it will no need to change it place. + You have limitation on the maximum size of the document you can have.
2) Creating document in advance will give direct update statement and no worry to check for the existence of the document for the article Id or not.