I'm working on a case that requires fairly complex use of page views.
Each content object will have a page view count, and it should be easy to access so that we can do various things with it (sort catalog results on it, display it, calculate a popularity meter ...). The closest equivalent is the view count on a YouTube video.
I'm thinking of some possible ways to implement this:
- Use annotation storage and an indexer to create a portal_catalog index and metadata column.
- Use an indexer only (either with a volatile attribute, or by updating the index based on its previous value) so that we don't have to write frequently changing data twice. The page view count is then stored only in the catalog brain.
- Use a relational database. But then how can we make it work with portal_catalog?
- Use a wrapper layer in front of Plone to do the analytics, and get the desired data through some API. This sacrifices flexibility but saves a lot of work on the Plone side (writing event subscribers, checking sessions, cookies ...), and the performance should be better?
What are your ideas or experience on this?
We have used an external log analyser for a client project (a large private intranet). Architecture:
- A js library adds a 'web bug', an empty gif with additional query parameters, loaded from a dedicated nginx server.
- A log processor picks up the nginx logs, rotates them, and parses the lines into a database, counting accesses together with the additional metadata. The entries in the db include the UID of the content, among other interesting angles.
- The site has read-only access to the same database, to make stats queries.
Page counts are then easy, just query the database for the right UID. Ranked lists are not much harder; query the statistics, then use the UIDs to attach catalog data to the result set.
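To make the log-processing and ranking steps above concrete, here is a minimal sketch using only the Python standard library. It is not the code from our project: the `/bug.gif` path, the `uid` query parameter, the `page_views` table and the use of SQLite are all assumptions made for illustration.

```python
import re
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Assumption: the web bug is requested as GET /bug.gif?uid=<content UID>
# and nginx writes its default "combined" log format.
REQUEST_RE = re.compile(r'"GET (?P<target>\S+) HTTP/[0-9.]+"')


def extract_uid(log_line):
    """Return the uid query parameter of a web-bug request, or None."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    parts = urlsplit(match.group("target"))
    if parts.path != "/bug.gif":
        return None
    values = parse_qs(parts.query).get("uid")
    return values[0] if values else None


def record_hits(conn, log_lines):
    """Add one hit per web-bug log line to the (hypothetical) page_views table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views ("
        "uid TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    for line in log_lines:
        uid = extract_uid(line)
        if uid is None:
            continue
        cur = conn.execute(
            "UPDATE page_views SET hits = hits + 1 WHERE uid = ?", (uid,)
        )
        if cur.rowcount == 0:
            # First hit for this UID: create the row.
            conn.execute(
                "INSERT INTO page_views (uid, hits) VALUES (?, 1)", (uid,)
            )
    conn.commit()


def top_uids(conn, limit=10):
    """Return (uid, hits) pairs, most viewed first."""
    return conn.execute(
        "SELECT uid, hits FROM page_views ORDER BY hits DESC, uid LIMIT ?",
        (limit,),
    ).fetchall()
```

On the Plone side, the UIDs returned by `top_uids` would then be looked up in the catalog to attach titles, URLs and so on to the ranked list.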
The biggest problem we face now is a lack of data warehousing know-how (turning individual access rows in the database into efficient aggregates), and we are looking into retooling this setup to use Piwik as the statistics engine instead.
We cannot use Google Analytics in this particular case, but if you do not have such a restriction, I'd certainly advise you to look into collective.googleanalytics and see if you can make it fit your use case.
Have you already seen this product?
http://plone.org/products/collective.googleanalytics/
It seems to fit your needs, or at least it could be a good base for your customizations.
A write on every access is a worst-case scenario for the ZODB. Relational DBs are generally pretty good at this sort of thing, and I'd look at that first.
Need to sort on the data? Just add some utility or content type methods to query the db. When you need to do a lookup, run the catalog search first, then use the db-connecting methods to annotate the results with the data for the sort.
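A rough sketch of that "search first, then annotate and sort" step follows. The `page_views` table (columns `uid` and `hits`) and the SQLite connection are assumptions, and `Brain` is only a stand-in for catalog brains, which in a default Plone setup expose `UID` as metadata.

```python
import sqlite3
from collections import namedtuple

# Stand-in for a catalog brain, for illustration only.
Brain = namedtuple("Brain", ["UID", "Title"])


def view_counts(conn, uids):
    """Fetch hit counts for the given UIDs with a single query.

    Assumes a hypothetical table page_views(uid, hits). For very large
    result sets the UID list should be chunked, since databases limit
    the number of bound parameters per statement.
    """
    uids = list(uids)
    if not uids:
        return {}
    placeholders = ",".join("?" for _ in uids)
    rows = conn.execute(
        "SELECT uid, hits FROM page_views WHERE uid IN (%s)" % placeholders,
        uids,
    )
    return dict(rows)


def sort_by_views(conn, brains):
    """Return (brain, hits) pairs, most viewed first; unknown UIDs count as 0."""
    brains = list(brains)
    counts = view_counts(conn, (b.UID for b in brains))
    annotated = [(b, counts.get(b.UID, 0)) for b in brains]
    annotated.sort(key=lambda pair: pair[1], reverse=True)
    return annotated
```

The sort happens in Python after the catalog query, so this suits result sets of moderate size; for big listings you would want to limit the catalog query first.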
We did this a long time ago (Plone 2.5). The customer insisted on it! Once it was done, it turned out not to be the real need: what they wanted was the preferred articles, which is not the same as the most viewed, so content rating was the answer.
So first validate this with your customer.
Next, the best way to meet your need is to install an analytics tool, Google Analytics or anything else, as long as it has an API you can query for the most viewed pages. If you need this in the portal_catalog, you can index the value when an article is viewed, but at most once every hour.
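One way to read the "when viewed, but at most once every hour" idea is a small per-object throttle, sketched below. Both callbacks are hypothetical: `fetch_views` stands for whatever call your analytics API offers, and `reindex` for whatever code updates the catalog value.

```python
import time


class HourlyReindexer:
    """Reindex an object's view count on a page view, at most once per interval."""

    def __init__(self, fetch_views, reindex, interval=3600, clock=time.time):
        self._fetch_views = fetch_views  # hypothetical: uid -> count from the analytics API
        self._reindex = reindex          # hypothetical: (uid, count) -> None, updates the catalog
        self._interval = interval        # seconds between reindexes of the same object
        self._clock = clock              # injectable for testing
        self._last_run = {}              # uid -> time of the last reindex

    def on_view(self, uid):
        """Call on each page view; return True if a reindex was triggered."""
        now = self._clock()
        last = self._last_run.get(uid)
        if last is not None and now - last < self._interval:
            return False
        self._last_run[uid] = now
        self._reindex(uid, self._fetch_views(uid))
        return True
```

Note that the timestamps here live in process memory, so each Zope instance would throttle independently and the state is lost on restart; that only costs a few extra reindexes.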
I noticed that the folks from Nidelven just released http://plone.org/products/Products.ZODBFriendlyCounter, which promises to do this natively without excessive ZODB writes/bloat. Worth checking out, would love to hear more expert opinions on this.
If you use Dexterity, you could write a custom page view behavior that adds annotation data to the main object, with the annotation data kept in a volatile attribute.