I need to build an analytics server for large scale (seven figures and up), quickly and cheaply.
Piwik would be the easy choice, but from what I've gathered so far, Piwik is rather hard to scale and can require rather hefty servers to handle the load.
My second idea would be to create a quick-and-dirty Node.js server which just pushes everything to Amazon DynamoDB, so that one can start gathering data from day one and build the UI later on. That would be quick to create and to scale (vertically and horizontally). However, I'm wondering whether DynamoDB is the right choice for such a use case (gather data, generate reports)?
Piwik scales up to millions of pages and tens of thousands of tracked websites per month. See their docs: http://piwik.org/docs/optimize/ and: http://piwik.org/blog/2012/07/piwik-high-scale-performance-report-as-of-july-2012/
I'm using DynamoDB professionally and would not use it for your application.
DynamoDB truly has tons of constraints. Among them, you can have only one `hash_key` and, optionally, one `range_key`. You may do some "analytics" for items grouped under a given `hash_key` using `query`, but really nothing fancy. For complex queries, you would have to use `scan` or EMR, which are slow and expensive and have a couple of drawbacks due to throttling.

Nonetheless, NoSQL seems a good choice, at least for the prototyping stage of your application. But I would recommend MongoDB instead. You can index any field, run complex queries, and you do not have to worry about throttling. Sharding and replication are not too hard to set up.
MongoDB has a strong ecosystem and community, which DynamoDB does not have (yet) as it is much younger. MongoDB also has hosted offerings which would allow you to bootstrap your application as quickly as you would with DynamoDB.
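
To make the `hash_key` / `range_key` constraint mentioned above more concrete, here is a toy in-memory model in Python. It is only a sketch, not the DynamoDB API: the `ToyTable` class, its methods, the `items_read` counter and the sample event fields (`site`, `ts`, `browser`) are all made up for illustration. It assumes that a `query` only touches the items stored under one hash key, while a `scan` has to read every item in the table before filtering.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table keyed by (hash_key, range_key)."""

    def __init__(self):
        # hash_key -> list of (range_key, item), kept sorted by range_key
        self._partitions = defaultdict(list)
        # Number of items touched so far: a rough proxy for read cost
        self.items_read = 0

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        rows.insert(bisect_right(keys, range_key), (range_key, item))

    def query(self, hash_key, low, high):
        """Return items of ONE partition with low <= range_key <= high."""
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        hits = rows[bisect_left(keys, low):bisect_right(keys, high)]
        self.items_read += len(hits)
        return [item for _, item in hits]

    def scan(self, predicate):
        """Return all items matching predicate; reads the whole table."""
        matches = []
        for rows in self._partitions.values():
            for _, item in rows:
                self.items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches


# Sample data: 3 sites, 100 page-view events each (all values are invented).
table = ToyTable()
for site in ("site-a", "site-b", "site-c"):
    for ts in range(100):
        browser = "firefox" if ts % 2 else "chrome"
        table.put(site, ts, {"site": site, "ts": ts, "browser": browser})

# "Last 10 events of site-a" fits the key schema: only 10 items are read.
table.items_read = 0
recent = table.query("site-a", 90, 99)
query_cost = table.items_read

# "All firefox events" does not fit the key schema: all 300 items are read.
table.items_read = 0
firefox = table.scan(lambda item: item["browser"] == "firefox")
scan_cost = table.items_read
```

In this toy model, any report that is not organized around the chosen hash key ends up paying the full-table cost, which is the kind of limitation described above for analytics-style reporting.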