My name is Daniel. I'm new as an account holder but a long-time lurker. I decided to learn Apache Cassandra for my next "let's write some code while the kids are sleeping" project.
What I'm writing is a neat little API that will do reads and writes against a Cassandra database. I had a lot of the DB layout figured out in MongoDB, but for me it's time to move on and grow as an engineer :)
Mission: I will collect metrics from the servers in my rack; an agent will send a payload of metrics every minute. I have the API part pretty much figured out and will use JWTs to sign the payloads. The type of data I will store can be seen below: cpuload, cpuusage, memusage, diskusage, etc.
The part of Cassandra that confuses me is how to write the actual model. As I understand it, the storage engine more or less writes it all to disk as a time series for me, which makes reads very fast. I know anything I whip together now would work for my lab, since it's just 30 machines, but I'm trying to understand how these things are done properly and how it could be done for a real-life scenario like Server Density, Datadog, or "insert your preferred server monitoring service". :)
So how do you more experienced engineers design a schema like this?
Usage scenarios for the database:
- Write payloads every minute through the API (let's imagine that's at least 100k writes per minute, for the sake of learning something useful).
- Read the assets associated with one's user ID:
  - pull most recent data (3h)
  - pull most recent data (daily)
  - pull most recent data (weekly)
  - pull most recent data (monthly)
  - etc.
- Generate monthly PDF reports showing uptime and such.
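To make the read scenarios a bit more concrete, here is a rough Python sketch of the time windows I have in mind and how many one-minute samples each would cover per asset. The window names, the 30-day approximation of "monthly", and the helper functions are my own assumptions for illustration, not anything Cassandra prescribes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical window names mapped to their lengths.
# "monthly" is approximated as 30 days (an assumption).
WINDOWS = {
    "3h": timedelta(hours=3),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}

# The agent sends one payload per minute.
SAMPLE_INTERVAL = timedelta(minutes=1)


def window_bounds(name, now):
    """Return (start, end) timestamps for a 'most recent data' window."""
    return now - WINDOWS[name], now


def expected_points(name):
    """Number of one-minute samples a single asset produces in the window."""
    return WINDOWS[name] // SAMPLE_INTERVAL


# Fixed example timestamp so the output is reproducible.
now = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
for name in WINDOWS:
    start, end = window_bounds(name, now)
    print(name, start.isoformat(), end.isoformat(), expected_points(name))
```

Even for a single asset the monthly window covers tens of thousands of samples, which is part of why I'm wondering about pre-computed averages.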
Should I insert rows containing the full payload, or am I better off inserting them on a per-service basis, e.g. timeuuid|cpuusage?
Per service row:

CREATE TABLE metrics (
    id uuid PRIMARY KEY,
    assetid int,
    serviceType text,
    metricValue int
);
All in one:

CREATE TABLE metrics (
    id uuid PRIMARY KEY,
    assetid int,
    cpuload int,
    cpuusage int,
    memusage int,
    diskusage int
);
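To show what the two layouts mean in practice, here is a small Python sketch that turns one agent payload into rows for each table and compares the resulting insert counts. The payload values and function names are made up for illustration, and I'm assuming here that "100k writes per minute" means 100k payloads per minute.

```python
import uuid

# Hypothetical payload from one agent, using the column names from the tables above.
payload = {"assetid": 42, "cpuload": 3, "cpuusage": 57, "memusage": 71, "diskusage": 64}


def per_service_rows(payload):
    """One row per metric, shaped like the 'per service row' table."""
    return [
        {
            "id": uuid.uuid4(),
            "assetid": payload["assetid"],
            "serviceType": name,
            "metricValue": value,
        }
        for name, value in payload.items()
        if name != "assetid"
    ]


def all_in_one_row(payload):
    """A single row holding every metric, shaped like the 'all in one' table."""
    return {"id": uuid.uuid4(), **payload}


# Assumption: 100k payloads arrive per minute.
PAYLOADS_PER_MINUTE = 100_000

rows = per_service_rows(payload)
print("per-service inserts/minute:", len(rows) * PAYLOADS_PER_MINUTE)
print("all-in-one inserts/minute:", 1 * PAYLOADS_PER_MINUTE)
```

With four metrics per payload, the per-service layout multiplies the number of inserts by four, so the choice affects write volume as well as how the data is read back.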
In MongoDB I would preallocate the buckets and also keep a quick-read average inside the document, so that the web GUI could simply show the average stats for predefined time periods.
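Roughly, the MongoDB approach I mean looks like the following Python sketch: a preallocated bucket document with a running average that is updated on every write. The field names and the one-hour bucket size (60 one-minute slots) are assumptions for illustration only.

```python
def new_bucket(assetid, hour):
    """Preallocate one slot per minute of the hour, plus running totals."""
    return {
        "assetid": assetid,
        "hour": hour,              # hypothetical bucket label, e.g. "2015-06-01T12"
        "values": [None] * 60,     # one preallocated slot per minute
        "count": 0,
        "sum": 0,
        "avg": None,               # the quick-read average shown in the web GUI
    }


def record(bucket, minute, value):
    """Store a sample in its slot and keep the running average up to date."""
    old = bucket["values"][minute]
    if old is not None:
        # Overwriting a slot: remove the old sample's contribution first.
        bucket["sum"] -= old
        bucket["count"] -= 1
    bucket["values"][minute] = value
    bucket["sum"] += value
    bucket["count"] += 1
    bucket["avg"] = bucket["sum"] / bucket["count"]


bucket = new_bucket(42, "2015-06-01T12")
record(bucket, 0, 50)
record(bucket, 1, 70)
print(bucket["avg"])
```

What I don't know is whether something equivalent is the normal way to do it in Cassandra, or whether the averages should live somewhere else.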
Examples for dumbasses are highly appreciated. I hope you can decipher my rather poor English.
I just found this link in the SO suggestions: "Cassandra data model for time series". I guess that is something that applies to me as well.
Sincerely, Daniel Olsson