I'm looking for an embeddable Java library that is suitable for collecting real-time streams of sensor data in a general-purpose way. I plan to use this to develop a "hub" application for reporting on multiple disparate sensor streams, running on a JVM based server (will also be using Clojure for this).
Key things it needs to have:
- Interfaces for various common sensor types / APIs. I'm happy to build what I need myself, but it would be nice if some standard stuff comes out of the box.
- Suitable for "soft real time" usage, i.e. fairly low latency and low overhead.
- Ability to monitor and manage streams at runtime, gather statistics etc.
- Open source under a reasonably permissive license so that I can integrate it with other code (Apache, EPL, BSD, LGPL all fine)
- A reasonably active community / developer ecosystem
Is there something out there that fits this profile that you can recommend?
1. Round-robin database (wikipedia)
RRDtool (acronym for round-robin database tool) aims to handle
time-series data like network bandwidth, temperatures, CPU load, etc.
The data are stored in a round-robin database (circular buffer), thus
the system storage footprint remains constant over time.
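To make the constant-footprint idea concrete, here is a minimal sketch of a round-robin (circular) buffer in plain Java. It is not RRDtool's or RRD4J's code; the class name `RoundRobinArchive` and its methods are hypothetical, and it leaves out the consolidation (averaging into coarser archives) that a real round-robin database also performs.

```java
import java.util.Arrays;

/** Hypothetical fixed-capacity ring buffer: once full, each new sample overwrites the oldest one. */
final class RoundRobinArchive {
    private final double[] slots;
    private int next = 0;   // index where the next sample will be written
    private int count = 0;  // number of valid samples, never exceeds slots.length

    RoundRobinArchive(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new double[capacity];
    }

    /** Stores a sample, overwriting the oldest one when the buffer is full. */
    void add(double sample) {
        slots[next] = sample;
        next = (next + 1) % slots.length;
        if (count < slots.length) {
            count++;
        }
    }

    int size() {
        return count;
    }

    /** Returns the stored samples, oldest first. */
    double[] snapshot() {
        double[] out = new double[count];
        int start = (next - count + slots.length) % slots.length;
        for (int i = 0; i < count; i++) {
            out[i] = slots[(start + i) % slots.length];
        }
        return out;
    }

    public static void main(String[] args) {
        RoundRobinArchive archive = new RoundRobinArchive(3);
        for (double reading : new double[] {20.5, 21.0, 21.4, 22.1}) {
            archive.add(reading);
        }
        // Capacity is 3, so the first reading (20.5) has been overwritten.
        System.out.println(Arrays.toString(archive.snapshot()));
    }
}
```

However many samples arrive, the array never grows, which is the property the quote above describes.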
This approach and database format is widely used, stable, and simple enough. Out of the box, it can generate nice plots.
There is a Java implementation, RRD4J:
RRD4J is a high performance data logging and graphing system for time
series data, implementing RRDTool's functionality in Java. It follows
much of the same logic and uses the same data sources, archive types
and definitions as RRDTool does. Open Source under Apache 2.0 License.
Update
I forgot to mention that there is also a Clojure RRD API (examples).
2. For experiments with real-time data, I would suggest considering Perst
It is small, fast, and reliable enough, but it is distributed under GPLv3. Perst provides several indexing algorithms:
- B-Tree
- T-Tree (optimized for in-memory database)
- R-Tree (spatial index)
- Patricia Trie (prefix search)
- KD-Tree (multidimensional index)
- Time series (large number of fixed size objects with timestamp)
The last one suits your needs very well.
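As a rough illustration of what a timestamp-ordered index gives you, here is a sketch using only the JDK's `TreeMap`. It does not use Perst's API; the class `SensorTimeSeries` and its methods are hypothetical names chosen for this example, and the data lives purely in memory.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory time series: readings kept sorted by timestamp. */
final class SensorTimeSeries {
    // Key: timestamp in milliseconds; value: the sensor reading at that instant.
    private final NavigableMap<Long, Double> readings = new TreeMap<>();

    void record(long timestampMillis, double value) {
        readings.put(timestampMillis, value);
    }

    /** Readings with from <= timestamp < to, in ascending time order. */
    NavigableMap<Long, Double> range(long from, long to) {
        return readings.subMap(from, true, to, false);
    }

    /** Mean of the readings in [from, to), or NaN if the window is empty. */
    double average(long from, long to) {
        return range(from, to).values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        SensorTimeSeries series = new SensorTimeSeries();
        series.record(1_000L, 1.0);
        series.record(2_000L, 3.0);
        series.record(3_000L, 5.0);
        // The window [1000, 3000) covers the first two readings only.
        System.out.println(series.average(1_000L, 3_000L));
    }
}
```

Because the map is ordered by timestamp, a time-window query only touches the entries inside the window, which is the access pattern a dedicated time series index is built for.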
3. Neo4J with Relationship indexes
A good example where this approach pays dividends is in time series
data, where we have readings represented as a relationship per
occurrence.
4. Oracle Berkeley DB Java Edition
Oracle Berkeley DB Java Edition is an open source, embeddable,
transactional storage engine written entirely in Java. It takes full
advantage of the Java environment to simplify development and
deployment. The architecture of Oracle Berkeley DB Java Edition
supports very high performance and concurrency for both read-intensive
and write-intensive workloads.
Suggestion
Give RRD4J a try:
- It is simple enough
- It provides quite nice plots
- It has a Clojure API
- It supports several back-ends, including Oracle Berkeley DB Java Edition
- It can store and visualize detailed data sets
For collecting real-time streams of sensor data, the following might help.
Have you checked the leJOS API? See http://lejos.sourceforge.net/nxt/nxj/api/index.html
It is also worth checking Oracle Java ME Embedded and the target markets it addresses: http://www.unitask.com/oracledaily/2012/10/04/at-the-java-demogrounds-oracle-java-me-embedded-enables-the-internet-of-things/
It can be downloaded from http://www.oracle.com/technetwork/java/embedded/downloads/javame/index.html
For storing time series data, nothing beats Cassandra (http://cassandra.apache.org/). For why Cassandra, refer to http://www.datastax.com/why-cassandra
For accessing Cassandra from Java, refer to https://github.com/jmctee/Cassandra-Client-Tutorial
It is quite helpful. For applying the time series concept in Cassandra, refer to
http://www.datastax.com/wp-content/uploads/2012/08/C2012-ColumnsandEnoughTime-JohnAkred.pdf