We are working on a project involving realtime data processing. We plan to use Django/Python. The actual process is:
- Tens of thousands of devices each take 4 samples per second (at offsets 0, 0.25, 0.5 and 0.75 s) and continuously send them to our Django server. Each stream is basically a time series of timestamp/value pairs
- We need to align the samples from all devices by timestamp (with millisecond precision) and compute a simple average across all the time series
- All of this needs to be done in realtime (maximum 1 second delay), and the result is sent out from another thread
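To make the alignment step concrete, here is a minimal sketch of what is meant. It assumes timestamps arrive as seconds since the epoch and that a sample belongs to the nearest 250 ms slot; the names `SLOT_MS`, `slot_key` and `average_by_slot` are made up for illustration. Keys are integer milliseconds, so no float comparison is involved when matching samples from different devices.

```python
from collections import defaultdict

SLOT_MS = 250  # four samples per second: offsets 0, 250, 500, 750 ms


def slot_key(timestamp_s, slot_ms=SLOT_MS):
    """Snap a timestamp in seconds to the nearest slot, as integer milliseconds."""
    ms = int(round(timestamp_s * 1000))
    # Integer arithmetic avoids float rounding surprises at slot boundaries.
    return (ms + slot_ms // 2) // slot_ms * slot_ms


def average_by_slot(samples):
    """Average values per slot.

    samples: iterable of (device_id, timestamp_s, value) tuples.
    Returns a dict mapping slot (integer ms) to the mean of all values in it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _device_id, timestamp_s, value in samples:
        key = slot_key(timestamp_s)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    demo = [
        ("dev-a", 1000.250, 1.0),
        ("dev-b", 1000.251, 3.0),  # 1 ms late, still lands in the same slot
        ("dev-a", 1000.500, 5.0),
    ]
    print(average_by_slot(demo))
```

Whether a late sample should snap to the nearest slot or be rejected is a design decision that this sketch does not settle.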
We have looked into RRDTool and scikits.timeseries, but neither offers millisecond precision, so they can't align our time series.
Is there any tool or data structure we can use with Django/Python for this type of realtime data processing? Thread safety is important, as the result will be sent out from another thread.
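For the thread-safety part, the kind of structure in question might look like the sketch below, which uses only the standard library: a lock-protected accumulator keyed by slot, plus a `queue.Queue` to hand finished averages to the sending thread. `SlotAverager`, `sender` and `run_demo` are hypothetical names, and the "send" is a stand-in that just appends to a list.

```python
import queue
import threading


class SlotAverager:
    """Accumulates values per timestamp slot; safe to call from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._totals = {}  # slot_ms -> [running sum, sample count]

    def add(self, slot_ms, value):
        with self._lock:
            entry = self._totals.setdefault(slot_ms, [0.0, 0])
            entry[0] += value
            entry[1] += 1

    def pop_average(self, slot_ms):
        """Remove a slot and return its mean, or None if no samples arrived."""
        with self._lock:
            entry = self._totals.pop(slot_ms, None)
        if entry is None:
            return None
        return entry[0] / entry[1]


def sender(results, sent):
    """Hypothetical sender thread: drains the queue until it sees None."""
    while True:
        item = results.get()
        if item is None:
            break
        sent.append(item)  # stand-in for the real network send


def run_demo():
    averager = SlotAverager()
    results = queue.Queue()  # Queue does its own locking
    sent = []
    worker = threading.Thread(target=sender, args=(results, sent))
    worker.start()
    averager.add(1000250, 1.0)
    averager.add(1000250, 3.0)
    results.put((1000250, averager.pop_average(1000250)))
    results.put(None)  # tell the sender to stop
    worker.join()
    return sent
```

In a real setup something would have to decide when a slot is complete (for example, a timer firing shortly after each 250 ms boundary) before calling `pop_average`; that part is left out here.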
Thanks in advance.
You may want to look into Tornado, a non-blocking web server framework that uses epoll. It should be more suitable for your real-time requirements than a higher-level framework like Django.
You should consider looking at the Celery project. It plugs into Django just fine, but I'm not sure whether it can meet your millisecond precision requirements. You may also consider getting off the Django stack and using Brubeck with Mongrel2 and ZeroMQ.
Your options for real-time web services in Python are Twisted, Tornado and Eventlet.
You can integrate all this to work with Python/Django. Tutorial on that.
Short answer: No. Django won't help you with this.
Long answer: This sounds like a job for some custom code coming directly off the web server. I'm thinking of a Python script hanging directly off WSGI, or even an Apache module written in C!
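As a rough idea of what "a script hanging directly off WSGI" could look like, here is a minimal sketch using only the standard library. The payload format (one JSON object per POST with `ts_ms` and `value` fields) is purely an assumption, and `received` and `fake_post` are hypothetical stand-ins for the real aggregation structure and a real client.

```python
import io
import json

received = []  # stand-in for the real aggregation structure


def application(environ, start_response):
    """Bare WSGI callable: accept one JSON sample per POST request."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        sample = json.loads(environ["wsgi.input"].read(length))
        # Assumed payload shape: {"ts_ms": <int>, "value": <number>}
        received.append((int(sample["ts_ms"]), float(sample["value"])))
    except (KeyError, TypeError, ValueError):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"bad sample\n"]
    start_response("204 No Content", [])
    return []


def fake_post(body):
    """Call the app once with an in-memory request; return the status line."""
    statuses = []
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    application(environ, lambda status, headers: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    # Serve on localhost:8000 with the reference server, for experiments only.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, application).serve_forever()
```

Note that a module-level list like `received` is only shared within one process; if the web server runs several worker processes, the samples would need to go through something shared instead.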