Question:
I am interested in building a read-only streaming API similar to what Twitter has built. Data will flow in one direction only, from server to client. Clients do not have to be web browsers; they can be anything that can keep a persistent HTTP connection open. I'm fairly certain that Twitter's streaming API uses neither WebSockets nor Comet. Is the technology or strategy they deployed one with a W3C specification that I can study? I don't see any links to their strategy on the W3C site, so it might be something "custom", but any pointer toward understanding the buzzwords and protocols involved in building this kind of server-side HTTP streaming support would be great.
Answer 1:
Twitter's implementation uses a custom protocol, but it is similar in spirit to the W3C-standard Server-Sent Events. Server-sent events are much simpler than WebSockets, but they only allow communication in one direction, from server to client. There is a Python implementation of the server side of the protocol in this pull request for Tornado.
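To give a feel for how simple the Server-Sent Events wire format is, here is a minimal sketch of a function that serializes one event. The function name `format_sse` and the `tweet` event name are made up for illustration; the field names (`id`, `event`, `data`) and the blank line that ends each event come from the SSE format itself. This is not Twitter's protocol, only the standard one it resembles.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream format.

    Hypothetical helper for illustration, not part of any library.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is sent as several "data:" lines;
    # the client joins them back together with "\n".
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse('{"text": "hello"}', event="tweet", event_id="1"), end="")
```

A server would send these strings on a response with `Content-Type: text/event-stream` and keep the connection open between events.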
Answer 2:
Based on this slide, Twitter's streaming API uses a Jetty server. So would plain blocking I/O work? Basically, the client makes a request saying which tweets it is interested in, and the server responds but doesn't close the response. Every time new tweets come in, the server gets notified and writes (and flushes) the data back to the client, but again does not close the response.
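The "respond but don't close" idea can be sketched with nothing but the Python standard library. This is only an illustration under assumptions, not how Hosebird is implemented: `message_source` is a hypothetical stand-in for the feed of new tweets (finite here so the example terminates), and the choice of chunked transfer encoding with one JSON object per line is an assumption about a reasonable framing, not something the slides confirm.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def message_source(count=3, delay=0.01):
    """Hypothetical stand-in for a feed of new tweets."""
    for i in range(count):
        time.sleep(delay)  # pretend we are waiting for the next tweet
        yield {"id": i, "text": f"message {i}"}


class StreamHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # needed for chunked transfer encoding

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for message in message_source():
            payload = (json.dumps(message) + "\r\n").encode("utf-8")
            # One chunk: hex length, CRLF, payload, CRLF.
            self.wfile.write(b"%x\r\n%s\r\n" % (len(payload), payload))
            self.wfile.flush()  # push the data out without closing the response
        # A zero-length chunk ends the body; a real feed would keep looping.
        self.wfile.write(b"0\r\n\r\n")

    def log_message(self, *args):
        pass  # keep the example quiet
```

To try it, run `ThreadingHTTPServer(("127.0.0.1", 8000), StreamHandler).serve_forever()` and connect with any HTTP client that reads the body incrementally. Each connection occupies one blocking thread here, which is fine for a demo but is the part that would need care at Twitter's scale.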
From the notes on page 20:
How do the servers work internally? Hosebird runs on the JVM. It's written in Scala. And uses an embedded Jetty webserver to handle the front end issues. We feed each process 8 cores and about 12 gigs of memory. And they each can send a lot of data to many many of clients.
Disclaimer: I am not familiar with this topic so I might be totally wrong. What I said is based on my feeling. It's an interesting topic.
Answer 3:
You may be looking for a publish/subscribe service. There is some good information on this at http://en.wikipedia.org/wiki/Publish/subscribe. You can make the service read-only and discard any client messages that are not addressed to a valid channel.
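As a rough illustration of the publish/subscribe pattern itself, here is a minimal in-process sketch. The `Broker` class and the channel names are hypothetical; a real deployment would more likely put a dedicated message broker behind the HTTP layer. Subscribing to a channel that is not in the allowed set is rejected, which is one way to read the "valid channels" rule above.

```python
import queue
import threading


class Broker:
    """Tiny in-memory pub/sub broker (illustrative only)."""

    def __init__(self, channels):
        self._lock = threading.Lock()
        # Only channels listed up front are valid.
        self._subscribers = {name: [] for name in channels}

    def subscribe(self, channel):
        """Return a queue that receives every message published to channel."""
        with self._lock:
            if channel not in self._subscribers:
                raise KeyError(f"unknown channel: {channel}")
            inbox = queue.Queue()
            self._subscribers[channel].append(inbox)
            return inbox

    def publish(self, channel, message):
        """Deliver message to all subscribers; return how many received it."""
        with self._lock:
            # Messages for channels that are not valid are silently dropped.
            inboxes = list(self._subscribers.get(channel, []))
        for inbox in inboxes:
            inbox.put(message)
        return len(inboxes)


if __name__ == "__main__":
    broker = Broker({"tweets"})
    inbox = broker.subscribe("tweets")
    broker.publish("tweets", "hello")
    print(inbox.get())
```

Each streaming HTTP connection would hold one such queue and write whatever arrives on it to its open response.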
An implementation could be built with Redis (http://redis.io/topics/pubsub) and a small application that connects to the proper channels.
Other implementations could be built with RabbitMQ (http://www.rabbitmq.com/tutorials/tutorial-three-python.html). I am sure there are other options, but I am not familiar with them at this time.
Here is a W3C link on pub/sub: http://www.w3.org/community/pubsub/