MongoDB Schema Design - Real-time Chat

2019-03-08 21:24发布

问题:

I'm starting a project which I think will be particularly suited to MongoDB due to the speed and scalability it affords.

The module I'm currently interested in is to do with real-time chat. If I was to do this in a traditional RDBMS I'd split it out into:

  • Channel (A channel has many users)
  • User (A user has one channel but many messages)
  • Message (A message has a user)

The the purpose of this use case, I'd like to assume that there will be typically 5 channels active at one time, each handling at most 5 messages per second.

Specific queries that need to be fast:

  • Fetch new messages (based on an bookmark, time stamp maybe, or an incrementing counter?)
  • Post a message to a channel
  • Verify that a user can post in a channel

Bearing in mind that the document limit with MongoDB is 4mb, how would you go about designing the schema? What would yours look like? Are there any gotchas I should watch out for?

回答1:

I used Redis, NGINX & PHP-FPM for my chat project. Not super elegant, but it does the trick. There are a few pieces to the puzzle.

  1. There is a very simple PHP script that receives client commands and puts them in one massive LIST. It also checks all room LISTs and the users private LIST to see if there are messages it must deliver. This is polled by a client written in jQuery & it's done every few seconds.

  2. There is a command line PHP script that operates server side in an infinite loop, 20 times per second, which checks this list and then processes these commands. The script handles who is in what room and permissions in the scripts memory, this info is not stored in Redis.

  3. Redis has a LIST for each room & a LIST for each user which operates as a private queue. It also has multiple counters for each room the user is in. If the users counter is less than the total messages in the room, then it gets the difference and sends it to the user.

I haven't been able to stress test this solution, but at least from my basic benchmarking it could probably handle many thousands of messages per second. There is also the opportunity to port this over to something like Node.js to increase performance. Redis is also maturing and has some interesting features like Pub/Subscribe commands, which might be of interest, that would possibly remove the polling on the server side possibly.

I looked into Comet based solutions, but many of them were complicated, poorly documented or would require me learning an entirely new language(e.g. Jetty->Java, APE->C),etc... Also delivery and going through proxies can sometimes be an issue with Comet. So that is why I've stuck with polling.

I imagine you could do something similar with MongoDB. A collection per room, a collection per user & then a collection which maintains counters. You'll still need to write a back-end daemon or script to handle manging where these messages go. You could also use MongoDB's "limited collections", which keeps the documents sorted & also automatically clears old messages out, but that could be complicated in maintaining proper counters.



回答2:

Why use mongo for a messaging system? No matter how fast the static store is (and mongo is very fast), whether mongo or db, to mimic a message queue your going to have to use some kind of polling, which is not very scalable or efficient. Granted you're not doing anything terribly intense, but why not just use the right tool for the right job? Use a messaging system like Rabbit or ActiveMQ.

If you must use mongo (maybe you just want to play around with it and this project is a good chance to do that?) I imagine you'll have a collection for users (where each user object has a list of the queues that user listens to). For messages, you could have a collection for each queue, but then you'd have to poll each queue you're interested in for messages. Better would be to have a single collection as a queue, as it's easy in mongo to do "in" queries on a single collection, so it'd be easy to do things like "get all messages newer than X in any queues where queue.name in list [a,b,c]".

You might also consider setting up your collection as a mongo capped collection, which just means that you tell mongo when you set up the collection that your collection should only hold X number of bytes, or X number of items. Adding additional items has First-In, First-Out behavior which is pretty much ideal for a message queue. But again, it's not really a messaging system.



回答3:

1) ape-project.org

2) http://code.google.com/p/redis/

3) after you're through all this - you can dumb data into mongodb for logging and store consistent data (users, channels) as well