I'm a real Erlang newbie (started 1 week ago), and I'm trying to learn this language by creating a small but efficient chat server. (When I say efficient I mean I have 5 servers used to stress test this with hundreds of thousands connected client - A million would be great !)
I have find some tutorials doing so, the only thing is, that every tutorial i found, are IRC like. If one user send a message, all user except sender will receive it.
I would like to change that a bit, and use one-to-one discussion.
What would be the most effective client pool for searching a connected user ?
I thought about registering the process, because it seems to do everything I need, but I really don't think this is the better way to do it. (Or most pretty way to do it anyway).
Does anyone would have any suggestions doing this ?
EDIT :
Every connected client is affected to an ID.
When the user is connected, it first send a login command to give it's id.
When an user wants to send a message to another one the message looks like this
[ID-NUMBER][Message] %% ID-NUMBER IS A FIXED LENGTH
When I ask for "the most effective client pool", I'm actually looking for the fastest way to retrieve/add/delete one client on the connected client list which could potentially be large (hundred of thousands -- maybe millions)
EDIT 2 :
For answering some questions :
- I'm using Raw Socket (Using telnet right now to communicate with server) - will probably move to ssl later...
- It is my own protocol
- Every Client is a spawned Pid
- Every Client's Pid is linked to it's own monitor (mostly for debugging reason - The client if disconnected should reconnect by it's own starting auth from scratch)
- I have read a couple a book before starting coding, So I do not master yet every aspect of Erlang but I'm not unaware of it, I will read more about it when needed I guess.
- What I'm really looking for is the best way to store and search thoses PIDs to send message directly from process to process.
Should I write my own search Client function using lists ?
or should I use ets ?
Or even use register/2 unregister/1 and whereis/1 to maintain my client list, using it's unique id as atom, it seems to be the simplest way to do so, I really don't know if it is efficient, but I'm pretty sure this is the ugly solution ;-) ?
I'm doing something similar to your chat program using gproc as a pubsub (similar to the demo on that page). Each client registers as it's id. To find a particular client, you do a lookup on that client id. To subscribe to a client, you add a property to that process of the client id being subscribed to. To publish, you call gproc:send(ClientId,Message). This covers your use case, the more general room based chat as well, and can handle distributed masterless registry of processes.
I haven't tested to see if it scales to millions, but it uses ets to do the storage and gproc is rock solid code by Ulf Wiger. I wouldn't count on being able to write a better implementation.
I'm also kind of new to Erlang (a couple of months), so I hope this can put you in the correct path :)
First of all, since you're a "newbie", you should know about these sites:
- Erlang Official Documentation:
Most common modules are in the stdlib application, so start from there.
- Alternative Documentation:
There's a real time search engine, so it is really good when searching
for specific modules.
- Erlang Programming Rules:
For you to enter in the mindset of erlang programming.
- Learn You Some Erlang Book:
A must read for everyone starting with Erlang. It's really comprehensive
and fun to read!
- Trapexit.org:
Forum and cookbooks, to search for common problems faced by programmers.
Well, thinking about a non persistent database, I would suggest the sets
or gb_sets
modules (documentation here).
If you want persistence, you should try dets
(see documentation above), but I can't state anything about efficiency, so you should research this topic a bit further.
In the book Learn You Some Erlang there is a chapter on data structures that says that sets
are better for read intensive systems, while gb_sets
is more appropriate for a balanced usage.
Now, Messaging systems are what everyone wants to do when they come to Erlang because the two naturally blend. However, there are a number of things to look into before one continues. Messaging basically involves the following things: User Registration
, User Authentication
, Sessions Management
,Logging
, Message Switching/routing
e.t.c.
Now, to do all or most of these, one needs to have a Database, certainly IN-MEMORY, thats leads me to either Mnesia
or ETS Tables
. Since you are new to Erlang, i suppose you have not yet really mastered working with these. At one moment, you will need to maintain Who is communicating with who
, Who is available for Chat
e.t.c. Hence you might need to look up things and write things some where.
Another thing is you have not told us the Client. Is it going to be a Web Client (HTTP), is it an entirely new protocol you are implementing over raw Sockets ? Which ever way, you will need to master something called: Concurrency in Erlang
. If a user connects and is assigned an ID
, if your design is A process Per User
, then you will have to save the Pids of these Processes or register them against some criteria, yet again monitor them if they die e.t.c. Which brings me to OTP
and Supervision trees
. There is quite alot, however, tell us more about the Client and Server interaction, the Network Communication you need e.t.c. Or is it just a simple Erlang RPC project you are doing for your own revision ?
EDIT
Use ETS Tables
, or use Mnesia RAM tables
. Do not think of registering these Pids or Storing them in a list, Array or set. Look at this solution which was given to this question