How to keep track of child processes in Erlang?

I have a static list of "hosts" with their info, and a dynamic list of "host agents". Each host has one and only one agent for as long as it connects to the server by a TCP connection. As the host may or may not be connected, its agent process may or may not be started. When a TCP packet arrives with the host ID, I need to find out if the "agent" of this host is started or not.

The connection is responsible for receiving data from and sending data to the TCP socket, parsing the data to find out which host it is addressed to, and delivering it to that host's agent for handling.

The host keeps the host information. The host agent handles incoming data, saves host information to the host, and decides what to send and in what format (e.g. an ack to the client with the host ID and a response code).

The data packet also specifies a source host and a target host, meaning it was sent by the source host and should be received by the target host. The target host could be connected over a different connection. That's why I need a global map covering all connections, so that I can easily get the target host agent's PID.

I have a supervision tree in which host_supervisor monitors all the hosts, connection_supervisor monitors each connection, and host_agent_supervisor monitors the agents. host_supervisor and connection_supervisor are both supervised by the application supervisor, which means they are first-level children in the supervision tree. host_agent_supervisor, however, sits under connection_supervisor.

Questions:

1. Is it a good idea to store a map of host_id and host_agent_pid pairs in a db?
2. If the answer to 1 is yes, how do I update the host_agent_pid when something goes wrong and the agent is restarted?
3. Is there a better way to implement this? It seems my solution does not follow "the Erlang way".

Answer 1:

The simple, quick answers to your questions are:

1. It's fine, though besides a map you could also use gb_trees, dict or an ETS table (maps being the least mature of these, of course). That notwithstanding, a key/ID to PID lookup table is fine in principle. ETS might offer a performance benefit over the others because you can create an ETS table that can be accessed from other processes, eliminating the need for a single process to do all the reading and writing. That might or might not be important and/or appropriate.
2. One simple way to do this is for every "host agent", when it starts, to spawn another process which does nothing but link to the "host agent" and remove the host ID to agent PID mapping from whatever store you have when the "host agent" dies. Another way is to have the mapping store process itself link to your host agent PIDs, which might leave you less exposed to possible race conditions.
3. Possibly. When I read your question I was left with certain questions and a general feeling that the solution I would choose wouldn't lead me to the precise lookup issue you are asking about (i.e. lookup of the PID of a "host agent" upon receipt of a TCP packet), but I can't be sure this isn't because you've worked to minimise your question for Stack Overflow. It's a little unclear to me exactly what the roles, responsibilities and interactions of your "host", "host_agent" and "connection" processes really are, and whether they should all exist and/or have separate supervision trees.
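As a rough illustration of the ETS option from point 1, here is a minimal sketch of a public, named table that any process can read directly. The module name agent_table, the table name host_agents and the function names are all hypothetical, and the sketch does no cleanup of stale entries on its own; that still has to be driven by a link or monitor as described in point 2.

```erlang
-module(agent_table).
-export([init/0, register_agent/2, unregister_agent/1, whereis_agent/1]).

%% Hypothetical table name; it does not come from the question.
-define(TABLE, host_agents).

%% Create the table once, from a long-lived process: an ETS table is
%% destroyed when the process that owns it exits.
init() ->
    ?TABLE = ets:new(?TABLE, [named_table, public, set,
                              {read_concurrency, true}]),
    ok.

%% Store (or overwrite) the HostId -> Pid mapping.
register_agent(HostId, Pid) when is_pid(Pid) ->
    true = ets:insert(?TABLE, {HostId, Pid}),
    ok.

%% Remove the mapping, e.g. when the agent is known to have died.
unregister_agent(HostId) ->
    true = ets:delete(?TABLE, HostId),
    ok.

%% Any process can call this; no single server process is involved.
whereis_agent(HostId) ->
    case ets:lookup(?TABLE, HostId) of
        [{HostId, Pid}] -> {ok, Pid};
        []              -> not_started
    end.
```

Because the table is public, the reads in whereis_agent/1 happen in the caller's own process, which is the benefit mentioned above. Whether that matters depends on how hot the lookup path actually is.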

So, looking at possible alternatives... When you say "when a TCP packet arrives" I assume you mean when a foreign host connects to a listening socket, or sends some data on an existing socket that has already been accepted, and that the host ID is either the hostname (and/or port) or some other arbitrary ID that the foreign host sends to you after connecting.

Either way... Generally in this sort of scenario, I'd expect that a new process (the "host agent", by the sounds of it, in your case) would be spawned to handle the newly established TCP connection (via a dynamic, e.g. simple_one_for_one, supervisor), taking ownership of the socket that is the server-side endpoint of that connection, reading and writing the socket as appropriate, and terminating when the connection is closed.

With that model your "host agent" should always be started if there is a connection already and always be NOT started if there is not a connection, and any incoming TCP packet will end up automatically in the hands of the correct agent, because it will be delivered to the socket that the agent is handling, or if it's a new connection, the agent will be started.

The need to lookup the PID of an agent upon receipt of a TCP packet now never arises.
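A minimal sketch of that model, assuming a simple_one_for_one supervisor and plain (non-gen_server) agent processes. Every name here (agent_sup_sketch, hand_over/1) is hypothetical, and echoing the data back merely stands in for whatever the real agent would do with a packet. The sketch also assumes the listening socket was opened with {active, false}, so nothing is delivered before the agent owns the socket.

```erlang
-module(agent_sup_sketch).
-behaviour(supervisor).

%% Hypothetical names throughout; this only illustrates the
%% one-agent-process-per-connection model.
-export([start_link/0, hand_over/1]).
-export([init/1, start_agent/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
    %% temporary: an agent whose connection is gone should not be restarted.
    Agent = #{id => agent,
              start => {?MODULE, start_agent, []},
              restart => temporary,
              shutdown => brutal_kill},
    {ok, {SupFlags, [Agent]}}.

%% Child start function: must return {ok, Pid} with Pid linked to the supervisor.
start_agent() ->
    {ok, spawn_link(fun wait_for_socket/0)}.

%% Called by the process that accepted Sock (the current socket owner).
hand_over(Sock) ->
    {ok, Agent} = supervisor:start_child(?MODULE, []),
    ok = gen_tcp:controlling_process(Sock, Agent),
    Agent ! {socket, Sock},
    {ok, Agent}.

wait_for_socket() ->
    receive
        {socket, Sock} ->
            ok = inet:setopts(Sock, [{active, once}]),
            loop(Sock)
    end.

loop(Sock) ->
    receive
        {tcp, Sock, Data} ->
            %% Placeholder for real packet handling: echo the data back.
            ok = gen_tcp:send(Sock, Data),
            ok = inet:setopts(Sock, [{active, once}]),
            loop(Sock);
        {tcp_closed, Sock} ->
            ok;                     % connection gone, so the agent terminates
        {tcp_error, Sock, _Reason} ->
            ok
    end.
```

The accepting process would call agent_sup_sketch:hand_over(Sock) right after gen_tcp:accept/1 returns. From then on, every packet on that connection reaches the agent directly, with no lookup involved.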

If you need to look up the PID of an agent for other reasons though, because say your server sometimes needs to proactively send data to a possibly connected "host", then you either have to get a list of all the supervised "host agents" and pick out the right one (for this you would use supervisor:which_children/1, as per Hamidreza's answer) OR you would maintain a map of host IDs to PIDs, using map, gb_trees, dict, ets, etc. Which is correct depends on how many "hosts" you could have: if it's more than a handful then you should probably maintain a map of some sort so that the lookup time doesn't become too big.

Final comment, you might consider looking at gproc if you haven't already, in case you consider it of use for your case. It does this sort of thing.

Edit/addition (following question edit):

Your connection process sounds redundant to me; as suggested above, if you give the socket to the host agent then most of the responsibility of the connection is gone. There's no reason the host agent can't parse the data it receives. As far as I can see, there's no value in having one process parse it just to pass it on to another process. The parsing itself is probably a deterministic function, so it is sensible to have a separate module for it, but I see no point in a separate process.

I don't see the point of your 'host' process, you say "Host kept host informations" which makes it sound like it's just a process that holds a hostname or host ID, something like that?

You also say "it specified source host and target host, which means it sent by source host and should received by target host" which is beginning to make this sound a bit like a chat server, or at least some sort of hub-and-spoke / star network style communication protocol. I can't see why you wouldn't be able to do everything you want by creating a supervisor tree like this:

        top_sup
           |
     .------------------------------.
     |             |                |
map_server    svc_listener      hosts_sup (simple_one_for_one)
                                    |
                        .----------------------------->
                        |    |    |    |   |    |

Here the 'map_server' just maintains a map of host IDs to PIDs of hosts. The svc_listener has the listening socket; it just accepts connections and asks hosts_sup to spawn a new host when a new client connects. The host processes (under hosts_sup) take responsibility for the accepted socket, and register the host ID and their PID with map_server when they start.

If map_server links to the host PIDs it can automatically clean up when a host dies, and it can provide a suitable API for any process to look up a host PID by host ID.
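A minimal sketch of such a map_server as a gen_server follows. Note that it uses monitors rather than links, which is a substitution made for this sketch: a monitor delivers a 'DOWN' message without the server having to trap exits. The API names register_host/2 and lookup/1 are hypothetical.

```erlang
-module(map_server).
-behaviour(gen_server).

%% Hypothetical API; the names are not from the original answer.
-export([start_link/0, register_host/2, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Called by a host process once it knows its host ID.
register_host(HostId, Pid) ->
    gen_server:call(?MODULE, {register, HostId, Pid}).

%% Returns {ok, Pid} or error.
lookup(HostId) ->
    gen_server:call(?MODULE, {lookup, HostId}).

init([]) ->
    %% ids: HostId => Pid, refs: MonitorRef => HostId
    {ok, #{ids => #{}, refs => #{}}}.

handle_call({register, HostId, Pid}, _From, #{ids := Ids, refs := Refs} = S) ->
    Ref = erlang:monitor(process, Pid),
    {reply, ok, S#{ids := Ids#{HostId => Pid}, refs := Refs#{Ref => HostId}}};
handle_call({lookup, HostId}, _From, #{ids := Ids} = S) ->
    {reply, maps:find(HostId, Ids), S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

%% A monitored host died: drop its mapping, unless the host ID has since
%% been re-registered by a different process.
handle_info({'DOWN', Ref, process, Pid, _Reason}, #{ids := Ids, refs := Refs} = S) ->
    case maps:take(Ref, Refs) of
        {HostId, Refs1} ->
            Ids1 = case maps:find(HostId, Ids) of
                       {ok, Pid} -> maps:remove(HostId, Ids);
                       _Other    -> Ids
                   end,
            {noreply, S#{ids := Ids1, refs := Refs1}};
        error ->
            {noreply, S}
    end;
handle_info(_Other, S) ->
    {noreply, S}.
```

A host process would call map_server:register_host(HostId, self()) after reading its ID from the socket, and any other process can call map_server:lookup(TargetHostId) to find where to forward a packet.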

Answer 2:

In order to get a list of the child processes of a supervisor, you can use the supervisor:which_children/1 API. It takes a reference to your supervisor, which can be its registered name or PID, and returns a list of its children.

supervisor:which_children(SupRef) -> [{Id, Child, Type, Modules}]
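For example, a small helper (the module and function names are hypothetical) that keeps only the children that are currently running might look like this:

```erlang
-module(children_sketch).
-export([child_pids/1]).

%% Return the PIDs of all children of SupRef that are currently alive.
%% The Child element can also be the atom 'undefined' or 'restarting',
%% so anything that is not a pid is filtered out.
child_pids(SupRef) ->
    [Pid || {_Id, Pid, _Type, _Modules} <- supervisor:which_children(SupRef),
            is_pid(Pid)].
```

Bear in mind that for a simple_one_for_one supervisor the Id element is undefined for every child, so the returned list alone does not tell you which PID belongs to which host; you would still have to ask each process or keep a separate mapping.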