I have a trie (implemented with tuples and lists) with several thousand entries and I would like to support concurrent reads. The memory footprint of the data is in the 10-20 MB range. The trie is built once and read only after that.
What is the recommended way to maintain the state and give clients concurrent access?
Here is what I have tried:
1) Created a gen_server with the trie as the state. This worked fine but, obviously, all calls were serialized.
2) Modified (1) to spawn a new process for each call which takes the state, the request, and From. Each new process traversed the trie and called gen_server:reply/2 with the result. This solution didn't seem to work because memory and CPU usage exploded. I assume this happened because the state was copied to the spawned process for every call.
mochiglobal from mochiweb is designed for exactly this kind of use case. Basically it takes your data structure and compiles it into a module, so the data is shared between readers (zero copy for module constants). It only works well for data that doesn't change often, because every update recompiles and reloads the module, but it sounds like that's what you have.
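To make the "compile it into a module" idea concrete, here is a minimal sketch of the technique. It is not mochiglobal's actual code; the module name const_mod and the functions store/2 and fetch/1 are made up for illustration, and error handling is left out.

```erlang
-module(const_mod).
-export([store/2, fetch/1]).

%% Hypothetical helper: compile Term into a freshly generated module Mod
%% that exports a single function term/0 returning Term as a literal.
store(Mod, Term) when is_atom(Mod) ->
    Forms = [{attribute, 1, module, Mod},
             {attribute, 2, export, [{term, 0}]},
             {function, 3, term, 0,
              [{clause, 3, [], [], [erl_parse:abstract(Term)]}]}],
    {ok, Mod, Bin} = compile:forms(Forms, []),
    %% Drop any old version left over from a previous store/2 call.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.

%% Read the constant back. Any process can call this concurrently.
fetch(Mod) ->
    Mod:term().
```

Usage would look like const_mod:store(my_trie, Trie) once at startup, then const_mod:fetch(my_trie) from any number of reader processes. Note that each distinct module name uses up an atom, so this is meant for a small, fixed set of constants.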
Another approach would be to make a pool of gen_servers (with supervision), and then allocate incoming connections to a server in the pool. This eases the gen_server bottleneck associated with your first approach. This approach also allows some tuning by adjusting the number of processes in the pool. LearnYouSomeErlang has a chapter on this.
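A rough sketch of the pool idea, under some assumptions: the trie layout ({Value, Children} with Children a list of {Char, SubTrie}) and the names trie_pool, start_pool/2 and lookup/2 are hypothetical, and the supervisor is left out to keep it short. In a real system the workers would sit under a supervisor as described above.

```erlang
-module(trie_pool).
-behaviour(gen_server).
-export([start_pool/2, lookup/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start N workers, each holding its own copy of Trie.
%% Returns a tuple of worker pids that callers pass to lookup/2.
start_pool(N, Trie) when is_integer(N), N > 0 ->
    Pids = [begin
                {ok, Pid} = gen_server:start_link(?MODULE, Trie, []),
                Pid
            end || _ <- lists:seq(1, N)],
    list_to_tuple(Pids).

%% Pick a worker by hashing the caller's pid, so different callers
%% are spread over the pool, then do an ordinary synchronous call.
lookup(Pool, Key) ->
    Idx = erlang:phash2(self(), tuple_size(Pool)) + 1,
    gen_server:call(element(Idx, Pool), {lookup, Key}).

init(Trie) ->
    {ok, Trie}.

handle_call({lookup, Key}, _From, Trie) ->
    {reply, find(Key, Trie), Trie}.

handle_cast(_Msg, Trie) ->
    {noreply, Trie}.

%% Assumed trie layout: {Value, [{Char, SubTrie}]}.
find([], {Value, _Children}) ->
    Value;
find([C | Rest], {_Value, Children}) ->
    case lists:keyfind(C, 1, Children) of
        {C, Sub} -> find(Rest, Sub);
        false -> undefined
    end.
```

The trade-off is memory: every worker keeps a full copy of the trie, so with 10-20 MB of data a pool of 8 workers costs roughly 8 times that. The trie is copied once per worker at startup, not once per request as in your second attempt.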
If your state changes often, implement your model/structure on top of ETS.
You can create the ETS table with the read_concurrency and write_concurrency options, which should improve performance under concurrent access.
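A minimal sketch of the ETS variant, assuming exact-key lookups are enough so that each entry can be stored as its own {Key, Value} row. The module name trie_ets and its functions are hypothetical.

```erlang
-module(trie_ets).
-export([init/1, lookup/1]).

%% Create a named table owned by the calling process and load the
%% entries, given as a list of {Key, Value} tuples.
%% protected: only the owner may write, any process may read.
init(Entries) ->
    Tab = ets:new(?MODULE, [set, named_table, protected,
                            {read_concurrency, true}]),
    true = ets:insert(Tab, Entries),
    ok.

%% Readers call this directly from their own process, so lookups
%% are not serialized through the owner.
lookup(Key) ->
    case ets:lookup(?MODULE, Key) of
        [{_Key, Value}] -> {ok, Value};
        [] -> error
    end.
```

Two things to keep in mind: the table disappears when its owner process exits, so the owner should be a long-lived (supervised) process, and ets:lookup copies the matched row into the caller's heap. That is why the data is stored as many small rows here; putting the whole trie into a single row would copy all of it on every read. If you need prefix queries, you would have to store one row per trie node instead.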