Question:
I am working on implementing a database server in C that will handle requests from multiple clients. In order to do so I am using fork() to handle connections for individual clients.
The server stores data in the heap which consists of a root pointer to hash tables of dynamically allocated records. The records are structs that have pointers to various data-types. I would like for the processes to be able to share this data so that when a client makes a change to the heap the changes will be visible for the other clients.
I have learned that fork() uses COW (Copy On Write) and my understanding is that it will copy the heap (and stack) memory of the parent process when the child will try to modify the data in memory.
I have found out that I can use the shm library to share memory.
-Would it suffice to share the root pointer of the database or do I have to make all allocated memory as shared?
-If a child allocates memory will the parent / other children be able to access it?
-Also if a child allocates memory and is later killed will the allocated memory still stay on the heap?
So for example would the code below be a valid way to share heap memory (in shared_string)? If a child were to use similar code (i.e. starting from //start ) would other children be able to read/write to it while the child is running and after it's dead?
key_t key;
int shmid;
key = ftok("/tmp",'R');
shmid = shmget(key, 1024, 0644 | IPC_CREAT);
//start
char * string;
string = malloc(sizeof(char) * 10);
strcpy(string, "a string");
char * shared_string;
shared_string = shmat(shmid, string, 0);
strcpy(shared_string, string);
Answer 1:
First of all, fork is completely inappropriate for what you're trying to achieve. Even if you can make it work, it's a horrible hack. In general, fork only works for very simplistic programs anyway, and I would go so far as to say that fork should never be used except followed quickly by exec, but that's aside from the point here. You really should be using threads.
With that said, the only way to have memory that's shared between the parent and child after fork, and where the same pointers are valid in both, is to mmap (or shmat, but that's a lot fuglier) a file or anonymous map with MAP_SHARED prior to the fork. You cannot create new shared memory like this after fork because there's no guarantee that it will get mapped at the same address range in both.
Just don't use fork. It's not the right tool for the job.
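For reference, here is a minimal sketch of the mmap-before-fork approach described above. The struct shared_counter type and the shared_write_demo function are made up for illustration, and MAP_ANONYMOUS is a widely available extension rather than strict POSIX. A real database would also need its own allocator inside the mapping and some form of locking, which this sketch leaves out.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record type standing in for the database contents. */
struct shared_counter {
    int value;
};

/* Maps one shared region, forks, lets the child write to it, and returns
 * the value the parent sees afterwards (or -1 on error). */
int shared_write_demo(void)
{
    /* Map BEFORE fork so parent and child inherit the mapping at the
     * same address and pointers into it stay valid in both. */
    struct shared_counter *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return -1;
    c->value = 1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(c, sizeof *c);
        return -1;
    }
    if (pid == 0) {
        c->value = 42; /* written through the shared mapping, not a COW copy */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(c, sizeof *c);
        return -1;
    }
    int seen = c->value; /* the child's write is visible here */
    munmap(c, sizeof *c);
    return seen;
}
```

Calling shared_write_demo() from main should give the value the child stored. Note that anything the child allocates with malloc still lands in its private heap, so every record has to be carved out of the mapped region itself.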
Answer 2:
Sorry for answering a month later, but I don't think the existing answers gave what the OP asked for.
I think you are basically looking to do what is done by Redis (and probably others).
They describe it in http://redis.io/topics/persistence (search for "copy-on-write").
- threads defeat the purpose
- classic shared memory (shm, mapped memory) also defeats the purpose
The primary benefit to using this method is avoidance of locking, which can be a pain to get right.
As far as I understand it the idea of using COW is to:
- fork when you want to write, not in advance
- the child (re)writes the data to disk, then immediately exits
- the parent keeps on doing its work, and detects (SIGCHLD) when the child exited.
If, while doing its work, the parent ends up making changes to the hash, the kernel will copy the affected memory pages, so the child still sees the data as it was at the time of the fork.
A "dirty flag" is used to track if a new fork is needed to execute a new write.
Things to watch out for:
- Make sure there is only one outstanding child at a time
- Transactional safety: write to a temp file first, then move it over so that you always have a complete copy, maybe keeping the previous one around if the move is not atomic.
- Test whether you will have issues with other resources that get duplicated (file descriptors, global destructors in C++)
You may want to take a gander at the Redis code as well.
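The fork-to-snapshot idea above could look roughly like the sketch below. This is not Redis's actual code: the db array, write_snapshot, start_snapshot and finish_snapshot are hypothetical names, and the parent here simply blocks in waitpid instead of handling SIGCHLD, to keep the example short.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical in-memory "database": a small fixed array. */
#define DB_SIZE 4
static int db[DB_SIZE] = {1, 2, 3, 4};

/* Runs in the child: dump its frozen view of db to tmp_path, then
 * rename it over final_path so readers never see a partial file. */
static int write_snapshot(const char *tmp_path, const char *final_path)
{
    FILE *f = fopen(tmp_path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(db, sizeof db[0], DB_SIZE, f);
    if (fclose(f) != 0 || n != DB_SIZE) {
        remove(tmp_path);
        return -1;
    }
    return rename(tmp_path, final_path);
}

/* Forks a child that writes the snapshot and exits immediately.
 * Returns the child's pid in the parent, or -1 if fork failed. */
pid_t start_snapshot(const char *tmp_path, const char *final_path)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(write_snapshot(tmp_path, final_path) == 0 ? 0 : 1);
    return pid;
}

/* Waits for the snapshot child; returns 0 if it succeeded, -1 otherwise. */
int finish_snapshot(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent can keep modifying db between start_snapshot and finish_snapshot. Thanks to copy-on-write, the file still contains the values as they were at the moment of the fork.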
Answer 3:
Would it suffice to share the root pointer of the database or do I have to make all allocated memory as shared?
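No, sharing only the root pointer is not enough: everything it points to would also have to live in shared memory, because each process has its own private address space. Copy-on-write is a kernel-space optimization that is transparent to user space.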
As others have said, SHM or mmap'd files are the only way to share memory between separate processes.
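A small sketch to illustrate the point (the function name heap_is_private_after_fork is made up): a child's write to malloc'd memory is not seen by the parent, even though both hold the same pointer value.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's heap value is unchanged after the child
 * modified its own copy, 0 if not, and -1 on error. */
int heap_is_private_after_fork(void)
{
    int *value = malloc(sizeof *value);
    if (value == NULL)
        return -1;
    *value = 1;

    pid_t pid = fork();
    if (pid < 0) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        *value = 2; /* the kernel copies the page; only the child sees 2 */
        _exit(*value == 2 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(value);
        return -1;
    }
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int unchanged = (*value == 1); /* the parent still sees its own 1 */
    free(value);
    return (child_ok && unchanged) ? 1 : 0;
}
```

The same holds for memory a child allocates after the fork: it exists only in that child and disappears when the child exits.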
Answer 4:
Many popular HTTP servers use fork() to take advantage of multiple processors; Nginx is one of them.
Threading brings with it an entire set of headaches that I personally like to avoid unless absolutely necessary, like, your program will never be free of crashes caused by multithreading bugs (my experience with other people's threading code).
Multiprocessing lets you use all the processors on your machine, without implicitly sharing memory between execution threads, by default avoiding all typical, multithreading, endless bugs.
I like to sleep at night without getting those 2am calls, knowing my web facing, high throughput servers aren't going to crash on me because I failed to see one of dozens of multithreading pitfalls that day.
There are many cases where shared memory is pain free, for example when the data in shared memory is read-only. Then you don't have to worry about locks etc.
Answer 5:
If you must use fork, shared memory seems to be the only choice.
Actually, I think threads are more suitable in your scenario.
If you don't want to go multi-threaded, here is another option: use a single process with a single thread, like Redis does.
With this model you don't need to worry about things like locks,
and if you want to scale, just design a routing policy, for example routing by the hash value of the key.
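Routing by key hash could be sketched like this. The choice of FNV-1a and the names fnv1a_32 and route_key are my own assumptions for illustration; any reasonably uniform hash would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated key (an illustrative choice). */
static uint32_t fnv1a_32(const char *key)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != 0; p++) {
        h ^= *p;
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Picks which single-threaded server instance owns the given key.
 * The same key always maps to the same instance. */
size_t route_key(const char *key, size_t num_instances)
{
    assert(num_instances > 0);
    return fnv1a_32(key) % num_instances;
}
```

A front end would call route_key(key, n) and forward the request to that instance. Keep in mind that plain modulo routing remaps most keys when n changes.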