Go memory leak when doing concurrent os/exec.Command

Published 2019-07-09 06:55

Question:

I am running into a situation where a Go program is taking up 15 GiB of virtual memory and continues to grow. The problem only happens on our CentOS server; on my OS X development machine, I can't reproduce it.

Have I discovered a bug in go, or am I doing something incorrectly?

I have boiled the problem down to a simple demo, which I'll describe now. First build and run this go server:

package main

import (
    "net/http"
    "os/exec"
)

func main() {
    http.HandleFunc("/startapp", startAppHandler)
    http.ListenAndServe(":8081", nil)
}

func startCmd() {
    cmd := exec.Command("/tmp/sleepscript.sh")
    cmd.Start()
    cmd.Wait()
}

func startAppHandler(w http.ResponseWriter, r *http.Request) {
    startCmd()
    w.Write([]byte("Done"))
}

Make a file named /tmp/sleepscript.sh and chmod it to 755:

#!/bin/bash
sleep 5

And then make several concurrent requests to /startapp. In a bash shell, you can do it this way:

for i in {1..300}; do (curl http://localhost:8081/startapp &); done

The VIRT memory should now be several gigabytes. If you re-run the above for loop, the VIRT memory will continue to grow by gigabytes every time.

Update 1: The problem is that I am hitting OOM issues on CentOS. (thanks @nos)

Update 2: Worked around the problem by using daemonize and synchronizing the calls to Cmd.Run(). Thanks @JimB for confirming that .Wait() blocking an OS thread of its own is inherent to the POSIX API, and that there is no way to skip calling .Wait() without leaking resources.
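A minimal sketch of the synchronization half of that workaround (the helper name `runSerialized` and the `sh -c` command are illustrative, not from the original post): a single mutex around Cmd.Run() ensures at most one goroutine, and therefore at most one wait-blocked OS thread, exists at a time.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// runMu serializes child-process launches so that at most one OS
// thread is parked in the blocking wait at any moment.
var runMu sync.Mutex

// runSerialized wraps Cmd.Run (which is Start followed by Wait)
// behind the mutex.
func runSerialized(name string, args ...string) error {
	runMu.Lock()
	defer runMu.Unlock()
	return exec.Command(name, args...).Run()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = runSerialized("sh", "-c", "exit 0")
		}()
	}
	wg.Wait()
	fmt.Println("all commands finished")
}
```

This trades throughput for a hard cap on thread usage; commands now run one at a time regardless of how many requests arrive concurrently.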

Answer 1:

Each request you make requires Go to spawn a new OS thread to Wait on the child process. Each thread will consume a 2MB stack, and a much larger chunk of VIRT memory (that's less relevant, since it's virtual, but you may still be hitting a ulimit setting). Threads are reused by the Go runtime, but they are currently never destroyed, since most programs that use a large number of threads will do so again.

If you make 300 simultaneous requests, and wait for them to complete before making any others, memory should stabilize. However if you continue to send more requests before the others have completed, you will exhaust some system resource: either memory, file descriptors, or threads.

The key point is that spawning a child process and calling wait isn't free, and if this were a real-world use case, you would need to limit the number of times startCmd() can be called concurrently.
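One common way to impose that limit (the name `startCmdLimited`, the cap of 4, and the `sh -c` command are all illustrative) is a counting semaphore built from a buffered channel: acquiring a slot before Run and releasing it afterward bounds the number of simultaneous child processes, and thus the number of OS threads blocked in Wait.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

const maxConcurrent = 4

// sem is a counting semaphore: the channel's capacity bounds how
// many child processes (and wait-blocked OS threads) can exist at once.
var sem = make(chan struct{}, maxConcurrent)

func startCmdLimited(name string, args ...string) error {
	sem <- struct{}{}        // acquire a slot; blocks once maxConcurrent are running
	defer func() { <-sem }() // release the slot once the child has been reaped
	return exec.Command(name, args...).Run()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = startCmdLimited("sh", "-c", "exit 0")
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Unlike a mutex, this still allows useful parallelism: up to maxConcurrent commands run at once, while any excess requests queue on the channel send instead of spawning more threads.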