I have a single-threaded Unix process that communicates over TCP with other processes.
The problem is this: when I start the process, it hangs (no busy loop) until I kill it.
The funny thing is, as soon as I attach strace to it, it continues to run with the expected behavior, as if there were no problem at all. (This is always reproducible.)
What could be the reason for this behavior? What effect does strace have on the state of a process?
Update: strace changed the behavior because we were using OpenOnload, which had a bug. As soon as we attached strace, the stack was moved back to the kernel and the problem was gone.
I have had this problem only once, and it was related to signal handling, which is one source of race conditions in single-threaded code.
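One common shape of such a race is checking a flag and then blocking in select(): a signal that lands in between is missed and the process sleeps with nothing left to wake it. Below is a minimal Python sketch of the usual countermeasure, the self-pipe pattern. It is not the asker's code; the function name `wait_for_signal` and the choice of SIGUSR1 are made up for illustration, and it assumes a Unix system and that it runs in the main thread.

```python
import os
import select
import signal


def wait_for_signal(timeout=1.0):
    """Return True if the signal woke the wait, False if the wait timed out."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # the handler must never block

    def handler(signum, frame):
        # Instead of only setting a flag, make the event visible to select().
        try:
            os.write(write_fd, b"\0")
        except BlockingIOError:
            pass  # pipe already full, a wake-up is pending anyway

    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        # Hypothetical worst case: the signal arrives BEFORE we start waiting.
        os.kill(os.getpid(), signal.SIGUSR1)
        # With a plain flag check followed by a blocking wait, this ordering
        # loses the wake-up. With the pipe, the byte is still there to be seen.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        return bool(ready)
    finally:
        signal.signal(signal.SIGUSR1, previous)
        os.close(read_fd)
        os.close(write_fd)
```

The point of the design is that the wake-up is stored in the pipe rather than in the timing of the signal, so the order of "signal arrives" and "process starts waiting" no longer matters.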
Most likely the strace output simply slows down the process, making deadlocks much less likely. I have seen this happen before with strace, and it can also happen when adding other debug printing or debug calls.
Deadlocks are most often seen in multi-threaded interaction, but in your case you have multiple processes. If attaching strace frees up the processes every time, then I guess the way you open the sockets or handshake on them is what hangs. Buffering and blocking on the socket could be getting you into a deadlock between processes.
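To illustrate the buffering and blocking point, here is a rough Python sketch of two peers that each send a large message before reading anything. Everything in it is an assumption for the sake of the demo: it uses `socket.socketpair()` and two threads in place of real TCP connections between processes, the names `peer` and `exchange` are invented, and the timeout exists only so the demo reports a stall instead of hanging.

```python
import socket
import threading

# Deliberately larger than typical socket buffers (assumed, not measured).
PAYLOAD = b"x" * (4 * 1024 * 1024)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    remaining = n
    while remaining:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:
            raise ConnectionError("peer closed early")
        remaining -= len(chunk)


def peer(sock, reads_first, outcome, name, timeout):
    """Exchange PAYLOAD with the other side; record whether it completed."""
    sock.settimeout(timeout)
    try:
        if reads_first:
            recv_exact(sock, len(PAYLOAD))
            sock.sendall(PAYLOAD)
        else:
            # Blocks once the buffers are full and nobody is reading.
            sock.sendall(PAYLOAD)
            recv_exact(sock, len(PAYLOAD))
        outcome[name] = True
    except socket.timeout:
        outcome[name] = False  # would have hung forever without the timeout


def exchange(a_reads_first, b_reads_first, timeout=1.0):
    """Run both peers and return {"a": completed, "b": completed}."""
    a, b = socket.socketpair()
    outcome = {}
    threads = [
        threading.Thread(target=peer, args=(a, a_reads_first, outcome, "a", timeout)),
        threading.Thread(target=peer, args=(b, b_reads_first, outcome, "b", timeout)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    a.close()
    b.close()
    return outcome
```

With `exchange(False, False)` both sides are stuck in `sendall` waiting for the other to read, so both report a stall. With `exchange(False, True)` one side drains the data first and both complete. Small messages would hide the problem because they fit in the buffers, which is one reason this kind of hang can be timing and size dependent.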
A similar question, but with a multi-threaded process (a deadlock between threads instead of between separate processes): Using strace fixes hung memory issue
It is hard to give general examples, especially as I don't know what your different processes are doing or whether they share resources in some way. I will try . . .
Example with one object/resource which should be protected:
One process starts making changes on an object (e.g. adding items to a list/db table)
Another process starts iterating the list/table.
The danger is that the iterating loop in one of those processes gets confused and never exits, or does something worse such as writing to invalid memory.
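The "loop never exits" half of this can be imitated in a few lines of Python. This is only a single-threaded stand-in: the append inside the loop plays the role of the other process, the function names are invented, and `max_steps` is a safety cap so the demo terminates. Python will not show the invalid-memory variant, which belongs to languages like C.

```python
def iterate_while_growing(items, max_steps=1000):
    """Iterate a list that keeps growing underneath the loop; return steps taken."""
    steps = 0
    for item in items:
        items.append(item)  # stands in for the other process adding entries
        steps += 1
        if steps >= max_steps:
            break  # without this cap the loop would never reach the end
    return steps


def iterate_snapshot(items):
    """Same writer, but the reader walks a private copy taken up front."""
    steps = 0
    for item in list(items):
        items.append(item)
        steps += 1
    return steps
```

Starting from a three-element list, the first version only stops because of the cap, while the second finishes after three steps. Between real processes the equivalent of the snapshot is a lock or a transaction around the read.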
Example where the object/resource is protected by mutexes:
The classic simple deadlock with two resources, a simpler relative of the dining philosophers problem.
One thread/process grabs the mutex on object A and does some work.
Another thread/process grabs the mutex on object B and does some work.
The second thread/process now needs to update object A, so it waits for the mutex on A.
The original thread/process now needs to access object B, so it waits for the mutex on B.
. . . . . . . . . . . . @ . . . . . . . . . . .
Silence except for the noise of the wind and a tumbleweed blowing across the landscape.
Deadlocked.
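The two-mutex scenario above can be sketched in Python with threads. Treat it as an illustration under assumptions: `run_pair`, the `hold` delay and the `timeout` are invented for the demo, and the timeout is there only so the program reports the deadlock instead of sitting in it.

```python
import threading
import time


def run_pair(order_1, order_2, hold=0.2, timeout=0.5):
    """Run two workers that each take two locks in the given order.

    Returns True if both workers got their second lock, False if at least
    one gave up waiting (which without the timeout would be a deadlock).
    """
    locks = {"A": threading.Lock(), "B": threading.Lock()}
    results = []

    def worker(order):
        first, second = order
        with locks[first]:
            time.sleep(hold)  # "does some work" while holding the first mutex
            # A real deadlock waits forever here; the timeout makes it visible.
            got_second = locks[second].acquire(timeout=timeout)
            if got_second:
                locks[second].release()
            results.append(got_second)

    threads = [threading.Thread(target=worker, args=(order,))
               for order in (order_1, order_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)
```

`run_pair(("A", "B"), ("B", "A"))` reproduces the scenario and comes back False, while `run_pair(("A", "B"), ("A", "B"))` comes back True, because taking the mutexes in the same order everywhere means nobody can hold the one the other is waiting for. Note how much this depends on timing: shrink `hold` to zero and the opposite-order case will often get through, which fits the idea that slowing a process down with strace can change whether a hang shows up.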