I have an application linked against libpthread. Its core is two FIFOs shared by four threads (two threads per FIFO). The FIFO class is synchronized with pthread mutexes and stores pointers to large objects (each containing a buffer of about 4 kB) that are allocated from static memory through overloaded new and delete operators (no dynamic allocation here).
The program usually works fine, but from time to time it segfaults for no visible reason. The problem is that I can't debug the segfaults properly, as I'm working on an embedded system with an old Linux kernel (2.4.29) and g++ (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)).
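To make the allocation scheme concrete, here is a minimal sketch of class-level operator new and operator delete drawing from a static pool. It is not the real code: `Frame`, `Slot`, `SLOT_BYTES`, `SLOT_COUNT` and the free-list layout are hypothetical names chosen for illustration, and the sketch does no locking of its own, so in a threaded program the pool would need the same protection as the FIFOs.

```cpp
#include <cstddef>
#include <new>

// All names and sizes below are illustrative assumptions, not the real code.
enum { SLOT_BYTES = 4096, SLOT_COUNT = 8 };

union Slot {
    Slot*       next;              // free-list link while the slot is unused
    long double align;             // forces a generous alignment for the slot
    char        raw[SLOT_BYTES];   // storage handed out to one object
};

static Slot  pool[SLOT_COUNT];     // static storage, no heap involved
static Slot* freeList  = 0;
static bool  poolReady = false;

// Chain every slot into a singly linked free list.
static void initPool( void ) {
    for ( int i = 0; i < SLOT_COUNT - 1; ++i ) {
        pool[i].next = &pool[i + 1];
    }
    pool[SLOT_COUNT - 1].next = 0;
    freeList  = pool;
    poolReady = true;
}

class Frame {
public:
    char buffer[4096];             // the roughly 4 kB payload

    // Take one slot from the free list. NOT thread-safe as written.
    void* operator new( std::size_t size ) {
        if ( !poolReady ) initPool();
        if ( size > sizeof( Slot ) || freeList == 0 ) throw std::bad_alloc();
        Slot* slot = freeList;
        freeList = slot->next;
        return slot->raw;
    }

    // Push the slot back onto the free list. NOT thread-safe as written.
    void operator delete( void* p ) {
        if ( p == 0 ) return;
        Slot* slot = static_cast<Slot*>( p );
        slot->next = freeList;
        freeList = slot;
    }
};
```

With a layout like this, `new Frame` never touches the heap, and a freed slot is the first one handed out again.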
There's no gdb on the system, and I can't run the application elsewhere (it's too hardware-specific).
I compiled the application with the -g and -rdynamic flags, but an external gdb tells me nothing when I examine the core file (only hex addresses). I can still print a backtrace from the program itself after catching SIGSEGV, and it always looks like this:
Backtrace for process with pid: 6279
-========================================-
[0x8065707]
[0x806557a]
/lib/libc.so.6(sigaction+0x268) [0x400bfc68]
[0x8067bb9]
[0x8067b72]
[0x8067b25]
[0x8068429]
[0x8056cd4]
/lib/libpthread.so.0(pthread_detach+0x515) [0x40093b85]
/lib/libc.so.6(__clone+0x3a) [0x4015316a]
-========================================-
End of backtrace
So it seems to be pointing to libpthread...
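For reference, a backtrace like the one above can be produced from a SIGSEGV handler with glibc's `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>`. The following is a minimal sketch under that assumption; the names `segvHandler`, `installSegvHandler` and `writeLine` are hypothetical and the real handler may differ.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Write a C string to stderr with write(2); avoids stdio inside the handler.
static void writeLine( const char* text ) {
    ssize_t ignored = write( STDERR_FILENO, text, strlen( text ) );
    (void) ignored;
}

// Hypothetical handler: dump the raw frames, then let the default action run.
extern "C" void segvHandler( int sig ) {
    void* frames[64];
    int count = backtrace( frames, 64 );

    writeLine( "Backtrace:\n" );
    // Prints one line per frame straight to the descriptor, without malloc.
    backtrace_symbols_fd( frames, count, STDERR_FILENO );
    writeLine( "End of backtrace\n" );

    // Restore the default action so the process still dies and dumps core.
    signal( sig, SIG_DFL );
    raise( sig );
}

// Returns 0 on success, -1 on failure (the return value of sigaction).
int installSegvHandler( void ) {
    struct sigaction action;
    memset( &action, 0, sizeof( action ) );
    action.sa_handler = segvHandler;
    sigemptyset( &action.sa_mask );
    return sigaction( SIGSEGV, &action, 0 );
}
```

Frames printed as a bare `[0x...]` are ones for which no symbol name was found; frames inside shared libraries show up with a name and offset, as in the output above.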
I ran some of the modules through valgrind, but it found no memory leaks (I'm barely using any dynamic allocation).
I thought the mutexes might be causing trouble (they are locked and unlocked about 200 times a second), so I switched from my simple mutex class:
#include <pthread.h>

// Thin wrapper around a pthread mutex.
class AGMutex {
public:
    AGMutex( void ) {
        pthread_mutex_init( &mutex1, NULL );
    }
    ~AGMutex( void ) {
        pthread_mutex_destroy( &mutex1 );
    }
    void lock( void ) {
        pthread_mutex_lock( &mutex1 );
    }
    void unlock( void ) {
        pthread_mutex_unlock( &mutex1 );
    }
private:
    pthread_mutex_t mutex1;
};
to a dummy mutex class:
#include <unistd.h>   // usleep

// Dummy replacement: polls a flag instead of using a pthread mutex.
class AGMutex {
public:
    AGMutex( void ) : mutex1( false ) {
    }
    ~AGMutex( void ) {
    }
    volatile void lock( void ) {
        if ( mutex1 ) {
            while ( mutex1 ) {
                usleep( 1 );
            }
        }
        mutex1 = true;
    }
    volatile void unlock( void ) {
        mutex1 = false;
    }
private:
    volatile bool mutex1;
};
but it changed nothing and the backtrace looks the same...
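A note on the dummy class: its check of `mutex1` and the later assignment are two separate steps, so two threads can both see `false` and both proceed, which means it is not a real lock. A minimal sketch of the same poll-and-sleep idea with an atomic acquire step is below, built on `pthread_mutex_trylock`. The class name `AGPollingMutex` and the `tryLock` method are hypothetical, and this is not what the application currently runs.

```cpp
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical variant: polls like the dummy class, but the acquire step
// itself is atomic because pthread_mutex_trylock does the test and the set.
class AGPollingMutex {
public:
    AGPollingMutex( void ) {
        pthread_mutex_init( &mutex1, NULL );
    }
    ~AGPollingMutex( void ) {
        pthread_mutex_destroy( &mutex1 );
    }
    // Returns true if the lock was taken, false if it is already held.
    bool tryLock( void ) {
        return pthread_mutex_trylock( &mutex1 ) == 0;
    }
    // Sleep briefly between attempts instead of blocking in the kernel.
    void lock( void ) {
        while ( !tryLock() ) {
            usleep( 1 );
        }
    }
    void unlock( void ) {
        pthread_mutex_unlock( &mutex1 );
    }
private:
    // Copying a mutex is meaningless, so copies are disallowed.
    AGPollingMutex( const AGPollingMutex& );
    AGPollingMutex& operator=( const AGPollingMutex& );

    pthread_mutex_t mutex1;
};
```

Usage is the same as for `AGMutex`: call `lock()` before touching the FIFO and `unlock()` afterwards.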
After an old-school put-cout-between-every-line-and-see-where-it-segfaults-plus-remember-the-pids-and-stuff debugging session, it seems that it segfaults during usleep (?).
I have no idea what else could be wrong. It can work for an hour or so, and then suddenly segfault for no apparent reason.
Has anybody ever encountered a similar problem?