[LCP] Debugging a SIGSEGV

Mon Feb 7 13:22:02 UTC 2005

Thanks for the responses. I have not had any luck resolving this issue
yet. Here's some more information on the program and the environment
it's being run on: the code is written in standard C and implements a Monte
Carlo type algorithm. The code requires less than roughly
sizeof(double)*200000 bytes (about 1.6 MB with 8-byte doubles). It is
compiled using gcc (3.3.4) on Debian Linux (kernel 2.4.18-1-k7) on an
Athlon XP 1800+ with 1 GB of memory. It's a plain, vanilla number-crunching
code; it does not call pthread_atfork() or any other such function, and it's
compiled/linked with -Wall -ggdb -lm as flags.
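One thing worth checking, given the ~200000 doubles mentioned above: if they live in a large automatic (local) array, an overrun past its end lands on the stack, where Valgrind's memcheck generally cannot see it, since memcheck only flags overruns of heap blocks. A minimal sketch, assuming the work array can be allocated once up front, is to put it on the heap with calloc() instead. `NSAMPLES` and `alloc_samples()` are hypothetical names, not from the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical size, standing in for the ~200000 doubles the code uses. */
#define NSAMPLES 200000

/* Allocate the work array on the heap rather than as a big local array,
 * so that an out-of-bounds write hits a heap block Valgrind can watch. */
static double *alloc_samples(size_t n)
{
    /* calloc zero-fills the block; all-bits-zero is 0.0 for IEEE doubles. */
    double *p = calloc(n, sizeof *p);

    if (p == NULL)
        fprintf(stderr, "alloc_samples: calloc of %lu doubles failed\n",
                (unsigned long)n);
    return p;
}
```

In main(), call `double *samples = alloc_samples(NSAMPLES);`, bail out if it returns NULL, and `free(samples)` once at the end. Then run the program as `valgrind ./prog inputfile` (program and file names hypothetical) and look for "Invalid write" reports.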

The point at which the program crashes depends on its input (an input file
given on the command line sets the "number of groups" that the code needs
to simulate). Earlier, I had noticed that it exited with SIGSEGV after
processing every single group from the input file and saving the data from
its run; the program just crashed instead of exiting gracefully. There is
now a second input file that makes it crash (with a SIGSEGV) at the end of
its first run, and no data is saved.
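One possible reason the second input leaves no data behind: output written with fprintf() to a regular file is normally fully buffered, and the buffer is only flushed when it fills, on fflush()/fclose(), or by exit(). A process killed by SIGSEGV never gets that final flush, so anything still buffered is lost. A minimal sketch, assuming results are written one group at a time (the function name and output format are hypothetical), flushes after every group so completed groups reach the kernel before any later crash:

```c
#include <stdio.h>

/* Hypothetical helper: append one group's result to an already-open
 * output file and flush it immediately. Returns 0 on success, -1 if
 * either the write or the flush reports an error. */
static int save_group(FILE *fp, int group, double result)
{
    /* %.17g prints enough digits to round-trip a double exactly. */
    if (fprintf(fp, "%d %.17g\n", group, result) < 0)
        return -1;

    /* Hand stdio's buffer to the kernel now; data already passed to the
     * kernel survives if the process is later killed by SIGSEGV. */
    if (fflush(fp) != 0)
        return -1;

    return 0;
}
```

A flush per group is cheap next to a Monte Carlo run, and the return value gives early warning of disk-full or similar write errors.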

I ran the program under strace -fF with each of the two inputs, and the
SIGSEGV lines at the end of the strace output are the same in both cases:
26811 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
26811 +++ killed by SIGSEGV +++

Any clues to what this means?
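strace only reports that the signal arrived, not where in the code it came from. A sketch of one way to get a call chain from the crash itself, assuming glibc (backtrace() and backtrace_symbols_fd() come from glibc's <execinfo.h>, not from ISO C): install a SIGSEGV handler that dumps the stack to stderr and then re-raises the signal. backtrace() is not on the POSIX async-signal-safe list, so treat this as a debugging aid only. Linking with -rdynamic lets backtrace_symbols_fd() print names for the program's own functions; otherwise the raw addresses can be mapped back with `addr2line -e ./prog ADDRESS`. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* On SIGSEGV: print a raw backtrace to stderr, then re-raise the signal
 * with the default action so the process still dies (and dumps core if
 * core dumps are enabled). */
static void segv_handler(int sig)
{
    static const char msg[] = "caught SIGSEGV, backtrace follows:\n";
    void *frames[64];
    int n;

    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* ignore: nothing useful to do in a crashing process */
    }
    n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* Restore the default action; SIGSEGV is blocked while this handler
     * runs, so the raised signal stays pending until the handler returns. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Call early in main(). Returns 0 on success, -1 on failure (as sigaction). */
static int install_segv_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc; do it here, outside the
     * signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

If the trace still ends in a libc symbol such as __register_atfork, keep in mind that without libc debug symbols only the nearest exported symbol can be named, so the faulting code may be elsewhere in libc.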

Jack, how does one end up trashing the libc stack? Can you give some
examples of code that might do so? I'm trying Valgrind next...
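As one illustration of how a program can end up trashing the stack (a sketch, not necessarily what is happening in this code): an off-by-one loop such as `for (i = 0; i <= n; i++) buf[i] = ...;` over a local array writes one element past its end. On x86 the stack grows downward, so the slot just past a local array in main() typically lies toward main's saved registers and return address; the damage then only shows up when main returns into __libc_start_main, after its last printf(). A cheap way to catch such writes at the offending line is to route array stores through an assert()-checked helper while debugging. `store_checked()` and `fill_samples()` are hypothetical names, and the stored values are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Checked store: while debugging (NDEBUG not defined), an out-of-range
 * index aborts with the file and line of the bad write instead of silently
 * overwriting whatever follows the array. */
static void store_checked(double *buf, size_t len, size_t i, double v)
{
    assert(i < len && "array index out of range");
    buf[i] = v;
}

/* Fill the first n slots of buf with placeholder values. The buggy variant
 * would use `i <= n`, which writes buf[n], one element past the end; with
 * store_checked() that mistake trips the assert instead of corrupting
 * memory. */
static void fill_samples(double *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        store_checked(buf, n, i, (double)i * 0.5);
}
```

Compiling with -DNDEBUG removes the checks again once the bug is found.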

Thanks,

-K

On Sun, 6 Feb 2005, Jack Lloyd wrote:

> 
> I've seen this happen (SIGSEGV after main returns) a few times. It's been a
> while, but ISTR that in the end I always found I was trashing some internal
> memory structure. For example, if you trash something in the stack of the libc
> startup code (called __libc_start_main in glibc, I think), you'll get a crash
> after main returns and __libc_start_main resumes.
> 
> In C++ this can also happen if a destructor for a global object does something
> stupid; looking at the stack trace at the time of failure will probably diagnose
> this (you'll see in the call chain a GCC generated function for destroying
> global objects).
> 
> A tool that may help in tracking this down is Valgrind. If nothing else, a clean
> Valgrind run will eliminate the possibility that it's a memory error.
> 
> -Jack
> 
> On Sun, Feb 06, 2005 at 02:32:48PM -0500, Karthik Vishwanath wrote:
> > My program quits _after_ processing the last lines of main() with a
> > SIGSEGV (the last lines of main() are printf() statements). I compiled
> > the program with the -ggdb flag and ran it under ddd to get more
> > information on where the code dies; here's what ddd tells me:
> > "Program received signal SIGSEGV, Segmentation fault. 0x4012be5b in
> > __register_atfork () from /lib/libc.so.6 (gdb) "
> > 
> > Can anyone tell me what is going on, and why this could be happening? (I
> > have checked writes to malloc'd pointers very carefully, and I don't think
> > it's caused by accessing an uninitialized memory location etc.) Any and
> > all pointers (no pun intended) that throw some light on how to get rid of
> > the segmentation fault will be much appreciated.
> > 
> > Thanks,
> > 
> > -K
> > 
> > 
> > 
> > _______________________________________________
> > This is the Linux C Programming List
> > :  http://lists.linux.org.au/listinfo/linuxcprogramming List