[Gc] GC patch for DJGPP

Boehm, Hans hans_boehm@hp.com
Sat, 1 Nov 2003 09:59:40 -0800

> From: Doug Kaufman [mailto:dkaufman@rahul.net]
> On Fri, 31 Oct 2003, Boehm, Hans wrote:
> > Thanks very much.  The Makefile clearly needed work.
> > 
> > I applied it with two exceptions:
> > 
> > 1) I don't understand the change to finalize.c, and it 
> doesn't look right.
> > It appears to omit finalizer calls if the supplied argument 
> happens to be
> > zero.  Is this just leftover debugging code?
> Sorry, I am not a progammer, just a user trying to get this to work. I
> thought I was excluding a null pointer, since fo_client_data is ptr_t.
> DJGPP doesn't allow dereferencing a null pointer. Does this 
> work better?
Actually, most of the patch looked great.

Fo_client_data is basically an uninterpreted value that you specify when
you register a finalizer, and it is passed back to the client.  The GC
doesn't and shouldn't care what it is.  Its type is basically arbitrary,
in that the client will normally convert it to and from its real type.
(It could and perhaps should be void * instead.) 

If you're seeing a null pointer dereference in the client supplied finalizer,
there's a bug somewhere else.  Either the wrong client_data value is passed
when the finalizer is registered, or the finalizer has a missing null pointer
check, or the collector is somehow generating a bogus finalizer invocation.

Based on your stack trace, it looks like CORD_lf_close_proc is calling abort()
because fclose() failed.  In this case client_data is not examined at all.
With your patch you are just failing to close the file descriptor instead,
thus masking, but not eliminating the problem.

If you want to track this down, the next step would be to print errno instead
of just aborting at that point, and to try to determine why the fclose() failed.
My guess is that this problem is related to some subtle differences in the
stdio library between djgpp and e.g. glibc.  It probably does not reflect a
real GC issue, though it may be a real problem with cords on djgpp.

Nearly all modern platforms generate a SIGSEGV or similar on a null pointer
dereference.  None of this should be unique to DJGPP, except some of the file
> --- finalize.c.orig	2003-03-31 15:09:18.000000000 -0800
> +++ finalize.c	2003-10-31 22:56:04.000000000 -0800
> @@ -784,6 +784,7 @@
>  	    GC_finalize_now = fo_next(curr_fo);
>  #	endif
>   	fo_set_next(curr_fo, 0);
> +	if (curr_fo -> fo_client_data)
>      	(*(curr_fo -> fo_fn))((ptr_t)(curr_fo -> fo_hidden_base),
>      			      curr_fo -> fo_client_data);
>      	curr_fo -> fo_client_data = 0;
This is equivalent to the original patch in my view.

> If you don't have a change such as this, the cordtest program crashes
> with this stack traceback when doing "make test":
>   0x000121d4 __djgpp_traceback_exit+48, file 
> e:/downloads/gc/gc6.3alpha2/stubborn.c, line 324
>   0x000122ae raise+90, file 
> e:/downloads/gc/gc6.3alpha2/stubborn.c, line 324
>   0x000104ab abort+27, file 
> e:/downloads/gc/gc6.3alpha2/stubborn.c, line 324
>   0x0000522b CORD_lf_close_proc+55, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordxtra.c, line 560
>   0x000073d2 GC_invoke_finalizers+66, file 
> e:/downloads/gc/gc6.3alpha2/finalize.c, line 790
>   0x00007425 GC_notify_or_invoke_finalizers+29, file 
> e:/downloads/gc/gc6.3alpha2/finalize.c, line 819
>   0x00006227 GC_generic_malloc+31, file 
> e:/downloads/gc/gc6.3alpha2/malloc.c, line 187
>   0x0000634e GC_malloc_atomic+86, file 
> e:/downloads/gc/gc6.3alpha2/malloc.c, line 268
>   0x00004e92 CORD_ec_flush_buf+34, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordxtra.c, line 417
>   0x00005e11 CORD_vsprintf+1733, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordprnt.c, line 337
>   0x00005e6f CORD_fprintf+27, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordprnt.c, line 359
> ebp=000a5d20 esp=000a5c70 
> cs: sel=0147  base=836e3000  limit=ff310fff
> ds: sel=014f  base=836e3000  limit=ff310fff
> es: sel=014f  base=836e3000  limit=ff310fff
> fs: sel=0127  base=0001dfb0  limit=0000ffff
> gs: sel=015f  base=00000000  limit=0010ffff
> ss: sel=014f  base=836e3000  limit=ff310fff
> App stack: [000a6170..00026170]  Exceptn stack: [000260cc..0002418c]
> Call frame traceback EIPs:
>   0x000121d4 __djgpp_traceback_exit+48, file 
> e:/downloads/gc/gc6.3alpha2/stubbor
>   0x000122ae raise+90, file 
> e:/downloads/gc/gc6.3alpha2/stubborn.c, line 324
>   0x000104ab abort+27, file 
> e:/downloads/gc/gc6.3alpha2/stubborn.c, line 324
>   0x0000522b CORD_lf_close_proc+55, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cord
>   0x000073d2 GC_invoke_finalizers+66, file 
> e:/downloads/gc/gc6.3alpha2/finalize.
>   0x00007425 GC_notify_or_invoke_finalizers+29, file 
> e:/downloads/gc/gc6.3alpha2
>   0x00006227 GC_generic_malloc+31, file 
> e:/downloads/gc/gc6.3alpha2/malloc.c, li
>   0x0000634e GC_malloc_atomic+86, file 
> e:/downloads/gc/gc6.3alpha2/malloc.c, lin
>   0x00004e92 CORD_ec_flush_buf+34, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordx
>   0x00005e11 CORD_vsprintf+1733, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordprn
>   0x00005e6f CORD_fprintf+27, file 
> e:/downloads/gc/gc6.3alpha2/./cord/cordprnt.c
> make.exe: *** [test] Error -1
> [exited with 2]
> > 2) The change in the include file path to mach_dep.c is inconsistent
> > with all the other source files.  I set SPECIALCFLAGS in Makefile.dj
> > to the same thing as in Makefile instead.  Could you verify 
> that that works?
> This works fine. I presume you meant the same as in Makefile.direct.
> Makefile doesn't seem to have SPECIALCFLAGS.
>                              Doug

Yes, I should have said Makefile.direct.  Makefile is a copy of Makefile.direct
in the original distribution (for the sake of backward compatibility), but
it's overwritten by the configure script.

Thanks for following up on this.