Re[2]: [PATCH] Re: Found? Re: [Gc] Assertion failure in GC_malloc_uncollectabl

Ivan Maidanski ivmai at mail.ru
Tue May 15 02:21:32 PDT 2012


Hi Jan,

Tue, 15 May 2012 10:53:20 +0200 Jan Wielemaker <J.Wielemaker at vu.nl>:
> Hi Ivan,
> 
> On 05/15/2012 10:35 AM, Ivan Maidanski wrote:
> > Hi Jan Hans,
> >
> > Jan -
> >
> > I'd like to propose an alternative solution:
> >
> >          GC_ASSERT(hhdr ->  hb_n_marks == 0);
> >          hhdr ->  hb_n_marks = 1;
> >
> > ->
> >
> > #       ifdef THREADS
> >            hhdr->hb_n_marks++; /* It might be already updated concurrently. */'
> 
> Who am I to think you are wrong?  Still, I'll try :-)  I think this is 
> wrong because another thread has done the mark between 
> GC_malloc_generic() released the lock and GC_malloc_uncollectable() 
> re-acquired it.  So, hhdr->hb_n_marks = 1 when we get here.  Surely,
> a big block always has hhdr->hb_n_marks being either 0 or 1.  You
> make it 2 if I read this correctly.

Yes, I think you are right - we should have 1 here.

> 
> My motivation to move this into GC_malloc_generic() was twofold: clearly
> GC can come in between and it seems a waste to acquire the lock twice 
> for a single operation with only a few unlocked instructions in between.

I understand. But, the other alternative is to use lock-less GC_generic_malloc_internal (not sure right now that its functionality is the same as GC_generic_malloc).

BTW. We should also fix GC_malloc_atomic_uncollectable.

Regards.

> 
> With the proposed patch, my program has run its stress test with 8 
> threads overnight without problems.  I think it would be enough to
> simply drop the GC_ASSERT() if my assessment is correct that it is
> not possible that the object gets collected by the GC doing its work
> between the two locks.
> 
> Anyway, I don't care much as long as the crash is gone :-)
> 
> 	Regards --- Jan
> 
> 
> > #       else
> >            GC_ASSERT(hhdr ->  hb_n_marks == 0);
> >            hhdr ->  hb_n_marks = 1;
> > #       endif
> >
> > Let me know if I'm wrong.
> >
> > Hans -
> > What's your opinion?
> >
> > Regards.
> >
> > Mon, 14 May 2012 15:55:23 +0200 Jan Wielemaker<J.Wielemaker at vu.nl>:
> >> Hi,
> >>
> >> Attached patch changes changes my program from crashing within a minute
> >> to already running for an hour.
> >>
> >> Patch is relative to git fetched a few hours ago.  See below for details
> >> on the rationale.
> >>
> >> Regards --- Jan
> >>
> >>
> >> On 05/14/2012 02:34 PM, Jan Wielemaker wrote:
> >>> Hi,
> >>>
> >>> The thing below is still bothering me, but I think I tracked it down to
> >>> a wrong GC_ASSERT() in GC_malloc_uncollectable(). For large objects, we
> >>> have this code:
> >>>
> >>> op = (ptr_t)GC_generic_malloc((word)lb, UNCOLLECTABLE);
> >>> if (0 == op) return(0);
> >>>
> >>> GC_ASSERT(((word)op&  (HBLKSIZE - 1)) == 0); /* large block */
> >>> hhdr = HDR(op);
> >>> /* We don't need the lock here, since we have an undisguised */
> >>> /* pointer. We do need to hold the lock while we adjust */
> >>> /* mark bits. */
> >>> LOCK();
> >>> GC_ASSERT(hhdr ->  hb_n_marks == 0);
> >>>
> >>> Where the last GC_ASSERT triggers for me. I consistently find that
> >>> hb_last_reclaimed is one less than GC_gc_no. I added the same assertion
> >>> to GC_generic_malloc() before the GC lock was released. In that case, it
> >>> is still the assert above that triggers. This can be explained:
> >>> GC_generic_malloc() releases the lock, a GC marks the object and the
> >>> assertion in GC_malloc_uncollectable() fails. Right?
> >>>
> >>> Now, I think that the assertion is simply wrong and there is no need
> >>> to change anything else to the code, but I would be grateful if someone
> >>> with in-depth knowledge can confirm this.
> >>>
> >>> Cheers --- Jan
> >>>
> >>> P.s. The crash happens consistently(?) while the block is allocated
> >>> from the final for-loop in GC_allochblk() (i.e., allocating
> >>> from a bigger free list and splitting. This might suggest
> >>> there is more to it that just removing the GC_ASSERT().
> >>>
> >>> On 05/02/2012 11:36 AM, Jan Wielemaker wrote:
> >>>> Hi,
> >>>>
> >>>> I'm experiencing assertion failures with GC_malloc_uncollectable() on a
> >>>> big block (usually a few 100 Kb): GC_ASSERT(hhdr ->  hb_n_marks == 0).
> >>>>
> >>>> Now, I have to admit this is on a modified version (as discussed on this
> >>>> list). The version is based on a1467f2140f9db3408f6214137c7dc31f10d0bd9
> >>>> (Jan 16 git version). Using the unmodified version would require many
> >>>> changes to the program :-(
> >>>>
> >>>> This is merely a quick investigation to see whether this rings a bell
> >>>> with someone. Platform is X64 Linux (Ubuntu). GC Config flags:
> >>>> '--enable-threads=pthreads' '--enable-parallel-mark'
> >>>> '--enable-large-config' '--enable-gc-assertions'
> >>>>
> >>>> hhdr is this:
> >>>>
> >>>> (gdb) p *hhdr
> >>>> $1 = {hb_next = 0x0, hb_prev = 0x0, hb_block = 0x23ba000,
> >>>> hb_obj_kind = 2 '\002', hb_flags = 0 '\000', hb_last_reclaimed = 7,
> >>>> hb_sz = 196624, hb_descr = 196624, hb_large_block = 1 '\001',
> >>>> hb_map = 0x638f60, hb_n_marks = 1, hb_n_uncollectable = 0,
> >>>> _mark_byte_union = {_hb_marks = "\001", '\000'<repeats 255 times>,
> >>>> "\001",
> >>>> dummy = 1}}
> >>>>
> >>>> Only one thread is doing work at the moment of the crash; several others
> >>>> block waiting on the allocation lock. As I read it, hhdr suggests that
> >>>> the block is marked and in use, so it shouldn't be handed out by a new
> >>>> allocation call, right?
> >>>>
> >>>> The program does lots of complicated stuff using the garbage collector,
> >>>> but the crash is always in the big GC_malloc_uncollectable() called from
> >>>> the same context. These allocations manage a big buffer that is resized
> >>>> (doubled) using a new GC_malloc_uncollectable(), copy the data and call
> >>>> GC_free() on the old data. Several threads play the same game, on their
> >>>> own private datastructures. Of course, this means that an old buffer
> >>>> released using GC_free() in one thread is typically a good candidate for
> >>>> another thread. The crash is not very frequent: I need to make the
> >>>> program go through the critical process several hundred times to get
> >>>> a crash.
> >>>>
> >>>> Thanks for any hint
> >>>>
> >>>> --- Jan
> >>>> _______________________________________________
> >>>> Gc mailing list
> >>>> Gc at linux.hpl.hp.com
> >>>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
> >>>
> >>> _______________________________________________
> >>> Gc mailing list
> >>> Gc at linux.hpl.hp.com
> >>> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
> >>
> >>
> >> _______________________________________________
> >> Gc mailing list
> >> Gc at linux.hpl.hp.com
> >> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
> 
> _______________________________________________
> Gc mailing list
> Gc at linux.hpl.hp.com
> http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
> 



More information about the Gc mailing list