[Gc] Newbie: how to deal with a giant linked list?

Jan Wielemaker J.Wielemaker at vu.nl
Sun Dec 18 12:11:56 PST 2011


Hi Bruce,

On 12/18/2011 08:44 PM, Bruce Hoult wrote:
> On Mon, Dec 19, 2011 at 8:00 AM, Jan Wielemaker<J.Wielemaker at vu.nl>  wrote:
>>> Can you actually scan through the list faster than the GC time? You
>>> could well be running into cache and/or TLB thrashing if the Next
>>> links don't usually point to items on the same memory page and each
>>> page is visited many times.
>>
>> I don't know the answer to that question. Bottom line is that I do not
>> care. As long as cells are part of the structure, I don't need to
>> have GC dealing with them.
>
> Then you don't want a general purpose GC, as scanning the entire
> structure is exactly what has to happen.

That is a good remark. Yes, I don't want complete general purpose GC.
Or, in other words, it would be nice, but it seems I'll hit scalability
limits if I want a 100Gb heap with several 100 millions of objects. A
lot of this data is most of the time uninteresting wrt. GC (i.e., we
know it cannot be collected and we know it doesn't contain pointers that
are subject to GC).  Providing additional info isn't alien to the Boehm
GC API.

>> I want GC starting to deal with them from the
>> moment I disconnect them from the global structure because they might be
>> involved in some operation by some other thread (i.e., I'm mostly
>> interested in references from registers or the stacks of other
>> threads).
>
> It seems to me that what you need is not GC, but a register of objects
> needing deleting and a way to determine that every thread has made
> progress to at least the next node. That could be as simple as a
> boolean per thread that is set by the thread when it moves to the next
> node (usually with no effect), and is cleared by the task that wants
> to clear out the deleted objects pool. Or they could be condition
> variables.

This smells a bit like a variant of the hazard-pointer design? Yes, that
can solve some of my problems but it seems to be that Boehm-GC might
offer a simpler solution that can solve this and other parts of the
application dealing with volatile data that may be used by multiple
threads where general purpose GC is a simple and efficient solution.
Just, it doesn't scale far enough. The ability to change objects from
unmanaged to GC managed would solve that.

>> In other words, I'm trying to find out whether
>>
>>    (a) There is a much better plan to my problem, being ...
>>    (b) Changing object types is feasible, deemed useful and a patch
>>        that allows for this will be accepted into the main source
>>        (provided ...).  Note that I might need some guidance, but I'm
>>        perfectly willing to do the dirty work myself.
>
> There is no possible patch that could change a single object (of less
> than GC page size, normally 4 KB), from one object type to another.
> You would have to copy it somewhere else, therefore changing its
> address, therefore making registers and stacks and other objects not
> point to it any more.

This smells like a show-stopper :-(  Why is that?  Is the `kind' header
stored per page?

	Thanks --- Jan


More information about the Gc mailing list