[Gc] Alignment, Executable memory and MMX registers

Hans Boehm Hans.Boehm at hp.com
Mon Oct 18 22:34:18 PDT 2004


On Mon, 18 Oct 2004, Ben Hutchings wrote:

> Matt Bromberg <doom_portfolio at yahoo.com> wrote:
> > ...
> > 1) Do you include the MMX registers and/or the XMM
> > registers in your root set?
>
> No.  The root set includes EAX, EDX, ECX, EBX, ESI, EDI and EBP for
> all threads.  For the thread running the collection this is
> controlled by GC_push_regs() in mach_dep.c and for all other threads
> it is controlled by GC_push_all_stacks() in win32_threads.c.
>
I would guess that in most cases USE_GENERIC_PUSH_REGS would also
do the trick.  It uses setjmp or similar to push the callee-save
registers.
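Here is a minimal sketch of the setjmp trick in standard C. Calling setjmp saves the callee-save registers into a jmp_buf on the current stack frame, and that buffer can then be scanned like any other memory. The jmp_buf layout is implementation-defined, and some C libraries also mangle saved values such as the stack pointer. So this illustrates the idea and does not reproduce mach_dep.c. The names heap_range, maybe_push, scan_words and push_current_regs are hypothetical. A single [lo, hi) range stands in for the collector's real heap lookup.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the collector's heap: one address range plus counters. */
struct heap_range {
    uintptr_t lo, hi;   /* a candidate pointer must satisfy lo <= w < hi */
    size_t scanned;     /* words examined */
    size_t hits;        /* words that looked like heap pointers */
};

/* Conservative test: any word inside the heap range is treated as a pointer.
   A real collector would mark the target object here. */
static void maybe_push(uintptr_t w, struct heap_range *h)
{
    h->scanned++;
    if (w >= h->lo && w < h->hi)
        h->hits++;
}

/* Scan len bytes starting at start, one pointer-sized word at a time. */
static void scan_words(const void *start, size_t len, struct heap_range *h)
{
    const unsigned char *p = start;
    assert(h != NULL);
    for (size_t off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
        uintptr_t w;
        memcpy(&w, p + off, sizeof w);   /* memcpy sidesteps alignment concerns */
        maybe_push(w, h);
    }
}

/* Spill callee-save registers into a jmp_buf in this stack frame, then scan it. */
static void push_current_regs(struct heap_range *h)
{
    jmp_buf regs;
    memset(&regs, 0, sizeof regs);
    (void)setjmp(regs);   /* only saves register state; never longjmp'd to */
    scan_words(&regs, sizeof regs, h);
}
```

Because the jmp_buf lives on the stack, a collector that also scans the stack from its current top would see these words anyway. Calling setjmp just ensures that register-resident pointers are written to memory first.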
> <snip>
>
> > 3) I plan to allocate memory for the use of double
> > arrays that probably should be 16 byte aligned. Is
> > there an easy way to obtain my desired alignment with
> > GC_malloc() et al.?
>
> The ALIGN_DOUBLE macro forces alignment on 8-byte boundaries (on
> 32-bit machines) but there is nothing similar to force greater
> alignment.
>
> It would perhaps be simplest to use wrappers like these:
>
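>     #include <stdint.h>
>     #include "gc.h"
>
>     /* Assumes GC_malloc returns at least 4-byte-aligned memory, so at
>        most 12 bytes of padding reach a 16-byte boundary.  The result is
>        an interior pointer, which the collector must recognize. */
>     void * GC_malloc_align16(size_t sz)
>     {
>         uintptr_t p = (uintptr_t)GC_malloc(sz + 12);
>         return (void *)((p + 12) & ~(uintptr_t)15);   /* NULL stays NULL */
>     }
>
>     void GC_free_align16(void * p)
>     {
>         GC_free(GC_base(p));   /* free from the object's real start */
>     }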
>
That should work.  If performance becomes a serious issue, you should be
able to do better.  If the real internal object size is a multiple of 16
bytes, the objects will all be 16-byte aligned.  But the real size is
computed from the requested size in a nontrivial way, so it would
take some work to leverage this.
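A toy model shows why an object size that is a multiple of 16 is enough. It assumes a hypothetical allocator that carves same-sized objects back to back out of a block whose start is 16-byte aligned (for example, page aligned). The model deliberately ignores how the collector derives the real size from the requested one, which is the hard part mentioned above. all_objects_aligned is an illustrative helper, not a collector function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every one of count objects, each obj_size bytes and laid out
   contiguously from block_start, begins on an align-byte boundary. */
static bool all_objects_aligned(uintptr_t block_start, size_t obj_size,
                                size_t count, size_t align)
{
    assert(align != 0);
    for (size_t i = 0; i < count; i++) {
        if ((block_start + i * obj_size) % align != 0)
            return false;
    }
    return true;
}
```

With a 16-byte-aligned block start, the check passes for any obj_size that is a multiple of 16. It fails for a size such as 40, because the second object lands 8 bytes past a boundary.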

There is also an underdocumented GC_memalign that does what you want
in about the way Ben suggested.
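The arithmetic behind Ben's wrapper can be checked on plain integers. By the description above, GC_memalign works in roughly the same way. This sketch assumes the underlying allocator returns addresses that are at least 4-byte aligned. For such an address p, rounding p + 12 down to a multiple of 16 gives the same result as the usual round-up (p + 15) & ~15, and it never moves more than 12 bytes forward. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Ben's formula: correct when p is at least 4-byte aligned. */
static uintptr_t ben_align16(uintptr_t p)
{
    return (p + 12) & ~(uintptr_t)15;
}

/* Padding to request so that an align-aligned region fits inside an
   allocation whose start is only base_align-aligned (both powers of two,
   base_align <= align). */
static size_t align_slack(size_t align, size_t base_align)
{
    assert(base_align != 0 && base_align <= align);
    return align - base_align;
}
```

align_slack(16, 4) is 12, which is Ben's constant. If the allocator already returns 8-byte-aligned memory (as with ALIGN_DOUBLE), 8 bytes of padding would suffice, although 12 remains safe.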

> > 4) Since I'm not using C, but rather assembly and
> > linking in the garbage collector as a DLL is it
> > possible I will run into problems due to assumptions
> > concerning C compiled executables?
>
> I don't see that as likely.
>
> > I must say I do
> > not fully understand the magic involved in how you get
> > the data segments into your root sets, especially if
> > libraries are loaded dynamically at run time.
>
> The collector should find them by scanning the address space with
> VirtualQuery.  If they are later unloaded, leading to an access
> violation while scanning, this should be caught and fixed up using
> Structured Exception Handling (SEH).  This doesn't work on compilers
> that don't support SEH, though.
>
> > I also
> > don't understand how you deal with the mutator messing
> > up the integrity of your marked data, unless you go
> > through the whole tree every time you garbage collect,
> > or possibly lock out memory locations and force an
> > interrupt or something.
>
> Currently, on Windows, the collector does go through the whole tree
> (or rather graph) of objects referred to from the root set.  There
> are techniques called generational and incremental garbage collection
> that can avoid this, but the necessary supporting code has not yet
> been written for Windows.  I may do so at some point.
>
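Going through the whole graph amounts to a mark phase over everything reachable from the roots. Here is a minimal sketch over a toy heap. It assumes objects are identified by array index and hold at most MAX_REFS references. This is a generic illustration of tracing, not the collector's data structures, and struct obj, mark_one and mark_from_roots are hypothetical. Every reachable object is visited on every collection, which is the cost that generational and incremental schemes try to reduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 16
#define MAX_REFS 4

/* Hypothetical heap object: outgoing references are indices, -1 means none. */
struct obj {
    int refs[MAX_REFS];
    bool marked;
};

/* Mark object i and queue it for scanning, unless it is invalid or already marked. */
static void mark_one(struct obj *heap, size_t nobjs, int i,
                     int *stack, size_t *sp, size_t *count)
{
    if (i < 0 || (size_t)i >= nobjs || heap[i].marked)
        return;
    heap[i].marked = true;
    (*count)++;
    stack[(*sp)++] = i;
}

/* Mark everything reachable from roots using an explicit stack.
   Each object is pushed at most once, so the stack cannot overflow.
   Returns the number of objects marked. */
static size_t mark_from_roots(struct obj *heap, size_t nobjs,
                              const int *roots, size_t nroots)
{
    int stack[MAX_OBJS];
    size_t sp = 0, count = 0;
    assert(nobjs <= MAX_OBJS);
    for (size_t r = 0; r < nroots; r++)
        mark_one(heap, nobjs, roots[r], stack, &sp, &count);
    while (sp > 0) {
        const struct obj *o = &heap[stack[--sp]];
        for (size_t k = 0; k < MAX_REFS; k++)
            mark_one(heap, nobjs, o->refs[k], stack, &sp, &count);
    }
    return count;
}
```

Cycles are handled by the marked flag: an object that has already been marked is never pushed again.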
I believe incremental GC should basically work on Windows, at least
in the absence of threads.  I think there are the standard issues with
system calls that write to the heap.  And for some reason, incremental
collection is no longer tested by default with Windows threads.
I don't recall why that was turned off.  But I think single threaded
tests do run with incremental GC.

Having said that, incremental GC currently rarely improves throughput,
and probably costs more on Windows than Linux.
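One way to see the issue with system calls that write to the heap: the collector's incremental mode can track written pages by write-protecting them and catching the resulting faults. A store from user code faults into the collector's handler. When the kernel writes into a protected page on the program's behalf, the system call typically fails instead. The sketch below demonstrates this on a POSIX system (Linux assumed; the Windows behavior and APIs differ), with an anonymous page standing in for a heap page. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write-protects a page, as a page-protection write barrier would, then
   asks the kernel to store into it via read(2).  Returns the errno seen,
   0 if the read succeeded, or -1 if setup failed. */
static int kernel_write_to_protected_page(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    int fds[2];
    int result = 0;
    char *page;

    if (pagesz <= 0 || pipe(fds) != 0)
        return -1;
    page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED || write(fds[1], "x", 1) != 1
        || mprotect(page, (size_t)pagesz, PROT_READ) != 0) {
        result = -1;
    } else if (read(fds[0], page, 1) < 0) {
        /* A user-space store would raise a fault that a collector could
           handle; the kernel's copy instead fails the call (EFAULT on Linux). */
        result = errno;
    }

    if (page != MAP_FAILED)
        munmap(page, (size_t)pagesz);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A collector that relies on page protection therefore presumably has to make sure such buffers are unprotected before the kernel writes to them.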

Hans

