[GC] "map remapping failed" in long-running server

Boehm, Hans hans_boehm@hp.com
Tue, 11 Nov 2003 14:13:35 -0800


I talked to David Mosberger to try to understand how the kernel behaves.

Apparently mmap with MAP_FIXED implicitly unmaps whatever was there before.
Thus the problem can't have been an intervening mapping at the same address.
That would have failed, but in a different way.  (It sounds like the collector
should immediately be remapping the memory with PROT_NONE to avoid this
potential issue, but it's only a potential issue.  And I just fixed it in my
sources.)
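
As a rough illustration of the PROT_NONE idea (a minimal sketch only, not the
collector's actual code; the helper names reserve_instead_of_unmap and
remap_reserved are made up for this example), the address range can be kept
reserved by mapping inaccessible memory over it with MAP_FIXED, and later made
usable again the same way:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: give the pages back to the kernel but keep the
 * address range reserved.  MAP_FIXED replaces whatever mapping was at
 * [start, start+len), so no separate munmap call is needed. */
static int reserve_instead_of_unmap(void *start, size_t len)
{
    void *p = mmap(start, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* Hypothetical helper: make a previously reserved range readable and
 * writable again.  The pages come back zero-filled. */
static int remap_reserved(void *start, size_t len)
{
    void *p = mmap(start, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Both helpers assume start and len are page-aligned.  Because the range is
never left unmapped, no other mmap call can land in it in the meantime.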

On the other hand:

1) Linux kernels have a default limit of 65536 mappings per process.  (On some
RedHat kernels, this should be changeable through /proc/sys/vm/max_map_count,
but this doesn't seem to be in the official 2.6 test releases.)

2) The /proc/.../maps size suggests you're in that ballpark.

3) Some 2.4.x kernels were known to have problems with properly merging mappings.
This is probably also aggravated by the collector's use of maps against /dev/zero
instead of MAP_ANON (which I just fixed).
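
For reference, the two ways of getting zero-filled memory mentioned in (3)
look roughly like this.  This is a sketch only; map_dev_zero and
map_anonymous are names invented for the example, not collector functions.
Both are expected to return private zero-filled pages; the difference is
that the first is backed by a file descriptor for /dev/zero.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: zero-filled private memory via /dev/zero.
 * Returns MAP_FAILED if the device cannot be opened or mapped. */
static void *map_dev_zero(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    void *p;

    if (fd < 0)
        return MAP_FAILED;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping survives the close */
    return p;
}

/* Hypothetical helper: the same thing without a file descriptor. */
static void *map_anonymous(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```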

I strongly suspect you're running into this mapping limit.  I would try
increasing the limit if you can and/or upgrading to a more recent 2.4 kernel.
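
To see how close a process is to the limit, something along these lines could
be called from inside the server.  It is a sketch under the assumption that
each line of /proc/self/maps corresponds to one mapping; count_mappings and
read_max_map_count are names made up for this example.  The file is read as a
stream, so its size does not matter, and the limit file may simply be absent
on kernels that don't export it.

```c
#include <stdio.h>

/* Hypothetical helper: count the lines of /proc/self/maps, one per
 * mapping.  Returns -1 if the file cannot be opened. */
static long count_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long n = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Hypothetical helper: read the per-process mapping limit.
 * Returns -1 if the kernel does not provide the file. */
static long read_max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long limit = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &limit) != 1)
        limit = -1;
    fclose(f);
    return limit;
}
```

Logging both values at each collection would show whether the count creeps
toward the limit before the failure.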

Hans

> -----Original Message-----
> From: Kenneth C. Schalk [mailto:ken@xorian.net]
> Sent: Tuesday, November 11, 2003 10:50 AM
> To: Boehm, Hans
> Cc: gc@napali.hpl.hp.com
> Subject: RE: [GC] "map remapping failed" in long-running server
> 
> 
> It took a little longer than I was expecting for the failure to
> re-occur.
> 
> Quoting "Boehm, Hans" <hans_boehm@hp.com>:
> > You should also just be able to call GC_print_address_map() instead
> > of forking a separate process.
> 
> I took that suggestion, but that function didn't seem to actually print
> the map.  Here's the output of the last collection plus the failure:
> 
> Initiating full world-stop collection 604 after 250170680 allocd bytes
> 11182080 bytes in heap blacklisted for interior pointers
> --> Marking for collection 604 after 250170680 allocd bytes + 22038528 wasted bytes
> Starting marking for mark phase number 603
> Starting mark helper 0
> Starting mark helper 1
> Finished mark helper 1
> Finished mark helper 0
> Finished marking for mark phase number 603
> Collection 603 reclaimed 571131836 bytes ---> heapsize = 1467326464 bytes
> World-stopped marking took 1180 msecs
> Bytes recovered before sweep - f.l. count = -47576
> Immediately reclaimed 107347152 bytes in heap of size 1467326464 bytes(365973504 unmapped)
> 240033072 (atomic) + 64344276 (composite) collectable bytes in use
> Finalize + initiate sweep took 0 + 70 msecs
> Complete collection took 1310 msecs
> MMap failed at 0x12c5e000 (length 12288) with errno 12
> ---------- Begin address map ----------
> ---------- End address map ----------
> mmap remapping failed
> 
> (12 is ENOMEM.)
> 
> A quick look at GC_apply_to_maps makes me think that perhaps the
> alloca(3) call to get a buffer to store the contents of
> /proc/self/maps in memory may have failed.  The maps file of the
> currently running instance of the server looks to be 1435305 bytes in
> length.
> 
> Should I switch back to my version that uses system(3)?  Any other
> suggestions?
> 
> --Ken
> 
> P.S.  There were no messages from the kernel to indicate a problem.
> (In fact, the only message within an hour of the failure was from
> automount, and it was for a directory that the server doesn't access.)
>