[Gc] "mmap remapping failed" in long-running server

Boehm, Hans hans_boehm@hp.com
Wed, 5 Nov 2003 11:42:42 -0800

Ken -

Can you

1) Apply the attached patch to os_dep.c to print the offending errno, etc.,

2) Run the server with the GC_LOOP_ON_ABORT environment variable set, and

3) Either check the address it prints in /proc/<server_pid>/maps, or send me
a copy of that file, when the server hangs after the failure?
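
For step 3, a small helper along the following lines could do the lookup. It is only a sketch: the function name find_mapping is hypothetical, and it assumes the usual Linux maps format in which each line starts with "start-end" in hexadecimal.

```c
#include <stdio.h>

/* Hypothetical helper: scan a maps file such as "/proc/<server_pid>/maps"
 * for the mapping that contains addr.
 * Returns 1 and copies the matching line into buf if a mapping covers addr,
 * 0 if no mapping covers it, and -1 if the file cannot be opened. */
int find_mapping(const char *maps_path, unsigned long addr,
                 char *buf, size_t buflen)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(maps_path, "r");

    if (f == NULL) return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        /* Each entry begins with "lo-hi"; hi is exclusive. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2
            && lo <= addr && addr < hi) {
            snprintf(buf, buflen, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A result of 1 for the address reported by the patched collector would mean that something is mapped over that part of the heap; the copied line shows what it is.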

Thinking about this, I'm concerned about two possible causes:

1) The kernel decided to map something else into the hole in the heap that
was left when the page was originally unmapped.  I think this is fixable,
though I'm not sure that the solutions are as trivial as I would like.
(An interesting solution on Linux might be to not leave the page unmapped
for very long at all.  If you unmap and remap it immediately, it still
shouldn't have real memory associated with it until the first write.  This
may have to be done with the world stopped to avoid an intervening mapping.
An easier solution may be to recover from the failure by effectively
removing the block from the heap.)
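
One way to picture the "unmap and remap immediately" idea is the sketch below. It is a variation on what is described above, not the collector's code: instead of a separate munmap followed by mmap, it issues a single mmap with MAP_FIXED over the range, which replaces the old mapping in one call so that no hole is left for another mapping to land in. The name release_in_place is hypothetical, and MAP_ANONYMOUS is used here for brevity where os_dep.c maps a zero_descr descriptor.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: hand the pages behind [start, start + len) back to
 * the kernel while keeping the address range reserved for the heap.
 * start must be page-aligned.  The fresh mapping reads as zeros and should
 * get physical memory only when it is touched again.
 * Returns 0 on success, -1 on failure (errno is set by mmap). */
int release_in_place(void *start, size_t len)
{
    void *result = mmap(start, len, PROT_READ | PROT_WRITE,
                        MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return result == start ? 0 : -1;
}
```

The old contents of the range are discarded, so this is only appropriate for blocks the collector already considers free. It also leaves the number of mappings unchanged or higher, so it would not help if the mapping limit is the real problem.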

2) You're exceeding a kernel limit on the number of mappings.  I would
consult a kernel expert on how to handle this.  I assume you didn't see a
message in the system log?
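
A rough way to test the second hypothesis is to count the entries in the server's maps file and compare that with the kernel's limit. The sketch below assumes the limit is readable from /proc/sys/vm/max_map_count, as it is on current Linux kernels; that file may not exist on a kernel as old as 2.4.9, in which case the helper reports -1. Both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: count the mappings (one per line) listed in a maps
 * file such as "/proc/<server_pid>/maps".  Returns -1 if it is unreadable. */
long count_mappings(const char *maps_path)
{
    long n = 0;
    int c, last = '\n';
    FILE *f = fopen(maps_path, "r");

    if (f == NULL) return -1;
    while ((c = getc(f)) != EOF) {
        if (c == '\n') n++;
        last = c;
    }
    if (last != '\n') n++;      /* final line without a trailing newline */
    fclose(f);
    return n;
}

/* Hypothetical helper: read the per-process mapping limit.  The path is an
 * assumption about the running kernel; returns -1 if it is unavailable. */
long read_map_limit(void)
{
    long limit = -1;
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

    if (f == NULL) return -1;
    if (fscanf(f, "%ld", &limit) != 1) limit = -1;
    fclose(f);
    return limit;
}
```

If the count taken while the server hangs is close to the limit, the failure is more likely a mapping-count problem than an intervening mapping.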


> -----Original Message-----
> From: gc-admin@napali.hpl.hp.com [mailto:gc-admin@napali.hpl.hp.com]On
> Behalf Of Kenneth C. Schalk
> Sent: Wednesday, November 05, 2003 9:20 AM
> To: gc@napali.hpl.hp.com
> Subject: [Gc] "mmap remapping failed" in long-running server
>
> I'm responsible for maintaining a server that's linked with the
> garbage collector.  (It's part of Vesta: https://www.vestasys.org/).
> Where I work (which probably has the heaviest loaded such server
> anywhere), we've been having some problems.  With increasing frequency
> (recently about every 24-48 hours), the server has been dying with the
> message "mmap remapping failed" (which is printed at os_dep.c:1844).
> When this happens, there's clearly more memory on the system, and the
> server is usually significantly below its peak total memory size.
>
> I've tried turning off USE_MUNMAP (as this error is only possible when
> that option is on), but then the server's memory grows to over twice
> what it peaks at with USE_MUNMAP.  Each garbage collection takes
> significantly longer, and eventually the system gets so busy swapping
> that we restart the server to get back to a more responsive state.
>
> Ideally, I'd like to figure out why "mmap remapping failed" keeps
> happening and stop it, assuming that's possible.
>
> More relevant details:
>
> - The collector source is 6.2 plus a couple of minor changes (patch
> attached).
> - The OS is Linux.  The kernel is 2.4.9, and the rest of the system is
> a derivative of RedHat 7.1.  (I know, it's rather old, but these are
> externally imposed constraints.  We hope to be moving to a 2.4.20
> kernel soon.)
> - The hardware is a dual Intel Xeon 1.70GHz with 2GB of physical
> memory and 2GB of swap.
> - Other than USE_MUNMAP, the macros used when building the collector
> are:
> - When running with USE_MUNMAP, the total memory size ranges from 200M
> to 1.2G, averaging around 500M.
> - When running without USE_MUNMAP, the system becomes pretty
> unresponsive as the server's total memory size approaches 3G (usually
> around 2.7-2.8G).  The resident set size stays below 1.4G.
>
> Any help in ironing out this problem would be appreciated.
>
> --Ken Schalk

--- gc6.3/os_dep.c	2003-11-03 14:09:36.000000000 -0800
+++ /home/hboehm/gc_master/os_dep.c	2003-11-05 11:35:09.000000000 =
@@ -115,6 +115,7 @@
 # include <sys/types.h>
 # include <sys/mman.h>
 # include <sys/stat.h>
+# include <errno.h>
 #ifdef UNIX_LIKE
@@ -1853,6 +1854,8 @@
       result = mmap(start_addr, len, PROT_READ | PROT_WRITE | =
 		    MAP_FIXED | MAP_PRIVATE, zero_descr, 0);
       if (result != start_addr) {
+	  GC_err_printf3("MMap failed at 0x%lx (length %ld) with errno =
+			  start_addr, len, errno);
 	  ABORT("mmap remapping failed");
       GC_unmapped_bytes -= len;