[Gc] "mmap remapping failed" in long-running server

Kenneth C. Schalk ken@xorian.net
Wed, 05 Nov 2003 12:20:00 -0500 (EST)


This message is in MIME format.

---MOQ106805279966af2f7fc01369b1d582d694f91766fc
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

I'm responsible for maintaining a server that's linked with the
garbage collector.  (It's part of Vesta: http://www.vestasys.org/).

Where I work (which probably has the most heavily loaded such server
anywhere), we've been having some problems.  With increasing frequency
(recently about every 24-48 hours), the server has been dying with the
message "mmap remapping failed" (which is printed at os_dep.c:1844).
When this happens, the system clearly still has memory available, and
the server is usually well below its peak total memory size.

I've tried turning off USE_MUNMAP (this error is only possible when
that option is on), but then the server's memory grows to over twice
its peak size with USE_MUNMAP.  Each garbage collection takes
significantly longer, and eventually the system gets so busy swapping
that we have to restart the server to get back to a responsive state.

Ideally, I'd like to figure out why "mmap remapping failed" keeps
happening and stop it, assuming that's possible.
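
The abort message above does not say why the remapping failed.  As a
rough sketch of the unmap/remap pattern, and of reporting errno when
the remap fails, something like the following could be tried outside
the collector.  This is not the collector's code: remap_pages and
unmap_remap_roundtrip are hypothetical names, and the use of
MAP_ANONYMOUS for the backing store is an assumption made to keep the
example self-contained.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not collector code): map [start, start+len)
   again at its old address.  Returns 0 on success, otherwise the
   errno value left by mmap, after printing it. */
static int remap_pages(void *start, size_t len)
{
    void *p;

    assert(start != NULL && len > 0);
    p = mmap(start, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        int err = errno;
        fprintf(stderr, "remap of %lu bytes at %p failed: %s\n",
                (unsigned long)len, start, strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical demo: map npages pages, give one page in the middle
   back to the OS, remap it, and check that it is writable again.
   Returns 0 on success or an errno value on failure. */
static int unmap_remap_roundtrip(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *base, *hole;
    int err;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return errno;

    hole = base + (npages / 2) * page;
    if (munmap(hole, page) != 0)
        err = errno;                  /* could not release the page */
    else
        err = remap_pages(hole, page);

    if (err == 0)
        hole[0] = 1;                  /* remapped page must be writable */

    munmap(base, len);                /* clean up the whole region */
    return err;
}
```

Calling unmap_remap_roundtrip(8) from a small test program should
return 0 on a healthy system.  The point of the sketch is only that
the errno value (ENOMEM, for instance) is what would distinguish the
possible causes.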

More relevant details:

- The collector source is 6.2 plus a couple of minor changes (patch
attached).

- The OS is Linux.  The kernel is 2.4.9, and the rest of the system is
a derivative of RedHat 7.1.  (I know, it's rather old, but these are
externally imposed constraints.  We hope to be moving to a 2.4.20
kernel soon.)

- The hardware is a dual Intel Xeon 1.70GHz with 2GB of physical
memory and 2GB of swap.

- Other than USE_MUNMAP, the macros used when building the collector
are:

  USE_MMAP
  LARGE_CONFIG
  PARALLEL_MARK
  THREAD_LOCAL_ALLOC
  ATOMIC_UNCOLLECTABLE
  NO_SIGNALS
  NO_EXECUTE_PERMISSION
  ALL_INTERIOR_POINTERS
  GC_LINUX_THREADS
  GC_USE_LD_WRAP
  _REENTRANT

- When running with USE_MUNMAP, the total memory size ranges from 200M
to 1.2G, averaging around 500M.

- When running without USE_MUNMAP, the system becomes pretty
unresponsive as the server's total memory size approaches 3G (usually
around 2.7-2.8G).  The resident set size stays below 1.4G.
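
One thing that might be worth measuring, given that the failure occurs
while memory is still available: Linux limits how many separate
mappings a process may hold, and mmap fails with ENOMEM once that
limit is reached.  Whether that is what happens here is an assumption,
not something established above.  A minimal sketch for counting the
current mappings, and for seeing how unmapping scattered pages raises
the count, could look like this (count_mappings and
mappings_added_by_holes are hypothetical helper names):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines of /proc/self/maps, one per mapping.
   Returns -1 if the file cannot be opened. */
static long count_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long n = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Map npages pages, unmap every other page, and return how many
   mappings the holes added.  Returns -1 on error. */
static long mappings_added_by_holes(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    size_t i;
    long before, after;
    char *base;

    assert(npages >= 4);
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    before = count_mappings();
    for (i = 1; i + 1 < npages; i += 2)
        munmap(base + i * page, page);    /* punch a one-page hole */
    after = count_mappings();

    munmap(base, len);                    /* release what is left */
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Logging count_mappings() periodically in the server, or just before
the abort, would show whether the number of mappings climbs toward a
limit as the heap fragments.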

Any help in ironing out this problem would be appreciated.

--Ken Schalk

---MOQ106805279966af2f7fc01369b1d582d694f91766fc
Content-Type: application/octet-stream; name="gc6.2.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="gc6.2.patch"

ZGlmZiAtciAtdSBnYzYuMi9vc19kZXAuYyBnYzYuMl9jaGFuZ2VzL29zX2RlcC5jCi0tLSBnYzYu
Mi9vc19kZXAuYwlGcmkgSnVuIDEzIDE1OjExOjAwIDIwMDMKKysrIGdjNi4yX2NoYW5nZXMvb3Nf
ZGVwLmMJVGh1IE9jdCAgOSAxMzoxODoxNyAyMDAzCkBAIC04MzAsOCArODMwLDEyIEBACiAjIGRl
ZmluZSBTVEFUX1NLSVAgMjcgICAvKiBOdW1iZXIgb2YgZmllbGRzIHByZWNlZGluZyBzdGFydHN0
YWNrCSovCiAJCQkvKiBmaWVsZCBpbiAvcHJvYy9zZWxmL3N0YXQJCQkqLwogCisvKiBVc2luZyBf
X2xpYmNfc3RhY2tfZW5kIGlzIG5vdCBwb3J0YWJsZSBiZXR3ZWVuIGRpZmZlcmVudCBnbGliYwor
ICAgdmVyc2lvbnMsIHNvIGRvbid0IHVzZSBpdC4gKi8KKyMgaWYgMAogIyBwcmFnbWEgd2VhayBf
X2xpYmNfc3RhY2tfZW5kCiAgIGV4dGVybiBwdHJfdCBfX2xpYmNfc3RhY2tfZW5kOworIyBlbmRp
ZgogCiAjIGlmZGVmIElBNjQKICAgICAvKiBUcnkgdG8gcmVhZCB0aGUgYmFja2luZyBzdG9yZSBi
YXNlIGZyb20gL3Byb2Mvc2VsZi9tYXBzLgkqLwpAQCAtOTAyLDYgKzkwNiw3IEBACiAgICAgd29y
ZCByZXN1bHQgPSAwOwogICAgIHNpemVfdCBpLCBidWZfb2Zmc2V0ID0gMDsKIAorIyAgIGlmIDAK
ICAgICAvKiBGaXJzdCB0cnkgdGhlIGVhc3kgd2F5LiAgVGhpcyBzaG91bGQgd29yayBmb3IgZ2xp
YmMgMi4yCSovCiAgICAgICBpZiAoMCAhPSAmX19saWJjX3N0YWNrX2VuZCkgewogIyAgICAgICBp
ZmRlZiBJQTY0CkBAIC05MTUsNiArOTIwLDggQEAKIAkgIHJldHVybiBfX2xpYmNfc3RhY2tfZW5k
OwogIwllbmRpZgogICAgICAgfQorIyAgIGVuZGlmCisKICAgICBmID0gb3BlbigiL3Byb2Mvc2Vs
Zi9zdGF0IiwgT19SRE9OTFkpOwogICAgIGlmIChmIDwgMCB8fCBTVEFUX1JFQUQoZiwgc3RhdF9i
dWYsIFNUQVRfQlVGX1NJWkUpIDwgMiAqIFNUQVRfU0tJUCkgewogCUFCT1JUKCJDb3VsZG4ndCBy
ZWFkIC9wcm9jL3NlbGYvc3RhdCIpOwpkaWZmIC1yIC11IGdjNi4yL3B0aHJlYWRfc3VwcG9ydC5j
IGdjNi4yX2NoYW5nZXMvcHRocmVhZF9zdXBwb3J0LmMKLS0tIGdjNi4yL3B0aHJlYWRfc3VwcG9y
dC5jCVdlZCBKdW4gMTggMTk6MTA6MzggMjAwMworKysgZ2M2LjJfY2hhbmdlcy9wdGhyZWFkX3N1
cHBvcnQuYwlUaHUgSnVsIDMxIDE3OjMyOjIxIDIwMDMKQEAgLTk5LDYgKzk5LDcgQEAKICMgaW5j
bHVkZSA8c3lzL3R5cGVzLmg+CiAjIGluY2x1ZGUgPHN5cy9zdGF0Lmg+CiAjIGluY2x1ZGUgPGZj
bnRsLmg+CisjIGluY2x1ZGUgPHNpZ25hbC5oPgogCiAjaWYgZGVmaW5lZChHQ19EQVJXSU5fVEhS
RUFEUykKICMgaW5jbHVkZSAicHJpdmF0ZS9kYXJ3aW5fc2VtYXBob3JlLmgiCkBAIC0xMjQwLDE2
ICsxMjQxLDE5IEBACiAKICAgICByZXN1bHQgPSBSRUFMX0ZVTkMocHRocmVhZF9jcmVhdGUpKG5l
d190aHJlYWQsIGF0dHIsIEdDX3N0YXJ0X3JvdXRpbmUsIHNpKTsKIAorICAgIGlmKHJlc3VsdCA9
PSAwKQorICAgICAgewogIyAgIGlmZGVmIERFQlVHX1RIUkVBRFMKICAgICAgICAgR0NfcHJpbnRm
MSgiU3RhcnRlZCB0aHJlYWQgMHglWFxuIiwgKm5ld190aHJlYWQpOwogIyAgIGVuZGlmCi0gICAg
LyogV2FpdCB1bnRpbCBjaGlsZCBoYXMgYmVlbiBhZGRlZCB0byB0aGUgdGhyZWFkIHRhYmxlLgkJ
Ki8KLSAgICAvKiBUaGlzIGFsc28gZW5zdXJlcyB0aGF0IHdlIGhvbGQgb250byBzaSB1bnRpbCB0
aGUgY2hpbGQgaXMgZG9uZQkqLwotICAgIC8qIHdpdGggaXQuICBUaHVzIGl0IGRvZXNuJ3QgbWF0
dGVyIHdoZXRoZXIgaXQgaXMgb3RoZXJ3aXNlCQkqLwotICAgIC8qIHZpc2libGUgdG8gdGhlIGNv
bGxlY3Rvci4JCQkJCSovCi0gICAgd2hpbGUgKDAgIT0gc2VtX3dhaXQoJihzaSAtPiByZWdpc3Rl
cmVkKSkpIHsKLSAgICAgICAgaWYgKEVJTlRSICE9IGVycm5vKSBBQk9SVCgic2VtX3dhaXQgZmFp
bGVkIik7Ci0gICAgfQorCS8qIFdhaXQgdW50aWwgY2hpbGQgaGFzIGJlZW4gYWRkZWQgdG8gdGhl
IHRocmVhZCB0YWJsZS4JCSovCisJLyogVGhpcyBhbHNvIGVuc3VyZXMgdGhhdCB3ZSBob2xkIG9u
dG8gc2kgdW50aWwgdGhlIGNoaWxkIGlzIGRvbmUJKi8KKwkvKiB3aXRoIGl0LiAgVGh1cyBpdCBk
b2Vzbid0IG1hdHRlciB3aGV0aGVyIGl0IGlzIG90aGVyd2lzZQkJKi8KKwkvKiB2aXNpYmxlIHRv
IHRoZSBjb2xsZWN0b3IuCQkJCQkqLworCXdoaWxlICgwICE9IHNlbV93YWl0KCYoc2kgLT4gcmVn
aXN0ZXJlZCkpKSB7CisJICBpZiAoRUlOVFIgIT0gZXJybm8pIEFCT1JUKCJzZW1fd2FpdCBmYWls
ZWQiKTsKKwl9CisgICAgICB9CiAgICAgc2VtX2Rlc3Ryb3koJihzaSAtPiByZWdpc3RlcmVkKSk7
CiAJTE9DSygpOwogCUdDX0lOVEVSTkFMX0ZSRUUoc2kpOwo=


---MOQ106805279966af2f7fc01369b1d582d694f91766fc--