[Gc] linked lists

Boehm, Hans hans_boehm@hp.com
Mon, 10 Mar 2003 15:37:28 -0800


This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01C2E75E.034DD610
Content-Type: text/plain;
	charset="iso-8859-1"

It looks like there are two kinds of problems here:

1) You uncovered a bug with the combination of the gc_cleanup and debug allocation.  Thanks.  I attached a patch for gc_cpp.h.  This accounts for the spurious "non-base-pointer" messages.

2) Some misunderstandings:

a) The collector does not guarantee to reclaim every object that you believe to be inaccessible.  Very often some number of objects will still appear to be reachable to the collector because their address will still be located in machine registers or stack locations.  I'm not sure this was really an issue here.  But if you have a single list, there is no guarantee that the head pointer or all the intermediate pointers will really be gone when you call GC_gcollect.  I attached a variant of your example that demonstrates the function of the collector by allocating many shorter lists, so that a single preserved reference has limited effect.  On my Linux machine, all but 6 out of 1100 objects (probably everything but parts of one list) are finalized when the program is compiled with g++ -O2.

b) The collector runs finalizers (e.g. the gc_cleanup destructors) only on objects that are not referenced by other objects with cleanup actions.  This is necessary to ensure that the cleanup actions only see objects that haven't already been cleaned up.  (See e.g. the first paper listed in http://www.hpl.hp.com/personal/Hans_Boehm/pubs.html .  This is intentionally different from Java finalization.)  Thus each collection will only finalize at most one object in your example.  Again the attached variant of your test demonstrates this.  

If you need the destructors to e.g. close files, but not follow the next link, the simplest solution is to allocate the file descriptors in separate small objects, referenced by the objects in the chain.  Only those small objects will be finalizable (inherit from gc_cleanup), and there will be no dependencies between them.  Thus they cal all be finalized in the same cycle.

A similar solution works for the doubly-linked case.  The garbage collector is perfectly happy to reclaim the memory associated with cycles; but there is no safe order in which to run cleanup actions.  Hence it'll give you warnings instead.

It's important to remember that GC-based finalization is really very different from C++ destruction.  C++ destructors are meant to be a pervasively used syntactic convenience.  GC finalizers are intended for those rare cases in which user code needs access to the collectors reachability information.  The fact that this interfaces reuses destructors as finalizers is convenient, but confusing.

Nearly every class in non-garbage-collected C++ needs a destructor.  In garbage collected code, perhaps 1 in 100 or 1 in 1000 needs a finalizer.

Hans



> -----Original Message-----
> From: Francisco [mailto:tester@mailhouse.ods.org]
> Sent: Saturday, March 08, 2003 5:07 AM
> To: gc@napali.hpl.hp.com
> Subject: [Gc] linked lists
> 
> 
> Dear all,
> 
> First, many thanks to all who made possible this tool to be available 
> free and as an open-source package. If only I could help a 
> tiny bit ...
> 
> I've been trying to convert some C++ code to use GC but due 
> to my lack 
> of experience things seem not be going as well as I was expecting.
> 
> 
> For instance, for this toy program with a simply linked list:
> (I was using gc_cleanup since each node holds a file ptr 
> which needs to 
> be closed as the "owner" is finalized)
> 
> //-------------------------------------------------------------
> (Linux Slackware 8.1 with gcc 3.1.1, GC version: gc6.2alpha3)
> 
> $ g++ aa.cpp -oaa -fomit-frame-pointer -lgc
> //-------------------------------------------------------------
> 
> #define GC_DEBUG 1
> #include <gc_cpp.h>
> #include <iostream>
> 
> using namespace std;
> 
> class List : public gc_cleanup {
> 	public:
> 		int a;
> 		List *next;
> 		
> 		List(int i):next(0) { a = i; }
> 		void addNext(List *other) { next = other; }
> 		
> 		void printMe() {
> 			List *xx = this;
> 			while(xx) {cout << xx->a; xx=xx->next; }
> 		}
> 
> 		~List() { cout << "destroy: " << a << "\n"; }
> };
> 
> List *creator() {
> 	
> 	List *yy, *xx, *head;
> 	
> 	xx = head = new List(0);
> 	for (int i=1; i<=1000; i++) {
> 		yy = new List(i); xx->addNext(yy); xx = yy;
> 	}
> 	
> 	return head;
> };
> 
> void doit() {
> 
> 	List *head = creator();
> 	head->printMe();
> }; 	
> 
> int main() {
> 
> 	doit();
> 	GC_gcollect();
> }	
> 
> //--------------------------------------------------
> 
> But only the head of the list gets to be able to print its finalizing 
> message with a warning (is it?):
> "GC_register_finalizer_ignore_self called with 
> non-base-pointer 0x805bf50"
> 
> However if I don't "connect" the list using the "next" chain 
> of pointers 
> all the nodes are collected. (once again with the above warning)
> 
> If a double-linked list is tried, a message about a pointer cycle 
> (course there are cycles!) is produced and the collecting 
> process gives 
> up too after the first node.
> 
> Almost for sure I'm missing something but ... well ... I 
> can't find what 
> it is!
> 
> 
> 
> Many thanks for any help,
> 
> -Francisco
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Gc mailing list
> Gc@linux.hpl.hp.com
> http://linux.hpl.hp.com/cgi-bin/mailman/listinfo/gc
> 


------_=_NextPart_000_01C2E75E.034DD610
Content-Type: application/octet-stream;
	name="linked.cc"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="linked.cc"

//-------------------------------------------------------------=0A=
//(Linux Slackware 8.1 with gcc 3.1.1, GC version: gc6.2alpha3)=0A=
//=0A=
//$ g++ aa.cpp -oaa -fomit-frame-pointer -lgc=0A=
//-------------------------------------------------------------=0A=
=0A=
#define GC_DEBUG 1=0A=
#include <gc_cpp.h>=0A=
#include <iostream>=0A=
=0A=
using namespace std;=0A=
=0A=
class List : public gc_cleanup {=0A=
	public:=0A=
		int a;=0A=
		List *next;=0A=
		=0A=
		List(int i):next(0) { a =3D i; }=0A=
		void addNext(List *other) { next =3D other; }=0A=
		=0A=
		void printMe() {=0A=
			List *xx =3D this;=0A=
			while(xx) {cout << xx->a; xx=3Dxx->next; }=0A=
		}=0A=
=0A=
		~List() { cout << "destroy: " << a << "\n"; }=0A=
};=0A=
=0A=
List *creator() {=0A=
	=0A=
	List *yy, *xx, *head;=0A=
	=0A=
	xx =3D head =3D new List(0);=0A=
	for (int i=3D1; i<=3D10; i++) {=0A=
		yy =3D new List(i); xx->addNext(yy); xx =3D yy;=0A=
	}=0A=
	=0A=
	return head;=0A=
};=0A=
=0A=
void doit() {=0A=
=0A=
	List *head =3D creator();=0A=
	head->printMe();=0A=
}; 	=0A=
=0A=
int main() {=0A=
=0A=
   for (int i =3D 0; i < 100; ++i) {=0A=
	doit();=0A=
	GC_gcollect();=0A=
   }=0A=
   for (int i =3D 0; i < 10; ++i) {=0A=
	GC_gcollect();=0A=
   }=0A=
}	=0A=
=0A=
//--------------------------------------------------=0A=
=0A=

------_=_NextPart_000_01C2E75E.034DD610
Content-Type: application/octet-stream;
	name="gc_cpp.h.diff"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="gc_cpp.h.diff"

--- gc_cpp.h.orig	2003-03-10 15:01:42.000000000 -0800=0A=
+++ gc_cpp.h	2003-03-10 15:01:53.000000000 -0800=0A=
@@ -308,7 +308,7 @@=0A=
 =0A=
 =0A=
 inline gc_cleanup::~gc_cleanup() {=0A=
-    GC_REGISTER_FINALIZER_IGNORE_SELF( GC_base(this), 0, 0, 0, 0 =
);}=0A=
+    GC_register_finalizer_ignore_self( GC_base(this), 0, 0, 0, 0 =
);}=0A=
 =0A=
 inline void gc_cleanup::cleanup( void* obj, void* displ ) {=0A=
     ((gc_cleanup*) ((char*) obj + (ptrdiff_t) =
displ))->~gc_cleanup();}=0A=

------_=_NextPart_000_01C2E75E.034DD610--