[Gc] Memory corruption on new GC port.

georges.duperon at cortus.com
Mon Jul 18 00:46:23 PDT 2011


Hello,

As part of my job, I am porting BoehmGC to a new 32-bit embedded platform,
and I am experiencing some memory corruption.

I have managed to reduce the test case that exhibits such corruption to
the following configuration:
* The Boehm Garbage Collector, version gc-7.2alpha6.
* A custom LibC, used for printf and the like. Output is discarded, but I
put a breakpoint on the write() function to see when GC_ASSERT() messages
are triggered.
* A custom sbrk() function that allocates memory in a region that is not
accessed by the LibC nor the main program. This function is used for the
collector's GET_MEM().
* A very simple main2 program, which is called by main with the address of
a local variable:
void main2(void* StackVariableAddress) {
	GC_stackbottom = StackVariableAddress;
	GC_INIT();
	int i;
	int j;
	for (i = 1; i < 1024; i++) {
		char* ptr = GC_MALLOC(i);
		if (!ptr) continue;
		for (j = 0; j < i; j++)
			ptr[j] = 1;
	}
}
* The GC port is as follows:
#define ALIGNMENT 4
#define DATASTART _sbss
#define DATAEND   __heap_bottom
#define DATASTART2 __noinit_start
#define DATAEND2   __noinit_end
void* my_sbrk(size_t nbytes);
#define GET_MEM(bytes) (struct hblk*)my_sbrk(bytes)
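
For readers who want to reproduce the setup, a bump allocator of the kind
described for my_sbrk() could be sketched as below. This is not the actual
my_sbrk(): the names demo_sbrk, demo_heap, DEMO_HEAP_SIZE and DEMO_BLOCK_SIZE
are hypothetical, a static array stands in for the linker-script region, and
the sketch assumes that the collector wants block-aligned memory from
GET_MEM() and treats a null result as failure. The 4096-byte block size is
also an assumption and should be checked against HBLKSIZE in the actual build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the region between __heap_bottom and __heap_top. */
#define DEMO_HEAP_SIZE  (64u * 1024u)
/* Assumed collector block size; compare with HBLKSIZE in the real build. */
#define DEMO_BLOCK_SIZE 4096u

/* One extra block of slack so an aligned base always fits in the array. */
static unsigned char demo_heap[DEMO_HEAP_SIZE + DEMO_BLOCK_SIZE];
static size_t demo_used = 0;  /* bytes handed out, counted from the aligned base */

/* First block-aligned address inside demo_heap. */
static unsigned char *demo_heap_base(void)
{
    uintptr_t start = (uintptr_t)demo_heap;
    uintptr_t aligned = (start + DEMO_BLOCK_SIZE - 1)
                        & ~(uintptr_t)(DEMO_BLOCK_SIZE - 1);
    return demo_heap + (aligned - start);
}

/* Bump allocator: rounds each request up to a whole number of blocks and
 * returns NULL when the region is exhausted (unlike POSIX sbrk, which
 * returns (void *)-1). Memory is never given back. */
void *demo_sbrk(size_t nbytes)
{
    size_t rounded = (nbytes + DEMO_BLOCK_SIZE - 1)
                     & ~(size_t)(DEMO_BLOCK_SIZE - 1);
    void *p;

    if (rounded < nbytes)                      /* rounding overflowed */
        return NULL;
    if (rounded > DEMO_HEAP_SIZE - demo_used)  /* not enough room left */
        return NULL;
    p = demo_heap_base() + demo_used;
    demo_used += rounded;
    return p;
}
```

With such a function the port line would read
#define GET_MEM(bytes) (struct hblk*)demo_sbrk(bytes). Rounding every request
up to a whole block keeps consecutive results aligned, at the cost of some
wasted space for small requests.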

Between _sbss and __heap_bottom are the sections .bss, .common, .data,
.rodata and the .sXXX variants of those.
Between __noinit_start and __noinit_end is the .noinit section.
__heap_bottom and __heap_top delimit a large (256k) space declared in the
linker script.
After __heap_top is a 16k region reserved for the stack, which should be
plenty.
The bug still appears if I give 128k of stack (and 128k of heap, not 256k
since memory is limited).
The bug still appears if I remove DATASTART2 and DATAEND2.
The bug still appears if I add #define STACKBOTTOM __stack_bottom.
The bug still appears if I use the LibC's malloc() instead of my_sbrk().

I use no incremental collection, no threads and no parallel marking; the
target is a single processor with no MMU.

Here are the symptoms (and origin?) of the memory corruption:
During the first collection *after* GC_INIT(), reclaim.c:GC_reclaim_block
is called on the hbp 0x3a800, of size 1024, which contains part of the
headers.c:hdr_free_list.
Afterwards, when this heap block is allocated, writes to that region
corrupt the headers.c:hdr_free_list.
Since we fill the regions we allocate with the byte 1, the last
uncorrupted pointer of headers.c:hdr_free_list points to the value
0x01010101.
When that value is dereferenced (to allocate a new hdr from the free
list), an "Alignment error" occurs in the function headers.c:alloc_hdr.
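
The origin of the 0x01010101 value can be illustrated in isolation. The sketch
below is only a model of the symptom, not GC code: struct demo_node and
demo_link_after_fill are hypothetical names, and the real hdr layout differs.
It fills a region with the byte 1, as main2 does with each allocation, and
then reads back the first 32 bits, which is what a 32-bit free-list link
stored at the start of that region would look like afterwards.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a free-list node; the real hdr layout differs. */
struct demo_node {
    struct demo_node *next;
    uint32_t payload;
};

/* Overwrites the region the way main2 fills an allocation (every byte set
 * to 1) and returns the first 32 bits as a later reader would see them. */
uint32_t demo_link_after_fill(void *region, size_t nbytes)
{
    uint32_t seen;

    memset(region, 1, nbytes);
    memcpy(&seen, region, sizeof seen);
    return seen;
}
```

Because all four bytes are equal, the result is 0x01010101 on either byte
order. That value is not a multiple of 4, which is consistent with the
"Alignment error" seen when alloc_hdr follows the link.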

Currently __heap_bottom is at 0x26ab4. If it is at a different address
(e.g. because of a new global variable), this bug can disappear, occur later
or earlier, and the corruption can affect other data structures.

Do you see any obvious problem in my port? Or is this a known problem? (I
searched the list's archives but didn't find anything relevant.)
Otherwise, do you have suggestions on places in the GC code where I could
check the validity of the GC's structures, to find why it frees a block
which is still used for its internal structures? Any pointers will be
appreciated.
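
The kind of validity check meant here could look like the sketch below. It is
a generic singly linked list walker, not code from the collector:
struct demo_link and demo_check_list are hypothetical names, and the address
bounds, alignment and node limit would have to be supplied by the caller. It
tests each link before dereferencing it, so a value such as 0x01010101 is
reported instead of being followed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type: only the link field matters for the check. */
struct demo_link {
    struct demo_link *next;
};

/* Walks the list from head and returns the index of the first suspicious
 * link, or -1 if every link is aligned, lies in [lo, hi) and the list ends
 * within max_nodes nodes. A link is checked before it is dereferenced. */
long demo_check_list(const struct demo_link *head, uintptr_t lo,
                     uintptr_t hi, size_t align, long max_nodes)
{
    const struct demo_link *p = head;
    long i = 0;

    while (p != NULL) {
        uintptr_t a = (uintptr_t)p;

        if (a % align != 0 || a < lo || a >= hi || i >= max_nodes)
            return i;
        p = p->next;
        i++;
    }
    return -1;
}
```

Calling such a check before and after each collection, and after each
allocation in the test loop, would narrow down the point at which the list
first becomes invalid.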

Thank you in advance.
--
Georges Dupéron



