Mysterious Memory Corruption Error

Hello all,

I am fairly inexperienced with programming for VEX or PROS, and I am currently running into a mysterious memory corruption error (that’s what I suspect, anyway) in my code.

Do not mind the gotos and such. This is transpiled output from a compiler I wrote (I am 99% sure it works; I’ve tested it extensively on my computer). Here is the file main.c - Google Drive.

Aight so lemme explain the symptoms. At first, when I ran it, I encountered a data abort exception, and through addr2line I’ve managed to trace it back to these two locations:

main.c:442
main.c:444

Since both locations are in gc_clean, I thought this might be a garbage collector bug. Given the information I had at the time, that was possible but unlikely, and what happens afterwards strongly indicates it’s some other memory corruption bug.

When I add more log statements, a prefetch abort occurs (00000) after competition_init exits but before autonomous is called. I found this extremely odd.

I thought that autonomous wasn’t being called, so I added more log statements for each label so I could crudely track the high-level control flow of the program. Welp, here’s the thing: the prefetch abort completely disappeared.

I have the following theory: somewhere, the physical instructions (the stuff in the text segment) were being corrupted (albeit at the same location). This makes sense considering the behavior changed when I inserted more log statements (which would shift the code that previously sat at the affected address, so a different portion gets corrupted). However, I am unable to test this, as I lack experience with debugging VEX programs and I lack the software facilities (like JINX; I haven’t even been able to find any clear or complete documentation) to do so.

Therefore I humbly beseech you for your help.

what in gods name are you doing

8 Likes

how is this real wtf

6 Likes

in your gc_clean function, there is this block of code:

for (uint_fast16_t i = trace_frame_bounds[heap_frame]; i < trace_count; i++)
    if (heap_traces[i]->gc_flag) {
        heap_traces[i]->gc_flag = 0;
        supertrace(heap_traces[i]);
    }
    else
        trace(heap_traces[i], reset_stack, &reseted_heap_count);

Are you sure there aren’t meant to be curly braces around the indented block? The data abort is probably coming from i being greater than the length of heap_traces. It’s definitely coming from something in your code trying to access non-allocated memory.
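A side note on the braces question: in C, a for loop without braces takes exactly one statement as its body, and an if/else pair counts as a single statement, so the loop as posted should run one of the two branches on every iteration. A minimal sketch of the same shape (count_flags and its arguments are made up for illustration and are not from main.c):

```c
#include <stddef.h>

/* Hypothetical stand-in for the gc_clean loop: counts set and clear flags
   using the same braceless for/if/else shape as the posted code. */
void count_flags(const int *flags, size_t n, size_t *set, size_t *clear)
{
    *set = 0;
    *clear = 0;
    for (size_t i = 0; i < n; i++)
        if (flags[i])
            (*set)++;
        else
            (*clear)++;   /* the else branch is still part of the loop body */
}
```

Braces are still worth adding for readability, but their absence alone would not explain the data abort.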

The weird behavior you’re getting when you add print statements, including the prefetch errors, is probably due to race conditions, especially since you seem to initialize your runtime and then free it all in every function called by the PROS kernel. This is bad because all of these functions except initialize are run as separate tasks, and can be started and killed at any time, depending on field control. Your runtime should realistically be set up during initialize if anything, or even better in a global constructor.

This is all speculation though given I’m struggling to get my head around what’s an uncommented runtime combined with transpiled code. :upside_down_face:

5 Likes

Thanks for your reply. I do not believe that it is because of i being greater than the number of elements in heap_traces, because i is less than trace_count, which is supposed to be less than the size of heap_traces.

However, I think your theory regarding the PROS functions all being called at once might actually make sense. It would explain the data corruption. To test this theory, I replaced the old PROS functions with:

static volatile int running = 0;
void initialize() {
	if (running) {
		puts("threading bug");
		return;
	}
	running = 1;
	puts("initialize");
	op_mode = OP_MODE_INIT;
	lcd_initialize();
	install_stdlib();
	init_all();
	if(!run()) {
		robot_log_cstr("An error occurred in initialize:"); robot_log_cstr(error_names[last_err]);
	}
	free_runtime();
	running = 0;
}

void autonomous() {
	if (running) {
		puts("threading bug");
		return;
	}
	running = 1;
	puts("entered auton pls print!");
	op_mode = OP_MODE_AUTON;
	puts("Auton init");
	init_all();
	puts("Auton run");
	if(!run()) {
		robot_log_cstr("An error occurred in autonomous:"); robot_log_cstr(error_names[last_err]);
	}
	free_runtime();
	running = 0;
}

void disabled() {
	if (running) {
		puts("threading bug");
		return;
	}
	running = 1;
	puts("disabled?");
	op_mode = OP_MODE_DISABLED;
	init_all();
	if(!run()) {
		robot_log_cstr("An error occurred in disabled:"); robot_log_cstr(error_names[last_err]);
	}
	free_runtime();
	running = 0;
}

void competition_initialize() {
	if (running) {
		puts("threading bug");
		return;
	}
	running = 1;
	puts("config?");
	op_mode = OP_MODE_COMP_INIT;
	init_all();
	if(!run()) {
		robot_log_cstr("An error occurred in competition_initialize:"); robot_log_cstr(error_names[last_err]);
	}
	free_runtime();
	puts("comp init exits");
	running = 0;
}

void opcontrol() {
	if (running) {
		puts("threading bug");
		return;
	}
	running = 1;
	puts("opcontrol?");
	op_mode = OP_MODE_OP_CONTROL;
	init_all();
	if(!run()) {
		robot_log_cstr("An error occurred in opcontrol:"); robot_log_cstr(error_names[last_err]);
	}
	free_runtime();
	running = 0;
}

I added a crude check to see whether PROS was calling the functions at the same time. However, to my great confusion, nothing changed whatsoever: no “threading bug” messages were printed.
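One caveat about the check above: volatile does not make the read-then-write of running atomic, so two tasks could in principle both read 0 before either writes 1. A minimal sketch of an atomic version of the same guard, assuming the toolchain ships C11 <stdatomic.h> (runtime_busy, runtime_try_enter and runtime_leave are names made up here; they are not part of PROS or main.c):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard flag shared by all entry points. */
static atomic_flag runtime_busy = ATOMIC_FLAG_INIT;

/* Returns true if the caller acquired the guard, false if it is already held.
   The test and the set happen as one atomic step. */
bool runtime_try_enter(void)
{
    return !atomic_flag_test_and_set(&runtime_busy);
}

/* Releases the guard so the next entry point can take it. */
void runtime_leave(void)
{
    atomic_flag_clear(&runtime_busy);
}
```

Each entry point would call runtime_try_enter() first and runtime_leave() on the way out. Like the original check, this would still leave the flag set if a task is killed before it reaches the release call. PROS also provides mutexes, which may be a better fit for its tasks.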

Thanks for taking the time to reply.

I am officially the biggest moron ever.

I modified some of the FFI functions to return “dummy” values so I could “simulate” the code without using actual robot hardware. And boy oh boy, was I wrong (how it even worked in the first place is beyond me): a segfault occurs because the runtime isn’t freed properly.

Initially I modified the transpiler to only initialize the foreign function table at the very start of the program (in initialize) because it should be immutable. However, I didn’t modify free_runtime() to stop freeing the FFI table, so a double free occurs. The prefetch error ultimately comes from reading a freed ffi_table, which probably held garbage function pointer values.
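The fix described above comes down to giving the FFI table and the per-run state separate lifetimes. A rough sketch of that split, assuming a layout like the one described (ffi_table here is a stand-in, and init_ffi_once, init_run_state, free_run_state and heap_area are hypothetical names, not the functions in main.c):

```c
#include <stdlib.h>

typedef void (*ffi_fn)(void);

/* Hypothetical stand-ins for the transpiler's runtime state. */
static ffi_fn *ffi_table = NULL;  /* set up once, kept for the whole program */
static int *heap_area = NULL;     /* per-run state, rebuilt before every run */

/* Allocates the FFI table the first time only. Returns 1 on success. */
int init_ffi_once(size_t count)
{
    if (ffi_table != NULL)
        return 1;                 /* already initialized, nothing to do */
    ffi_table = calloc(count, sizeof *ffi_table);
    return ffi_table != NULL;
}

/* Allocates the per-run state. Returns 1 on success. */
int init_run_state(size_t cells)
{
    heap_area = calloc(cells, sizeof *heap_area);
    return heap_area != NULL;
}

/* Frees only the per-run state and never touches ffi_table. */
void free_run_state(void)
{
    free(heap_area);
    heap_area = NULL;  /* free(NULL) is a no-op, so a repeated call is harmless */
}
```

Resetting the pointer to NULL after free means an accidental second cleanup does nothing instead of double-freeing, and a stale read shows up as a NULL pointer instead of garbage function pointers.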

Ultimately this raises the question: is there a final function for cleanup at the end of the program, for freeing resources? Is it disabled()?

Anyways, I’ll test my theory in a little while.

Edit: This appears to be the solution.

5 Likes

I’m honestly curious about your program style.
Is there any reason why you have a main.h file?

1 Like

There’s technically no real need to free resources at the end of program execution, since all heap memory allocations get forgotten anyways. As such you can’t leak memory in between program runs. disabled is called multiple times during a game, including between auton and driver control, so not the best time to do clean up.

3 Likes

That sucks - what if I had something like a file handle?

I’m just copying the PROS example project files, and PROS says to include “main.h”.

So I think that the data corruption error is actually just an exception being thrown. In other words, you’re accessing somewhere in memory where you’re not allowed to go. (At least I think that’s what throwing an exception means; VS lies sometimes, though.) I know you already solved it; this is just for the understanding of future readers.

For example, I was using a 2D array once to store some variables. If I made the 2D array too large, or if I went above its allowed indices, I’d get the memory corruption error myself. I forgot what exactly I did to fix it because it was almost 4 months ago, but for any people in the future: yeah, it’s definitely a problem you can solve. Just make sure everything in your code is accessing the memory it should be. If there’s still a problem, then idk, I guess post it on the forums.
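For the 2D array case specifically, one way to catch a bad index before it turns into an abort is to route every access through a bounds-checked helper. A small sketch, where the array size and the names grid, grid_set and grid_get are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

#define ROWS 4
#define COLS 8

/* Hypothetical 2D table of values. */
static int grid[ROWS][COLS];

/* Stores value at (r, c). Returns false instead of writing out of bounds. */
bool grid_set(size_t r, size_t c, int value)
{
    if (r >= ROWS || c >= COLS)
        return false;
    grid[r][c] = value;
    return true;
}

/* Reads (r, c) into *out. Returns false if the index is invalid. */
bool grid_get(size_t r, size_t c, int *out)
{
    if (r >= ROWS || c >= COLS || out == NULL)
        return false;
    *out = grid[r][c];
    return true;
}
```

Logging whenever one of these returns false points straight at the offending index, which is easier to track down than an address from addr2line.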

How did this get flagged as spam? I was giving advice to future readers looking up this EXACT problem. If it’s an error, I would think it would be smart to address all parts of the error.

Any changes that haven’t been written before the program was ended wouldn’t get flushed. The only way to get a clean exit is for the program itself to end, and afaik you can’t get a clean exit if VexOS decides to end your program (user presses the stop button).
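Since there is no guaranteed clean exit, one defensive option for a file handle is to flush after every write, so that little is left sitting in a buffer if the program is stopped. A minimal sketch using standard C stdio (log_line is a made-up helper name, and fp is assumed to be a handle the program already opened with fopen):

```c
#include <stdio.h>

/* Writes one line to fp and flushes it immediately.
   Returns 0 on success, -1 on any failure. */
int log_line(FILE *fp, const char *msg)
{
    if (fp == NULL || msg == NULL)
        return -1;
    if (fputs(msg, fp) == EOF || fputc('\n', fp) == EOF)
        return -1;
    /* Push the stdio buffer out now instead of waiting for fclose. */
    return fflush(fp) == 0 ? 0 : -1;
}
```

Flushing on every write costs some speed, and whether the data has physically reached the storage device at that point depends on the layers below stdio, so treat this as reducing the risk, not removing it.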

1 Like

I’ve already resolved the issue and the cause is nothing like how you’ve described, despite semantic similarities.

If you read my previous message that I’ve marked as the solution, the issue stemmed from what in essence was reading from freed memory, NOT FROM READING OUTSIDE A MEMORY BLOCK.

1 Like

Well yes, but it’s still memory you aren’t supposed to access, at least as far as my understanding of dynamic memory goes (I never use it). Plus VEX gave me the same error for the problem I described. I should’ve stated that I meant it as a reference for future READERS, since I distinctly remember looking for this error on the forums and not finding any solution until my computer science teacher pitched in. I’ll edit the post to add that in.