Access violation in Python garbage collector (visit_decref) - how to debug?
Geoff Bache <geoff.bache at gmail.com> writes:
> Yes, this is hard, that's why I'm here :)
> I've enabled the equivalent tools to valgrind in Visual Studio, and tried
> setting PYTHONMALLOC=debug, but neither of those seem to be showing
> anything either. I don't really know what else to try in this direction.
It is likely too hard to solve remotely (on this list).
When I analysed a similar problem (a long time ago),
I was lucky that the problem was quite easily reproducible.
Is this the case for you?
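
A minimal sketch of one way to improve reproducibility, not something
from my original analysis: make the cyclic garbage collector run as
often as possible, so that a crash in "visit_decref" tends to happen
close to the operation that corrupted the object. The helper names
"make_gc_aggressive" and "check_heap_now" are hypothetical; only the
"gc" module calls are real. In an embedded interpreter this would have
to run inside the embedded Python.

```python
import gc


def make_gc_aggressive():
    """Lower the collection thresholds so the collector runs very often.

    Assumption: running collections more often shortens the distance
    between the corruption and the crash. It does not find the bug itself.
    """
    gc.set_threshold(1, 1, 1)
    return gc.get_threshold()


def check_heap_now():
    """Force a full collection right now.

    If a tracked object is already corrupted, the crash should occur
    here instead of at some later, unrelated point.
    """
    return gc.collect()


if __name__ == "__main__":
    print("thresholds:", make_gc_aggressive())
    print("unreachable objects found:", check_heap_now())
```

Calling "check_heap_now()" after each suspicious call into the C++ side
narrows down which call leaves the heap in a bad state, at the price of
a much slower program.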
This allowed me to run the program under debugger ("gdb") control
with the debugger getting control as soon as the access violation
occurred. "gdb" allows debugging at both the C level and the
machine code level. Using machine code level debugging allowed
me to determine the address where the wrong pointer came from.
Setting a (hardware) watchpoint for this address stopped the
program when the wrong pointer was written to this address.
This gave me two important pieces of information: which code
writes the wrong pointer and what was in the memory region before the memory
corruption (which type of Python object was involved).
In my case, reproducibility, machine level debugging, hardware write
watchpoints and a detailed knowledge of Python's runtime data structures
were all necessary to resolve the problem.
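
To illustrate what "knowledge of Python's runtime data structures" means
in practice, here is a small sketch. It assumes a default (not
free-threaded, not trace-refs) CPython build, where "id(obj)" is the
object's address and the object header is a reference count field
followed by a pointer to the type object. The helper names
"type_pointer_of" and "looks_like_instance_of" are hypothetical. In a
debugger you would read the same word from a raw address in order to
guess which type of object lived in a corrupted region.

```python
import ctypes


def type_pointer_of(obj):
    """Read the type pointer from the object header of a live object.

    Assumption: default CPython layout, where the type pointer follows
    a reference count field of size sizeof(Py_ssize_t).
    """
    address = id(obj)  # in CPython, id() is the object's memory address
    offset = ctypes.sizeof(ctypes.c_ssize_t)
    return ctypes.c_void_p.from_address(address + offset).value


def looks_like_instance_of(obj, expected_type):
    """Compare the type pointer in the header with a known type object."""
    return type_pointer_of(obj) == id(expected_type)


if __name__ == "__main__":
    sample = [1, 2, 3]
    print(hex(type_pointer_of(sample)), hex(id(list)))
    print(looks_like_instance_of(sample, list))
```

Only use this on objects that are known to be alive; reading from an
arbitrary address this way can itself cause an access violation.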
> On Sat, Oct 5, 2019 at 7:22 AM dieter <dieter at handshake.de> wrote:
>> Geoff Bache <geoff.bache at gmail.com> writes:
>> > ...
>> > We are running Python embedded in our C++ product and are now
>> > seeing crashes (access violation reading 0xffffffffff on Windows)
>> > in the Python garbage collector.
>> Errors like this are very difficult to analyse. The main reason:
>> the memory corruption is likely far away from the point when
>> it is finally detected (by an access violation in your case).
>> Python can be built in a special way to add marks to
>> its allocated memory blocks and verify their validity.
>> This increases the chance to detect a memory corruption earlier
>> and thereby facilitates the detection of the root cause.
>> There are tools for the analysis of memory management problems
>> (e.g. "valgrind", though this may be Linux-only). In my
>> experience, even with those tools, the analysis is very difficult.
>> I have successfully analysed memory corruption problems
>> several times. In those cases, I was lucky that the corruption
>> was reproducible and typically affected the same address.
>> Thus, I could put a (hardware) memory breakpoint at this address
>> stopping the program as soon as this address was written and
>> then analyse the state in the debugger. This way, I could detect
>> precisely which code caused the corruption. However,
>> this was quite a long time ago; nowadays, modern operating systems
>> employ address randomization, thus significantly reducing the chance
>> that the corruption affects the same address (which may mean that
>> you need to deactivate address randomization to get a better chance
>> for this kind of analysis).