Skip to content

Prevent crashes at exit#1287

Open
KowalskiThomas wants to merge 1 commit intoapache:trunkfrom
KowalskiThomas:kowalski/prevent-crash-at-exit
Open

Prevent crashes at exit#1287
KowalskiThomas wants to merge 1 commit intoapache:trunkfrom
KowalskiThomas:kowalski/prevent-crash-at-exit

Conversation

@KowalskiThomas
Copy link
Copy Markdown

What is this PR?

This PR attempts to fix a bug we are seeing in our services and CI and that we have identified as coming from the Python Cassandra driver (cassandra-python-driver is the only Python package in our environments that has libev as a dependency).

Stack traces typically look like the following, and crashes happen around exit.

Error UnixSignal: Process terminated with SEGV_MAPERR (SIGSEGV)
#0   0x00006033d6136710 new_threadstate (/usr/src/python/Python/pystate.c:1388:1)
#1   0x00006033d600e331 PyGILState_Ensure (/usr/src/python/Python/pystate.c:2222:16)
#2   0x00007b014dc04ec3 ev_invoke_pending
#3   0x00007b014dc087a5 ev_run
#4   0x00006033d6072b30 method_vectorcall_NOARGS (/usr/src/python/Objects/descrobject.c:454:24)
#5   0x00006033d6070738 _PyObject_VectorcallTstate (/usr/src/python/./Include/internal/pycore_call.h:92:11)
#6   0x00006033d6070738 PyObject_Vectorcall (/usr/src/python/Objects/call.c:325:12)
#7   0x00006033d6097406 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:2715:19)
#8   0x00006033d6071716 _PyObject_VectorcallTstate (/usr/src/python/./Include/internal/pycore_call.h:92:11)
#9   0x00006033d6071716 method_vectorcall (/usr/src/python/Objects/classobject.c:69:20)
#10  0x00006033d609acf8 _PyEval_EvalFrameDefault (/usr/src/python/Python/bytecodes.c:3263:26)
#11  0x00006033d6071716 _PyObject_VectorcallTstate (/usr/src/python/./Include/internal/pycore_call.h:92:11)
#12  0x00006033d6071716 method_vectorcall (/usr/src/python/Objects/classobject.c:69:20)
#13  0x00006033d615dd19 thread_run (/usr/src/python/./Modules/_threadmodule.c:1114:21)
#14  0x00006033d613d238 pythread_wrapper (/usr/src/python/Python/thread_pthread.h:237:5)

This looks a lot like a problem we have identified in our own https://github.com/DataDog/dd-trace-py repository, where we need to make sure the interpreter hasn't finalised before trying to acquire the GIL -- otherwise it's not legal to do so and it typically just crashes.

However I think in the case of this code, this may actually be caused by another issue: the current atexit hook is invalid. libevreactor.py does this:

_global_loop = None
atexit.register(partial(_cleanup, _global_loop))

This "generates" the atexit hook function when it is called so with the current _global_loop value as opposed to evaluating _global_loop at exit.
As an illustration, take this:

from functools import partial
import atexit

the_global = None
atexit.register(partial(print, the_global))

the_global = "something"

Running this prints None and not something.
The right way to do this (if we want to keep it at the top level) would be to use a lambda:

import atexit

the_global = None
atexit.register(lambda: print(the_global))

the_global = "something"

... which does print something at exit.

My proposed fix is thus in two parts:

  • Change the atexit hook registration from partial to a lambda
  • On top of this, make sure to never try to acquire the GIL if the interpreter has already finalised (and early-return instead). Although in theory this is not necessary -- _cleanup asks the thread to stop so it shouldn't keep on trying to execute CPython code -- the fact that join is called with a timeout of 1 second means it can fail, in which case the existing code is not safe (this is why we need the same mechanism in ddtrace).

Let me know if that makes sense!


_global_loop = None
atexit.register(partial(_cleanup, _global_loop))
atexit.register(lambda: _cleanup(_global_loop))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lambdas are not of taste

Suggested change
atexit.register(lambda: _cleanup(_global_loop))
@atexit.register
def _():
global _global_loop
_cleanup(_global_loop)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants