Skip to content

PEP 788: Address feedback from first discussion round #4400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

ZeroIntensity
Copy link
Member

@ZeroIntensity ZeroIntensity commented May 3, 2025

  • Change is either:
    • To a Draft PEP
    • To an Accepted or Final PEP, with Steering Council approval
    • To fix an editorial issue (markup, typo, link, header, etc)
  • PR title prefixed with PEP number (e.g. PEP 123: Summary of changes)

The key changes I made here are:

  • Interpreter references now have a dedicated type (PyInterpreterRef and PyInterpreterWeakRef *).
  • Add PyThreadState_GetDaemon() and what needs to happen to threading for that to work.
  • Add a terminology section for clarification on what things like "finalization" refer to in the PEP.
  • Add a rejected idea for the previous approach (mainly using PyInterpreterState * for refs).
  • Add an open issue for PyThreadState_Ensure stealing a strong reference.
  • Editorial: Lots of clarity fixes (many had trouble understanding the PEP, unfortunately).
  • Editorial: Headings are now in title case ("The Apple is Weird and Confusing") rather than sentence case ("The apple is weird and confusing").

📚 Documentation preview 📚: https://pep-previews--4400.org.readthedocs.build/pep-0788/

@ZeroIntensity ZeroIntensity marked this pull request as ready for review May 4, 2025 17:55
@ZeroIntensity ZeroIntensity requested a review from vstinner as a code owner May 4, 2025 17:55
@ZeroIntensity
Copy link
Member Author

@godlygeek I can't add you as a reviewer, but I'd appreciate your opinion on some of the new things I'm proposing here.

@ZeroIntensity ZeroIntensity requested a review from AA-Turner May 4, 2025 20:02
@ZeroIntensity
Copy link
Member Author

Bump @AA-Turner @vstinner now that the beta is out. There's no rush considering we have a year until the 3.15 freeze, though.

ZeroIntensity and others added 3 commits May 9, 2025 07:21
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member

vstinner commented May 9, 2025

I'm not comfortable with PyInterpreterRef and PyInterpreterWeakRef which are different: PyInterpreterRef is used as a scalar value, whereas PyInterpreterWeakRef is used as a pointer. I would prefer that both are scalars, or both are pointers. Maybe using a pointer API would be more future proof and it makes sure that the value (the pointer) can be stored ... in a pointer, for a callback argument.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, the API looks better to me.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does PyThreadState_Ensure() create a new thread state instead of reusing an existing one?

deallocated, and shutdown for the main interpreter includes the entire Python
runtime being finalized.

Native and Python Threads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Native threads" is not a good term for non-threading module created threads even if you (re)define the term. It perpetuates a confusion that Python threads aren't "real" OS threads.

We use "non-Python created threads" in, e.g., https://docs.python.org/3/c-api/init.html#non-python-created-threads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't totally disagree, but I'd really prefer not using the clunky phrase "non-Python created thread", especially considering how many times "native thread" is used throughout the PEP. Is there a shorter term we could use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OS threads?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "OS thread" has the same problem as "native thread". I'm personally fine with still using the term "native thread", because this PEP will change how threading works too. So you get two interpretations, both being correct:

  • "Reimagining Threads natively": The PEP changes how to use OS threads with a native caller.
  • "Reimagining Native Threads": The PEP changes how native OS threads, threading included, interact with the interpreter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to avoid "native threads", just write "non-Python threads"?

like this:

.. code-block:: c
In Python, threads are able to interact with an interpreter (e.g., invoke the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this paragraph is helpful here. Get to the crux: PyGILState_Ensure/PyGILState_Release is one of the primary ways to get a valid thread state using the C API and it doesn't work well with subinterpreters because it always creates (or reuses) a thread state for the main interpreter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful as a quick intro to people who aren't deeply familiar with how thread states work. I would expect that 99% of extension developers just know the PyGILState surface API, so the thread state terminology that the proposal uses might seem foreign. Would you prefer I just link to the documentation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think talking about terminology is okay, but it doesn't need to be at the beginning of the abstract. When reading the abstract, I'd want to see:

  1. The most important problem the PEP is solving
  2. An outline of the solution without going in to too much detail

called. There's a subtle difference between the two terms, as used in this
PEP:

- "Finalization" refers to an interpreter getting ready to "shut down", in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the distinction being made here between finalization and shutdown nor the purpose of making this distinction -- I don't see how "interpreter finalization" and "interpreter shutdown" are different things.

"Finalization" is also pretty overloaded; you can say "interpreter finalization" to be more clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not important for the main interpreter, but there's an important distinction for subinterpreters because the PyInterpreterState structure becomes invalid after shutdown, but not during finalization. That's an important detail for the rationale on why we're not using PyInterpreterState * for PyThreadState_Ensure.

But I think you're right, anyway. I'll remove this.

Comment on lines +207 to +211
On free-threaded builds, lock-ordering deadlocks are still possible
if thread A acquired the lock for object A and then object B, and then
another thread tried to acquire those locks in the reverse order. Free-threading
currently protects against this by releasing locks when the thread state is
detached, making detachment a necessity to prevent deadlocks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is mixing up a few ideas (critical sections and stop the world). You want Py_BEGIN_ALLOW_THREADS around blocking operations to prevent deadlock with GC (or other stop-the-world) events.

Stop-the-world events are analogous to holding the GIL -- one thread has exclusive access to the interpreter -- so you can end up with the same deadlocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is the current thing I wrote wrong? I think this paragraph is quite long already, and it's only here to quickly explain why Py_BEGIN_ALLOW_THREADS is still needed on free-threaded builds. If there's nothing wrong with what's already there, it's probably fine to leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's getting long already. Was this expanded in response to feedback on the PEP? I would just have one sentence regarding the free threaded build at the end of the lock ordering deadlock paragraph. Something like:

...while thread B holds the GIL and is waiting on the lock. A similar deadlock can occur in the free threaded build during stop the world pauses, which happen during during garbage collection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is the current thing I wrote wrong?

The first part is definitely true: "on free-threaded builds, lock-ordering deadlocks are still possible".

The subsequent explanation doesn't make much sense to me. The "releasing locks when the thread state is
detached" behavior is only applicable to critical sections, and we don't use Py_BEGIN_ALLOW_THREADS directly with critical section APIs. The critical section API implementations do the detaching/reattaching internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this expanded in response to feedback on the PEP? I would just have one sentence regarding the free threaded build at the end of the lock ordering deadlock paragraph.

Ok, yeah, I'll do it this way.

The critical section API implementations do the detaching/reattaching internally.

I was thinking of something like this:

Py_BEGIN_CRITICAL_SECTION(whatever);
acquire_os_lock(); // NOT PyMutex
Py_END_CRITICAL_SECTION();

You still need to have Py_BEGIN_ALLOW_THREADS to release that critical section. Otherwise you can get lock-ordering deadlocks, right?

Daemon threads can cause finalization deadlocks
***********************************************
Daemon Threads Can Deadlock Finalization
****************************************
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario here that leads to deadlock here happens when you acquire a lock during an object destructor/finalizer thats also acquired elsewhere. But if you're doing that, non-daemon threads won't save you because you will still occasionally deadlock with the GC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wasn't aware of this. That sounds like it compromises some of the motivation behind the PEP. Could you go into a little more detail here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's GC runs object finalizers (i.e., __del__ methods) in the same thread that triggered the GC, so you can end trying to reacquire a lock you already hold.

Here's an example from stackoverflow: https://stackoverflow.com/questions/18774401/self-deadlock-due-to-garbage-collector-in-single-threaded-code

Java runs finalizers in their own thread to avoid this sort of problem.

Generally, the "fix" in Python is to avoid acquiring locks in __del__ functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense, I wasn't taking the garbage collector into account. I was imagining some C code that acquired a lock and got hung before calling into Python. But I think you're right that PyThreadState_Ensure isn't sufficient to fix tp_finalize/__del__ deadlocks. That's not good.

Java runs finalizers in their own thread to avoid this sort of problem.

Thinking out loud--it's not out of the question to take a similar approach for Python, at least for non-daemon threads. I'm not too sure what gc.garbage does in modern versions, but in theory, we could put objects collected under a non-daemon thread state into a garbage list, and then finalize them all in a dedicated thread.

Comment on lines 937 to 938
no closure, it's not possible for the caller to pass any objects or
interpreter-specific data, so it's completely safe to choose the main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how "it's completely safe to choose the main interpreter" follows from "the callback has no closure".

Maybe these use cases (like readline) only make sense in the main interpreter, but choosing the correct interpreter seems important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have a closure argument/callback parameter or some other workaround, then you can't pass interpreter-specific data, so you're good to assume that the main interpreter is the one you want. It might not be the right choice, but by "totally safe", I mean that things will just raise a completely safe exception (e.g. an AttributeError), rather than crashing through using an object from a different interpreter.

So basically, if you can access state from the caller, then you can also store the interpreter reference, and if not, then there's no way to access invalid cross-interpreter state anyway.

@ZeroIntensity
Copy link
Member Author

ZeroIntensity commented May 13, 2025

When does PyThreadState_Ensure() create a new thread state instead of reusing an existing one?

The same places where PyGILState_Ensure would (via the gilstate pointer), in addition to when the current interpreter doesn't match the requested interpreter. Should that be specified, or left up the implementation? I could see it going either way.

@colesbury
Copy link
Contributor

Yes, I think it's worth specifying. The constraint "when the current interpreter doesn't match the requested interpreter" makes me think that you can end up with more threads states per interpreter+OS thread.

@ZeroIntensity
Copy link
Member Author

ZeroIntensity commented May 13, 2025

Here's what I was envisioning:

// No tstate ever in this thread
// current: NULL

PyThreadState_Ensure(main_interp); // New thread state for the main interpreter
// current: main
PyThreadState_Ensure(main_interp); // Do nothing
// current: main
Py_BEGIN_ALLOW_THREADS; // Detach it
// current: NULL
PyThreadState_Ensure(main_interp); // Reattach the old thread state
// current: main
PyThreadState_Release(); // Detach it again
// current: NULL
Py_END_ALLOW_THREADS; // Reattach
// current: main
PyThreadState_Ensure(subinterp); // New thread state for the subinterpreter
// current: subinterp
PyThreadState_Ensure(main_interp); // New thread state for the main interpreter
// current: main

Basically, PyThreadState_Ensure for a different interpreter treats it as a new thread.

@vstinner
Copy link
Member

cc @pablogsal: the PEP update I told you about. You might want to have a look.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I haven't gotten though the whole proposal, but did start by trying to understand what we're trying to solve here. I've left short notes for each of the motivation sections. Ultimately, I think are some bugs to be addressed separately and the other concerns could be solved without replacing the existing API. That doesn't mean a new API isn't warranted, but let's see how much value it would add first. (I'd be glad to hop into a call to discuss this more.)


Motivation
==========

Native threads will always hang during finalization
---------------------------------------------------
Native Threads Always Hang During Finalization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we address this with a distinct return value and two new functions?

  • int PyGILState_Failed(PyGILState_STATE)
  • void PyGILState_SetHangDuringFini(int) # default 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to support a callback that gets called when the interpreter is finalizing:

typedef int (tstate_fini_callback *)(PyThreadState *, void *arg);
void PyThreadState_SetFiniCallback(PyThreadState *, tstate_fini_callback, void *arg);

I'm sure there are other options that are likewise compatible with the existing API (not that a replacement for PyGILState_Ensure() is necessarily a bad idea).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, at a certain point it make more sense to simply replace the old API, as you are proposing, rather than add a bunch of functions to awkwardly work around the deficiencies of the original API. That very well may be the case here. I just want to be sure we consider the obvious alternatives.

*addition* to calling Python).
code in their stream of calls.

Joining the Thread isn't Always Possible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the previous one?

it). If the interpreter is finalizing or has shut down, then the thread is
hung, disrupting the C++ caller.

``Py_IsFinalizing`` is Insufficient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing the race here is certainly worth doing, but seems like a separate issue from the GILState API. It also doesn't seem like something that needs a PEP.


Daemon threads can cause finalization deadlocks
***********************************************
Daemon Threads Can Deadlock Finalization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much is this an issue in practice?


.. _pep-788-hanging-compat:

We can't change finalization behavior for ``PyGILState_Ensure``
***************************************************************
Finalization Behavior for ``PyGILState_Ensure`` Cannot Change
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my earlier comment about returning an error.


The existing APIs are broken and misleading
-------------------------------------------
The GIL-state APIs are Buggy and Confusing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is a bit unclear. Bugginess is definitely something to fix. :) I'm not sure there's a strong argument about it being "confusing" though. As to there being multiple APIs, that's not a bad thing. Essentially, PyGILState_Ensure() is a convenience wrapper around the other functions. This section doesn't seem to add much value.


``PyGILState_Ensure`` generally crashes during finalization
``PyGILState_Ensure`` Generally Crashes During Finalization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crashes here are bugs to be fixed, which I don't expect will need a PEP to solve.

here. Incidentally, acceptance and implementation of this PEP will likely fix
the existing crashes caused by :c:func:`PyGILState_Ensure`.

The term "GIL" is tricky for free-threading
The Term "GIL" is Tricky for Free-threading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the name can be a little confusing. That is more of a "while we're at it, let's use a better name" situation than a motivation for a PEP. I wouldn't make much of a point about it, except maybe to mention it in passing.

-----------------------------------------------------
.. _pep-788-subinterpreters-gilstate:

``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be fixed with an extra function, e.g. int PyGILState_SetInterpreter(PyInterpreterState *), which I wouldn't think needs a PEP.


Interpreters can concurrently shut down
***************************************
Concurrent Interpreter Deallocation is Frustrating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugs with finalization and use-after-free are broader issues that should be handled separately from solutions just about the GILState API. As mentioned earlier, I think the question of improving the current hanging behavior can be handled in a way that doesn't need a PEP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants