
Free-threading: signal.pthread_kill deadlocks child threads #150873

@crusaderky

Context

The Dask threaded scheduler (e.g. dask.array or dask.dataframe without dask.distributed) blocks the main thread when the user calls compute() and internally spawns a concurrent.futures.ThreadPoolExecutor to schedule tasks. If a user hits Ctrl+C, they expect the program to terminate promptly, waiting at most for the tasks that are already halfway through execution to finish.
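The blocking pattern described above can be sketched with the standard library alone. This is a minimal sketch, not Dask's implementation. The helper name `compute_all` is hypothetical, and the sketch assumes the desired Ctrl+C behavior is to cancel queued tasks and wait only for the ones already running.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def compute_all(tasks: Sequence[Callable[[], T]], max_workers: int = 4) -> list[T]:
    """Run tasks on a worker pool while the calling (main) thread blocks.

    Hypothetical helper loosely modeled on the pattern described above;
    it is not Dask's actual scheduler code.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = [executor.submit(task) for task in tasks]
        # The main thread blocks here; Ctrl+C raises KeyboardInterrupt in it.
        wait(futures)
        return [f.result() for f in futures]
    except KeyboardInterrupt:
        # Cancel tasks that have not started yet, wait only for the ones
        # already running, then propagate the interrupt.
        executor.shutdown(wait=True, cancel_futures=True)
        raise
    finally:
        # Harmless if shutdown() already ran in the except branch.
        executor.shutdown(wait=True)
```

Note that `executor.shutdown(wait=True)` ends by calling `join()` on each worker thread, and `submit()` starts worker threads lazily, so a Ctrl+C that arrives shortly after submission can leave the main thread joining a thread that has only just been started.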

Bug report

This sequence frequently deadlocks on free-threaded CPython builds on Linux x64:

  1. main thread calls threading.Thread.start
  2. another thread calls signal.pthread_kill(main_thread, signal.SIGINT), which lands after start() has returned but before the thread's target function has started executing
  3. main thread handles the resulting KeyboardInterrupt
  4. main thread calls .join() on the child thread started at point 1.
  5. frequently, .join() deadlocks. A second SIGINT (e.g. from Ctrl+C) unblocks it.
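While investigating, a bare .join() that hangs can only be broken by a second SIGINT (step 5). One option is to bound the joins with a shared deadline and report which threads are still alive. This is a sketch: `join_all_with_timeout` is a hypothetical helper, and it assumes that the timed form of `Thread.join()` is not itself affected by the bug, which has not been verified here.

```python
import threading
import time
from typing import Iterable


def join_all_with_timeout(
    threads: Iterable[threading.Thread], timeout: float
) -> list[threading.Thread]:
    """Join threads within a shared time budget; return those still alive.

    Hypothetical diagnostic helper, not part of the original reproducer.
    Assumes Thread.join(timeout) still returns on time when the bug hits.
    """
    deadline = time.monotonic() + timeout
    stuck = []
    for t in threads:
        # Every thread shares one deadline, so the total wait is bounded
        # by `timeout` rather than by `timeout * len(threads)`.
        remaining = max(0.0, deadline - time.monotonic())
        t.join(remaining)
        if t.is_alive():
            stuck.append(t)
    return stuck
```

In the reproducer, the final `for t in threads: t.join()` could become `assert not join_all_with_timeout(threads, 10.0)`, where the 10-second budget is an arbitrary choice.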

Reproducer

import signal
import threading

USE_BARRIER = False
NTHREADS = 20

def demo():
    main_thread = threading.get_ident()
    barrier = threading.Barrier(NTHREADS) if USE_BARRIER else None
    thread_can_die = threading.Event()

    def f() -> None:
        if barrier:
            barrier.wait()
        thread_can_die.wait()

    def interrupt() -> None:
        if barrier:
            barrier.wait()
        signal.pthread_kill(main_thread, signal.SIGINT)

    threads = []
    for _ in range(NTHREADS - 1):
        t = threading.Thread(target=f)
        t.start()
        threads.append(t)
    try:
        t = threading.Thread(target=interrupt)
        t.start()
        threads.append(t)
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        pass
    else:
        assert False, "Expected KeyboardInterrupt"

    thread_can_die.set()
    for t in threads:
        t.join()


if __name__ == "__main__":
    i = 0
    while True:
        print(i, flush=True)
        demo()
        i += 1

On the following setups, the program above hangs within a handful of iterations:

  • Linux x64 3.14.5t
  • Linux x64 git tip #7a468a101268d2b13105f94ae027df8b502d0c87 (built with Tools/pixi-packages/freethreading)

Cannot reproduce on

  • Linux x64 3.14.5

Setting USE_BARRIER=True, which guarantees that all child threads have fully started before the SIGINT lands on the main thread, makes the issue disappear.
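The setups above differ in whether the interpreter is a free-threaded build. When collecting more data points, a small check like the following can label each run. It is a sketch that relies on `sysconfig.get_config_var("Py_GIL_DISABLED")` and on `sys._is_gil_enabled()` (a private function available from 3.13). The helper name `gil_status` is hypothetical.

```python
import sys
import sysconfig


def gil_status() -> dict[str, bool]:
    """Report whether this is a free-threaded build and whether the GIL is on.

    Hypothetical helper for labeling reproducer runs.
    """
    # Py_GIL_DISABLED is 1 on free-threaded ("t") builds; 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # A free-threaded build can still run with the GIL enabled
    # (e.g. with PYTHON_GIL=1), so check the runtime state separately.
    # sys._is_gil_enabled() is private and only exists on 3.13+.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }


if __name__ == "__main__":
    print(sys.version)
    print(gil_status())
```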

CPython versions tested on:

CPython main branch, 3.14

Operating systems tested on:

Linux
