feat(moq-relay): graceful shutdown via GOAWAY drain #1628
Open
kixelated wants to merge 4 commits into
The first shutdown signal (Ctrl+C / SIGINT, or SIGTERM from `systemctl stop`) now stops accepting new connections, sends a GOAWAY to every active session, and waits for them all to drain. A second signal forces an immediate shutdown. This lets an operator roll a relay node without dropping in-flight media.

moq-net gains the plumbing to actually send GOAWAY (previously the message types existed but were only received):

- `Session::goaway(uri)` flips a per-session watch signal without closing the session, so in-flight groups can finish before the peer migrates away.
- moq-lite (04+) opens a dedicated control stream and writes GOAWAY.
- IETF moq-transport sends GOAWAY on the control stream for draft-14-16 (via the control-stream adapter) and on the SETUP uni stream for draft-17+.

moq-native's `Server::accept` Ctrl+C handler (which hard-closes the endpoints) is now opt-out via `with_signal_handler(false)`; the relay disables it and drives signals itself. Other consumers keep the previous behavior by default.

Tested end-to-end over WebTransport with moq-transport-14, where receiving a GOAWAY closes the peer session, so the drain is observable on both sides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Open the SIGINT/SIGTERM streams up front in a `ShutdownSignals` helper and `recv()` twice, instead of re-registering a fresh listener for each wait. This closes the small window where a second signal could arrive between the first firing and the new listener being registered, and makes the soft (drain) vs hard (force) mapping explicit: SIGINT/SIGTERM drains, a second forces, and SIGKILL stays the kernel's uncatchable backstop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
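The "register once, `recv()` twice" idea can be sketched without any signal plumbing. This is a minimal illustration, not the relay's code: a `std::sync::mpsc` channel stands in for the OS signal streams, and the `Shutdown` enum, the fields of `ShutdownSignals`, and the `demo` function are all hypothetical. The point is that the receiver exists before the first wait, so a signal that arrives between the two waits is queued instead of lost.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// What the relay should do in response to a shutdown signal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Shutdown {
    /// First signal: stop accepting and drain sessions via GOAWAY.
    Drain,
    /// Second (or later) signal: stop waiting and exit immediately.
    Force,
}

/// Hypothetical stand-in for the `ShutdownSignals` helper.
/// The receiver is created once, up front, so there is never a
/// moment where no listener is registered.
struct ShutdownSignals {
    rx: Receiver<()>,
    seen: u32,
}

impl ShutdownSignals {
    /// Returns a sender that plays the role of the OS, plus the helper.
    fn new() -> (Sender<()>, Self) {
        let (tx, rx) = channel();
        (tx, Self { rx, seen: 0 })
    }

    /// Blocks until the next signal. Returns `None` if the signal
    /// source has gone away.
    fn recv(&mut self) -> Option<Shutdown> {
        self.rx.recv().ok()?;
        self.seen += 1;
        Some(if self.seen == 1 {
            Shutdown::Drain
        } else {
            Shutdown::Force
        })
    }
}

/// Delivers both signals before anyone is waiting, then waits twice.
fn demo() -> Vec<Shutdown> {
    let (os, mut signals) = ShutdownSignals::new();
    os.send(()).unwrap();
    os.send(()).unwrap();
    vec![signals.recv().unwrap(), signals.recv().unwrap()]
}
```

In an async relay the same shape would presumably hold with signal streams in place of the channel; only the blocking `recv` would become an `await`.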
kixelated
commented
Jun 10, 2026
Reworks the session GOAWAY surface per review feedback:

- Replace `Session::goaway(&str)` with a public `Drain` handle: `session.drain()` returns a `Drain` whose `start(uri)` sends the GOAWAY and `complete().await` waits for the peer to leave. This combines "send GOAWAY" and "await drain" into one type instead of two loose methods.
- `Drain::start` takes `impl Into<Option<&str>>`, matching the crate convention for optional args (`None` to just drain, `Some(uri)` to redirect).
- Back the trigger with `kio::Producer`/`Consumer` instead of `tokio::sync::watch`, consistent with the rest of moq-net's async state.

moq-native: rename `with_signal_handler` to `with_ctrl_c_handler`, and replace the `select!` `if`-guard on the built-in Ctrl+C arm with an explicit ctrl_c-or-pending future so the behavior doesn't depend on guard-evaluation timing.

moq-relay/test: use `session.drain().start(None)` / `drain.complete().await`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
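To make the `impl Into<Option<&str>>` convention concrete, here is a rough, blocking sketch of a handle with the same two-step shape. It is an assumption-heavy stand-in, not the moq-net type: the real `Drain` is async and backed by `kio`, whereas this one uses a `Mutex` and `Condvar`, and `State`, `requested`, and `peer_left` are hypothetical names added for illustration. The argument handling relies only on the standard `impl<T> From<T> for Option<T>`, which is what lets callers pass `None`, `Some(uri)`, or a bare `&str`.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Illustrative state only; not the moq-net internals.
#[derive(Default)]
struct State {
    /// `Some(redirect)` once a GOAWAY has been requested.
    goaway: Option<Option<String>>,
    /// Set once the peer has left.
    closed: bool,
}

/// Hypothetical, blocking stand-in for the async `Drain` handle.
#[derive(Clone, Default)]
struct Drain {
    inner: Arc<(Mutex<State>, Condvar)>,
}

impl Drain {
    /// Requests a drain. Accepts `None`, `Some(uri)`, or a bare `&str`.
    fn start<'a>(&self, uri: impl Into<Option<&'a str>>) {
        let uri: Option<&str> = uri.into();
        let (lock, cv) = &*self.inner;
        lock.lock().unwrap().goaway = Some(uri.map(String::from));
        cv.notify_all();
    }

    /// Returns what has been requested so far, if anything.
    fn requested(&self) -> Option<Option<String>> {
        self.inner.0.lock().unwrap().goaway.clone()
    }

    /// Called by the session side when the peer disconnects.
    fn peer_left(&self) {
        let (lock, cv) = &*self.inner;
        lock.lock().unwrap().closed = true;
        cv.notify_all();
    }

    /// Blocks until the peer has left (the real API is `complete().await`).
    fn complete(&self) {
        let (lock, cv) = &*self.inner;
        let mut state = lock.lock().unwrap();
        while !state.closed {
            state = cv.wait(state).unwrap();
        }
    }
}

/// Shows the three accepted argument forms.
fn example() -> Vec<Option<Option<String>>> {
    let drain = Drain::default();
    let mut seen = vec![drain.requested()]; // nothing requested yet
    drain.start(None); // just drain
    seen.push(drain.requested());
    drain.start("https://relay.example/next"); // hypothetical redirect URI
    seen.push(drain.requested());
    seen
}
```

Keeping the request and the wait on one handle means a caller cannot wait for a drain that was never started on a different object, which seems to be the motivation for merging the two loose methods.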
…-d9673b # Conflicts: # rs/moq-net/src/session.rs
Summary
Adds graceful shutdown to `moq-relay`. The first shutdown signal drains connections; a second forces termination.

First signal (Ctrl+C / `SIGINT`, or `SIGTERM` from `systemctl stop`): stop accepting new connections, send a `GOAWAY` to every active session, and wait for them all to close on their own. The relay keeps serving so in-flight groups can finish. On systemd it also reports `STOPPING=1`.

This lets an operator roll a relay node without dropping in-flight media: clients that honor `GOAWAY` migrate to another relay (or re-resolve DNS behind a load balancer) before the session closes.

What changed
`rs/moq-net` (the GOAWAY send path). The GOAWAY message types already existed but were only ever received. This adds sending:

- `session.drain()` returns a public `Drain` handle: `start(uri)` sends the GOAWAY (without closing the session), `complete().await` waits for the peer to leave. `start` takes `impl Into<Option<&str>>` (`None` to just drain). Backed by `kio::Producer`/`Consumer`, not `tokio::sync::watch`.
- moq-lite (04+) opens a dedicated control stream and writes `GOAWAY`. Older lite drafts simply never send it (drain still works by waiting for the peer to leave, or a forced shutdown).
- IETF moq-transport sends `GOAWAY` on the shared control stream for draft-14-16 (via the control-stream adapter) and on the SETUP uni stream for draft-17+.

The signal is plumbed into `lite::start`/`ietf::start`; the trigger lives on the returned `Session`.

`rs/moq-native`. `Server::accept`'s built-in Ctrl+C handler (which hard-closes the QUIC endpoints) is now opt-out via `Server::with_ctrl_c_handler(false)`. Default behavior is unchanged for existing consumers (`moq-cli`, `moq-ffi`); the relay disables it and drives signals itself.

`rs/moq-relay`. Two-stage signal handling in `main`, plus connection tracking in `serve` (an mpsc-sender refcount) so we can wait for every in-flight connection to drain. Each connection task watches the drain signal and drains its session (`session.drain()`) when it fires.

Test plan
- `goaway_drains_peer_moq_transport_14` (`rs/moq-native/tests/broadcast.rs`): server fires `session.drain().start(None)`, both sides observe the session close. Uses draft-14 because receiving a `GOAWAY` there closes the session, making the drain observable.
- `Goaway` encode/decode roundtrip unit tests (`rs/moq-net/src/lite/goaway.rs`).
- `broadcast` integration tests pass (every lite + transport draft), so there is no handshake regression from the added plumbing.
- `cargo clippy` clean and `cargo fmt --check` clean (pinned nix toolchain) for the three crates.

Cross-package sync / out of scope
- `doc/bin/relay/index.md` documents the new shutdown behavior.
- `js/net`: not touched. This PR is the server-side send path. Browser clients honoring an incoming `GOAWAY` (migrating away) is a separate follow-up.
- … `GOAWAY`. Cluster peers detect departure on their own.

Targeting `dev` per the branch-targeting rules (changes under `rs/moq-net`).

(Written by Claude)
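The "mpsc-sender refcount" used for connection tracking in `serve` can be illustrated with the standard library alone. This is a sketch under assumptions, not the relay's implementation: threads stand in for async connection tasks, `std::sync::mpsc` stands in for whatever channel the relay actually uses, and `connection` and `serve_and_drain` are hypothetical names. Nothing is ever sent on the channel. Each connection simply holds a clone of the sender, and the receiver reports disconnection only once every clone has been dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Simulated connection task. It never sends on `_alive`; holding the
/// sender is what keeps the tracker's receiver open.
fn connection(id: u64, finished: Arc<AtomicUsize>, _alive: Sender<()>) {
    // Pretend to finish in-flight work for a little while.
    thread::sleep(Duration::from_millis(5 * id));
    finished.fetch_add(1, Ordering::SeqCst);
    // `_alive` is dropped on return, releasing this connection's "ref".
}

/// Spawns `n` connections, waits for all of them to finish, and
/// returns how many had finished at the moment the wait ended.
fn serve_and_drain(n: u64) -> usize {
    let finished = Arc::new(AtomicUsize::new(0));
    let (alive, done) = channel::<()>();

    for id in 1..=n {
        let alive = alive.clone();
        let finished = Arc::clone(&finished);
        thread::spawn(move || connection(id, finished, alive));
    }

    // Drop the accept loop's own sender so that only live connections
    // keep the channel open.
    drop(alive);

    // No message is ever sent, so `recv` can only return an error, and
    // it does so once the last sender clone has been dropped.
    let result = done.recv();
    debug_assert!(result.is_err());

    finished.load(Ordering::SeqCst)
}
```

Calling `serve_and_drain(4)` blocks until all four simulated connections have returned. The appeal of this pattern is that a connection task cannot forget to check in: dropping the sender happens automatically, including when the task panics or returns early.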