
feat(moq-relay): graceful shutdown via GOAWAY drain #1628

Open
kixelated wants to merge 4 commits into dev from claude/vigilant-agnesi-d9673b


@kixelated kixelated commented Jun 4, 2026

Collaborator

Summary

Adds graceful shutdown to moq-relay. The first shutdown signal drains connections; a second forces termination.

  • First signal (Ctrl+C / SIGINT, or SIGTERM from systemctl stop): stop accepting new connections, send a GOAWAY to every active session, and wait for them all to close on their own. The relay keeps serving so in-flight groups can finish. On systemd it also reports STOPPING=1.
  • Second signal: force shutdown immediately, dropping every connection.

This lets an operator roll a relay node without dropping in-flight media: clients that honor GOAWAY migrate to another relay (or re-resolve DNS behind a load balancer) before the session closes.
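
The two-stage behavior can be modeled as a small state machine. This is a minimal sketch, not the relay's actual code: the `Stage` enum and its methods are hypothetical names introduced for illustration.

```rust
/// Hypothetical model of the relay's shutdown stages (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    /// Normal operation: accepting and serving connections.
    Serving,
    /// First signal received: no new connections, GOAWAY sent, waiting for peers.
    Draining,
    /// Second signal received: every connection is dropped immediately.
    Forced,
}

impl Stage {
    /// Advance the stage when a shutdown signal (SIGINT or SIGTERM) arrives.
    pub fn on_signal(self) -> Stage {
        match self {
            Stage::Serving => Stage::Draining,
            Stage::Draining | Stage::Forced => Stage::Forced,
        }
    }

    /// Only a relay that has not yet been signalled accepts new connections.
    pub fn accepts_new_connections(self) -> bool {
        self == Stage::Serving
    }
}
```

Note that `Draining` still serves existing sessions; it only stops accepting new ones.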

What changed

rs/moq-net (the GOAWAY send path). The GOAWAY message types already existed but were only ever received. This adds sending:

  • session.drain() returns a public Drain handle: start(uri) sends the GOAWAY (without closing the session), complete().await waits for the peer to leave. start takes impl Into<Option<&str>> (None to just drain). Backed by kio::Producer/Consumer, not tokio::sync::watch.
  • moq-lite (04+) opens a dedicated control stream and writes GOAWAY. Older lite drafts simply never send it (drain still works by waiting for the peer to leave, or a forced shutdown).
  • IETF moq-transport sends GOAWAY on the shared control stream for draft-14 through draft-16 (via the control-stream adapter) and on the SETUP uni stream for draft-17+.
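
The per-draft routing described above can be summarized as a lookup. This is a sketch for orientation only: `Protocol`, `GoawayPath`, and `goaway_path` are hypothetical names, and the handling of IETF drafts below 14 is not stated in this PR, so it is marked `Unspecified` here rather than guessed.

```rust
/// Hypothetical protocol tag carrying the draft number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Lite(u8),
    Transport(u8),
}

/// Where (or whether) a GOAWAY is written, per the description above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoawayPath {
    /// moq-lite 04+: a dedicated control stream is opened for the GOAWAY.
    DedicatedControlStream,
    /// IETF draft-14 through draft-16: the shared control stream.
    SharedControlStream,
    /// IETF draft-17+: the SETUP uni stream.
    SetupUniStream,
    /// Older lite drafts: nothing is sent; drain just waits for the peer.
    NotSent,
    /// Assumption: the PR does not describe IETF drafts below 14.
    Unspecified,
}

pub fn goaway_path(protocol: Protocol) -> GoawayPath {
    match protocol {
        Protocol::Lite(draft) if draft >= 4 => GoawayPath::DedicatedControlStream,
        Protocol::Lite(_) => GoawayPath::NotSent,
        Protocol::Transport(14..=16) => GoawayPath::SharedControlStream,
        Protocol::Transport(draft) if draft >= 17 => GoawayPath::SetupUniStream,
        Protocol::Transport(_) => GoawayPath::Unspecified,
    }
}
```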

The signal is plumbed into lite::start / ietf::start; the trigger lives on the returned Session.

rs/moq-native. Server::accept's built-in Ctrl+C handler (which hard-closes the QUIC endpoints) is now opt-out via Server::with_signal_handler(false). Default behavior is unchanged for existing consumers (moq-cli, moq-ffi); the relay disables it and drives signals itself.
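
For readers unfamiliar with the opt-out pattern, here is a minimal sketch of a builder flag that defaults to on. `ServerOptions` and `installs_signal_handler` are stand-ins invented for this example; they are not moq-native's actual types, and only the `with_signal_handler(false)` call shape comes from the description above.

```rust
/// Hypothetical stand-in for a server builder; not moq-native's real type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServerOptions {
    signal_handler: bool,
}

impl Default for ServerOptions {
    /// The handler stays enabled by default, so existing callers see no change.
    fn default() -> Self {
        Self { signal_handler: true }
    }
}

impl ServerOptions {
    /// Opt out of the built-in handler by passing `false`.
    pub fn with_signal_handler(mut self, enabled: bool) -> Self {
        self.signal_handler = enabled;
        self
    }

    /// Whether the built-in handler would be installed.
    pub fn installs_signal_handler(&self) -> bool {
        self.signal_handler
    }
}
```

A caller that wants to drive signals itself would write `ServerOptions::default().with_signal_handler(false)`; everyone else keeps the default.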

rs/moq-relay. Two-stage signal handling in main, plus connection tracking in serve (an mpsc-sender refcount) so we can wait for every in-flight connection to drain. Each connection task watches the drain signal and drains its session (session.drain()) when it fires.
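
The "mpsc-sender refcount" idea works with any channel whose receiver can observe that every sender has been dropped. The sketch below uses `std::sync::mpsc` and threads to stay dependency-free; the relay itself is async, so its real code presumably looks different. `wait_for_all_connections` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawns `connections` workers, each holding a clone of the sender as a
/// liveness token, then blocks until every clone has been dropped.
/// Returns true once the channel reports that no senders remain.
pub fn wait_for_all_connections(connections: u64) -> bool {
    // The channel never carries a message; only the sender count matters.
    let (alive, drained) = mpsc::channel::<()>();

    for id in 0..connections {
        let token = alive.clone();
        thread::spawn(move || {
            // Stand-in for serving a session until the peer leaves.
            thread::sleep(Duration::from_millis(5 * (id + 1)));
            drop(token);
        });
    }

    // Drop the original so only per-connection clones keep the channel open.
    drop(alive);

    // recv() returns Err only after all senders are gone, i.e. fully drained.
    drained.recv().is_err()
}
```

The appeal of this design is that no explicit counter or lock is needed: a connection task that exits for any reason, including a panic, releases its token automatically.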

Test plan

  • New end-to-end test goaway_drains_peer_moq_transport_14 (rs/moq-native/tests/broadcast.rs): server fires session.drain().start(None), both sides observe the session close. Uses draft-14 because receiving a GOAWAY there closes the session, making the drain observable.
  • Lite Goaway encode/decode roundtrip unit tests (rs/moq-net/src/lite/goaway.rs).
  • All 57 broadcast integration tests pass (every lite + transport draft) — no handshake regression from the added plumbing.
  • cargo clippy clean and cargo fmt --check clean (pinned nix toolchain) for the three crates.
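
For context, an encode/decode roundtrip test generally has the shape sketched below. The framing here (a one-byte length prefix followed by the URI bytes) is an assumption made up for this example and is not the moq-lite wire format; `encode_goaway` and `decode_goaway` are hypothetical helpers.

```rust
/// Encode a redirect URI with a hypothetical one-byte length prefix.
/// Returns None if the URI does not fit in 255 bytes.
pub fn encode_goaway(uri: &str) -> Option<Vec<u8>> {
    let len = u8::try_from(uri.len()).ok()?;
    let mut buf = Vec::with_capacity(1 + uri.len());
    buf.push(len);
    buf.extend_from_slice(uri.as_bytes());
    Some(buf)
}

/// Decode the hypothetical framing; rejects length mismatches and bad UTF-8.
pub fn decode_goaway(buf: &[u8]) -> Option<String> {
    let (&len, rest) = buf.split_first()?;
    if rest.len() != usize::from(len) {
        return None;
    }
    String::from_utf8(rest.to_vec()).ok()
}

/// A roundtrip succeeds when decoding the encoded bytes yields the input.
pub fn roundtrips(uri: &str) -> bool {
    encode_goaway(uri)
        .and_then(|bytes| decode_goaway(&bytes))
        .as_deref()
        == Some(uri)
}
```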

Cross-package sync / out of scope

  • doc/bin/relay/index.md documents the new shutdown behavior.
  • js/net: not touched. This PR is the server-side send path. Browser clients honoring an incoming GOAWAY (migrating away) is a separate follow-up.
  • Cluster outbound connections (relay dialing other relays as a client) are not drained; only inbound/accepted sessions receive GOAWAY. Cluster peers detect departure on their own.

Targeting dev per the branch-targeting rules (changes under rs/moq-net).

(Written by Claude)

kixelated and others added 2 commits June 4, 2026 16:11
The first shutdown signal (Ctrl+C / SIGINT, or SIGTERM from `systemctl stop`)
now stops accepting new connections, sends a GOAWAY to every active session,
and waits for them all to drain. A second signal forces an immediate shutdown.
This lets an operator roll a relay node without dropping in-flight media.

moq-net gains the plumbing to actually send GOAWAY (previously the message types
existed but were only received):

- `Session::goaway(uri)` flips a per-session watch signal without closing the
  session, so in-flight groups can finish before the peer migrates away.
- moq-lite (04+) opens a dedicated control stream and writes GOAWAY.
- IETF moq-transport sends GOAWAY on the control stream for draft-14 through
  draft-16 (via the control-stream adapter) and on the SETUP uni stream for draft-17+.

moq-native's `Server::accept` Ctrl+C handler (which hard-closes the endpoints)
is now opt-out via `with_signal_handler(false)`; the relay disables it and drives
signals itself. Other consumers keep the previous behavior by default.

Tested end-to-end over WebTransport with moq-transport-14, where receiving a
GOAWAY closes the peer session, so the drain is observable on both sides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Open the SIGINT/SIGTERM streams up front in a `ShutdownSignals` helper and recv()
twice, instead of re-registering a fresh listener for each wait. This closes the
small window where a second signal could arrive between the first firing and the
new listener being registered, and makes the soft (drain) vs hard (force) mapping
explicit: SIGINT/SIGTERM drains, a second forces, and SIGKILL stays the kernel's
uncatchable backstop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
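
The race this commit closes can be illustrated with a plain channel: if the receiving end is opened once, a second signal that arrives before the second wait is queued rather than lost. This is a sketch under that analogy only; `SignalQueue` is a hypothetical type built on `std::sync::mpsc`, not the `ShutdownSignals` helper itself.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical queue that is opened once and then waited on repeatedly.
pub struct SignalQueue {
    rx: Receiver<&'static str>,
}

impl SignalQueue {
    /// Open the queue up front; the sender stands in for signal delivery.
    pub fn open() -> (Sender<&'static str>, SignalQueue) {
        let (tx, rx) = mpsc::channel();
        (tx, SignalQueue { rx })
    }

    /// Block until the next signal; None if no sender remains.
    pub fn recv(&self) -> Option<&'static str> {
        self.rx.recv().ok()
    }
}
```

Because both waits share one receiver, two signals delivered back to back are observed in order: the first maps to drain, the second to force.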
Comment thread rs/moq-native/src/server.rs Outdated
Comment thread rs/moq-net/src/session.rs Outdated
Comment thread rs/moq-net/src/session.rs Outdated
Comment thread rs/moq-relay/src/main.rs Outdated
kixelated and others added 2 commits June 9, 2026 21:09
Reworks the session GOAWAY surface per review feedback:

- Replace `Session::goaway(&str)` with a public `Drain` handle: `session.drain()`
  returns a `Drain` whose `start(uri)` sends the GOAWAY and `complete().await`
  waits for the peer to leave. This combines "send GOAWAY" and "await drain" into
  one type instead of two loose methods.
- `Drain::start` takes `impl Into<Option<&str>>`, matching the crate convention for
  optional args (`None` to just drain, `Some(uri)` to redirect).
- Back the trigger with `kio::Producer`/`Consumer` instead of `tokio::sync::watch`,
  consistent with the rest of moq-net's async state.
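
The `impl Into<Option<&str>>` convention lets callers pass either a bare `&str` or `None` without wrapping. A minimal sketch of the pattern follows; `DrainRequest` and `drain_request` are hypothetical names used only to show the argument shape, not the real `Drain::start`.

```rust
/// Hypothetical record of what a drain was asked to do.
#[derive(Debug, PartialEq, Eq)]
pub struct DrainRequest {
    /// Where the peer should go, if a redirect was given.
    pub redirect: Option<String>,
}

/// Accepts `None`, `Some(uri)`, or a bare `&str`, thanks to the standard
/// library's `impl<T> From<T> for Option<T>`.
pub fn drain_request<'a>(uri: impl Into<Option<&'a str>>) -> DrainRequest {
    let uri: Option<&str> = uri.into();
    DrainRequest {
        redirect: uri.map(String::from),
    }
}
```

With this signature, `drain_request(None)` and `drain_request("https://relay-b.example")` both compile, which keeps call sites short.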

moq-native: rename `with_signal_handler` to `with_ctrl_c_handler`, and replace the
`select!` `if`-guard on the built-in Ctrl+C arm with an explicit ctrl_c-or-pending
future so the behavior doesn't depend on guard-evaluation timing.

moq-relay/test: use `session.drain().start(None)` / `drain.complete().await`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-d9673b

# Conflicts:
#	rs/moq-net/src/session.rs