chore: extract main into an embeddable internal/app package #5259

Open
siavashs wants to merge 2 commits into prometheus:main from siavashs:feat/app/pkg

@siavashs siavashs commented May 29, 2026

Summary

Resolves the long-standing TODO from #406 by extracting the Alertmanager process logic out of cmd/alertmanager/main.go into a new internal/app package, and giving it a lifecycle API (New / Start / Addr / Reload / Stop) so tests and other binaries can embed Alertmanager in-process instead of building and shelling out to the compiled binary.

cmd/alertmanager/main.go shrinks from 724 → 196 lines and now owns only: kingpin flag parsing, logger construction, versioncollector registration, feature-flag / GOMEMLIMIT side effects, and translating OS signals into context cancellation (SIGINT/SIGTERM) plus reload events (SIGHUP) consumed by app.Run.

Commits

1. cmd/alertmanager: extract main into internal/app package

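Mechanical extraction. The new package is split into focused files:

  • options.go: Options struct, validate(), DefaultClusterAddr
  • app.go: Run(ctx, opts) error
  • metrics.go: per-instance Prometheus metrics struct
  • cluster.go: clusterWait helper
  • url.go: extURL helper (plus url_test.go for TestExternalURL)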

The six previously package-level promauto.NewXxx variables in cmd/alertmanager/main.go are now constructed per Run() invocation against opts.Registerer. Combined with threading the registerer through every collaborator (versioncollector excepted, which stays process-global in main.go), this unblocks running multiple Alertmanager instances in the same process without duplicate-registration panics.
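Why per-invocation construction matters can be shown with a toy model. The registry type below is a stand-in written for this sketch, not the Prometheus client API, and the metric names are hypothetical. It only mimics one behaviour: registering the same name twice on one registry panics.

```go
package main

import "fmt"

// registry is a toy stand-in for a metrics registerer. Registering the same
// name twice on one registry panics.
type registry struct{ names map[string]bool }

func newRegistry() *registry { return &registry{names: map[string]bool{}} }

func (r *registry) mustRegister(name string) {
	if r.names[name] {
		panic("duplicate registration: " + name)
	}
	r.names[name] = true
}

// newMetrics builds one instance's metrics against the registry it is given,
// instead of relying on package-level variables bound to a global registry.
func newMetrics(r *registry) {
	for _, name := range []string{"requests_total", "config_reloads_total"} { // hypothetical names
		r.mustRegister(name)
	}
}

// panics reports whether fn panicked.
func panics(fn func()) (p bool) {
	defer func() { p = recover() != nil }()
	fn()
	return false
}

func main() {
	shared := newRegistry()
	newMetrics(shared)
	// A second instance on the same registry collides.
	fmt.Println("shared registry panics:", panics(func() { newMetrics(shared) }))
	// Two instances with their own registries do not.
	fmt.Println("separate registries panic:", panics(func() {
		newMetrics(newRegistry())
		newMetrics(newRegistry())
	}))
}
```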

2. internal/app: add App lifecycle (New/Start/Addr/Reload/Stop)

Adds a richer lifecycle API on top of the extraction:

New(opts) (*App, error)
(*App).Start() error
(*App).Addr() string             // first listener
(*App).Addrs() []string          // all listeners
(*App).Reload(ctx) error
(*App).Stop(ctx) error

Run is preserved as a thin wrapper (New + Start + serveLoop + Stop) with a deferred Stop on a fresh 30s context so cleanup also runs on panic, matching the implicit panic-safety of the previous defer-based implementation.

Internally, setup registers cleanups on a stack (a.onStop) that Stop drains in LIFO order, mirroring Go's defer semantics. The source order of the old defer X lines in Run is therefore preserved verbatim, and the shutdown ordering does not depend on hand-written reverse-order code. Listeners are bound at New time via a new listenAll helper that calls net.Listen directly, so Addr is meaningful before Start; web.ServeMultiple is then invoked in Start. Systemd socket activation is not supported when embedding; requesting it returns an explicit error pointing callers back to cmd/alertmanager.
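A minimal sketch of the cleanup-stack idea follows. The app type here is a stripped-down, hypothetical stand-in for the PR's App, the subsystem names are only labels, and the sketch ignores concurrency, which a real implementation would need to guard.

```go
package main

import "fmt"

// app is a hypothetical, stripped-down stand-in for the PR's App type.
type app struct {
	cleanups []func()
}

// onStop registers a cleanup callback, much like writing a defer at this
// point in setup.
func (a *app) onStop(fn func()) { a.cleanups = append(a.cleanups, fn) }

// stop drains the stack in LIFO order. Clearing the slice makes a second
// call a no-op.
func (a *app) stop() {
	for i := len(a.cleanups) - 1; i >= 0; i-- {
		a.cleanups[i]()
	}
	a.cleanups = nil
}

// run registers cleanups in construction order and returns the order in
// which they actually executed.
func run() []string {
	var order []string
	a := &app{}
	for _, name := range []string{"nflog", "silences", "tracing"} {
		name := name // capture per iteration (needed before Go 1.22)
		a.onStop(func() { order = append(order, "stop "+name) })
	}
	a.stop()
	a.stop() // second call does nothing
	return order
}

func main() {
	fmt.Println(run())
}
```

Because registration order equals construction order, the last subsystem set up is the first one torn down, with no separately maintained shutdown list.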

Stop honors its context parameter for the HTTP shutdown step, capped at 5s, so callers passing a tighter deadline get faster teardown and callers passing context.Background get the default.

Behavioural notes

  • prometheus.DefaultRegisterer is no longer referenced inside app.Run; the binary still passes it in via Options.Registerer so on-disk behaviour is identical.
  • srv.Shutdown now actually runs on Run exit (previously the deferred srv.Close lived inside the listen goroutine and never ran in practice because os.Exit killed the process first). Behaviour for the binary is unchanged; embedded callers now get clean HTTP teardown.
  • tracingManager.Stop is part of the cleanup stack and therefore always runs, not just on ctx.Done() (previously leaked on listen failure, but the leak was masked by os.Exit).
  • --cluster.listen-address default moved from a const in cmd/alertmanager to the exported app.DefaultClusterAddr.

Known follow-ups (out of scope)

  • matcher/compat.InitFromFlags still mutates package-level state; multi-instance tests with different feature flags will collide. Tracked separately.
  • Migrating the v2 acceptance harness to drive app.Run directly instead of building and spawning the binary. Now mechanically possible thanks to Addr() / Stop() on *App; left for a follow-up PR to keep this one reviewable.

Verification

New tests in internal/app/lifecycle_test.go:

  • TestApp_StartStop — boot, probe /-/healthy, stop, stop again (idempotency).
  • TestApp_TwoSequentialInstances — same process, two consecutive New → Start → Stop cycles. Guards the metrics-per-Registerer fix against duplicate-registration panics.
  • TestApp_TwoConcurrentInstances — two live instances on different ephemeral ports simultaneously.
  • TestApp_Run_ContextCancel — end-to-end Run wrapper with ctx cancellation.

Diff size

 cmd/alertmanager/main.go         | -528  (724 → 196)
 internal/app/app.go              | +508
 internal/app/lifecycle.go        | +228
 internal/app/lifecycle_test.go   | +168
 internal/app/options.go          | +104
 internal/app/metrics.go          |  +98
 internal/app/url.go              |  +57
 internal/app/cluster.go          |  +29
 internal/app/url_test.go         | rename (was cmd/alertmanager/main_test.go)

Closes #406

Pull Request Checklist

Please check all the applicable boxes.

  • Please list all open issue(s) discussed with maintainers related to this change
    • Fixes #
  • Is this a new Receiver integration?
  • Is this a bugfix?
    • I have added tests that can reproduce the bug which pass with this bugfix applied
  • Is this a new feature?
    • I have added tests that test the new feature's functionality
  • Does this change affect performance?
    • I have provided benchmarks comparison that shows performance is improved or is not degraded
      • You can use benchstat to compare benchmarks
    • I have added new benchmarks if required or requested by maintainers
  • Is this a breaking change?
    • My changes do not break the existing cluster messages
    • My changes do not break the existing api
  • I have added/updated the required documentation
  • I have signed-off my commits
  • I will follow best practices for contributing to this project

Which user-facing changes does this PR introduce?

NONE

Summary by CodeRabbit

  • Refactor

    • Reorganized runtime initialization with improved lifecycle management for cleaner code structure and maintainability.
  • Tests

    • Added comprehensive lifecycle tests covering initialization, startup, shutdown, configuration reloading, and concurrent instance handling.

coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e6b16ca-a1cf-4a04-8716-f77ac3f9bdbe

📥 Commits

Reviewing files that changed from the base of the PR and between b273f3f and cdd9230.

📒 Files selected for processing (3)
  • internal/app/app.go
  • internal/app/lifecycle.go
  • internal/app/lifecycle_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/app/app.go
  • internal/app/lifecycle.go
  • internal/app/lifecycle_test.go

Walkthrough

This PR extracts Alertmanager's runtime initialization from the main.go file into a new internal/app package. The refactoring introduces a composable App type with lifecycle methods (New, Start, Stop, Reload) and an entry-point function Run() that wires all subsystems including dispatcher, API, notification logging, silences, cluster peer, and HTTP serving. The simplified main.go delegates initialization to this package rather than maintaining a monolithic function. Comprehensive tests validate the lifecycle API and exercise deadlock/rollback scenarios.

Changes

App Package Refactoring and Extraction

| Layer / File(s) | Summary |
|---|---|
| App type and lifecycle control (internal/app/lifecycle.go) | App struct encapsulates initialization state, HTTP server, listeners, cleanup callbacks, and coordination channels. New(opts) constructs and wires subsystems with rollback-safe error handling. Start() launches HTTP and reload routing with one-time guard. Stop(ctx) gracefully shuts down HTTP (5s timeout), stops reload router if started, and runs cleanup in LIFO order. Addr()/Addrs() expose bound listener addresses. Reload(ctx) delegates to config coordinator. |
| App initialization and subsystem wiring (internal/app/app.go) | Run(ctx, opts) entry point constructs, starts, and ensures panic-safe shutdown. setup() validates options, initializes metrics/data directory, configures gossip TLS and optional cluster peer, loads initial config, creates event recorder with startup/shutdown events, initializes notification log and silences with maintenance goroutines and optional cluster broadcast, creates alerts provider, constructs API with external URL and cluster-aware timeout, initializes tracing, defines config reload subscription that rebuilds routes/receivers/time-intervals/pipeline/inhibitor/dispatcher in correct order, applies initial config, derives route prefix, builds instrumented HTTP router with UI and reload endpoints, and binds listeners. |
| App lifecycle and serving helpers (internal/app/lifecycle.go) | Internal reloadRouter() forwards both programmatic and HTTP /-/reload signals to config coordinator. Internal serveLoop(ctx) waits for context cancellation or serve-error channel. Internal onStop(fn) registers LIFO cleanup callbacks. Internal listenAll(flags) binds TCP listeners, rejects systemd socket activation, and closes partial bindings on failure. |
| Support utilities for configuration and instrumentation (internal/app/cluster.go, internal/app/metrics.go, internal/app/url.go, internal/app/url_test.go) | clusterWait() returns closure computing peer-position-scaled wait duration. newMetrics() registers HTTP request/response histograms and cluster/receiver/integration/inhibition-rule gauges. instrumentHandler() wraps handlers with promhttp middleware. extURL() constructs parsed external URLs with scheme validation and path normalization. URL test package updated from main to app. |
| Lifecycle integration tests (internal/app/lifecycle_test.go) | minimalConfig YAML bootstrap and testOptions helper initialize ephemeral configuration with feature flags, disabled clustering, and deterministic timeouts. TestApp_StartStop verifies startup, health polling, and idempotent shutdown. Sequential/concurrent instance creation tests validate isolation. TestApp_EmbeddedReloadDoesNotDeadlock ensures /-/reload POST completes in embedded mode. TestApp_New_SetupFailureDoesNotDeadlock validates setup failure rollback. TestApp_Run_ContextCancel exercises context cancellation. |
| Main entry point refactoring (cmd/alertmanager/main.go) | Simplified main.go removes ~515 lines of in-file subsystem wiring and delegates to app.Run(ctx, opts). Retains flag parsing, logger/feature-flag initialization, signal handling (SIGINT/SIGTERM context, SIGHUP reload channel), and app.Options construction. Updates --cluster.listen-address default to app.DefaultClusterAddr, reduces imports, and moves Prometheus versioncollector registration after logger creation. |

Sequence Diagram

sequenceDiagram
  participant Run as Run(ctx, opts)
  participant New as New(opts)
  participant Setup as setup()
  participant Subsystems as Core Subsystems
  participant Coordinator as ConfigCoordinator
  
  Run->>New: construct App
  New->>Setup: wire subsystems
  Setup->>Subsystems: initialize metrics, nflog, silences, alerts
  Setup->>Subsystems: create dispatcher via pipeline
  Setup->>Coordinator: create with reload callback
  Setup->>Coordinator: apply initial config
  Coordinator->>Subsystems: rebuild routes/inhibitor/dispatcher on reload
  New-->>Run: return App
  Run->>Run: start app + block serveLoop

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • prometheus/alertmanager#5102: The main PR's refactor moves HTTP/UI/router and reload wiring into internal/app (via webReload/server setup), which directly overlaps with PR #5102's router endpoint registration changes (ui.Register signature change and new weboperations.Register for /-/reload, /metrics, and debug handlers).

Suggested reviewers

  • ultrotter
  • sysadmind
  • Spaceman1701
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main refactoring: extracting main logic into an embeddable internal/app package. |
| Description check | ✅ Passed | The description is comprehensive and addresses most template requirements, including linked issues, summary, commits, behavioral notes, and tests. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #406 by extracting main into a reusable package with a lifecycle API (New/Start/Addr/Reload/Stop) and per-instance metrics. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope and directly support the extraction objective: options, app orchestration, lifecycle management, metrics, cluster/url helpers, and comprehensive tests. |

@siavashs added the kind/cleanup and go labels May 29, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/alertmanager/main.go`:
- Line 205: The shutdown log is hardcoded to "Received SIGTERM…" which
misreports SIGINT/Ctrl+C; change the logger.Info call in main.go (the location
that currently logs "Received SIGTERM, exiting gracefully...") to log the actual
signal received (use the signal variable from the signal.Notify/select or, if
you cancel via ctx, log ctx.Err() or a generic "shutting down" message) so the
message reflects the real cause; update the handler that calls app.Run and the
signal.Notify/select branch to pass the received os.Signal (or its String())
into logger.Info instead of the fixed "SIGTERM" text.

In `@internal/app/lifecycle.go`:
- Around line 142-163: The Stop method can block forever on the "for range
a.srvc" if Start's serve goroutine never closes a.srvc; change Stop to perform a
non-blocking drain of a.srvc instead of a blocking range so Stop returns safely
even if Start wasn't run. Specifically, update App.Stop to replace the for range
over a.srvc with a loop that repeatedly attempts a non-blocking receive from
a.srvc (e.g., select with a receive case and a default case) until the channel
is drained/closed or there is nothing to read; reference symbols: App.Stop,
a.srvc, Start, New.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cfbc0c18-07d3-4da0-85e9-8fea36348cfa

📥 Commits

Reviewing files that changed from the base of the PR and between cc7bf21 and a59bd1f.

📒 Files selected for processing (9)
  • cmd/alertmanager/main.go
  • internal/app/app.go
  • internal/app/cluster.go
  • internal/app/lifecycle.go
  • internal/app/lifecycle_test.go
  • internal/app/metrics.go
  • internal/app/options.go
  • internal/app/url.go
  • internal/app/url_test.go

Comment thread cmd/alertmanager/main.go Outdated
Comment thread internal/app/lifecycle.go
Move the body of run() from cmd/alertmanager/main.go into a new
internal/app package so Alertmanager can be embedded in tests and
other binaries without shelling out to a compiled binary. Resolves
the long-standing TODO from prometheus#406.

cmd/alertmanager/main.go shrinks from 724 to 196 lines and is now
responsible only for kingpin flag parsing, logger construction,
versioncollector registration, feature-flag / GOMEMLIMIT side
effects, and translating OS signals into context cancellation
(SIGINT/SIGTERM) plus reload events (SIGHUP) consumed by app.Run.

The new internal/app package is split into:

  * options.go  - Options struct, validate(), DefaultClusterAddr
  * app.go      - Run(ctx, opts) error
  * metrics.go  - per-instance Prometheus metrics struct
  * cluster.go  - clusterWait helper
  * url.go      - extURL helper (+ url_test.go for TestExternalURL)

The six previously package-level promauto.NewXxx variables in
cmd/alertmanager/main.go are now constructed per Run() invocation
against opts.Registerer. Combined with threading the registerer
through every collaborator (versioncollector excepted, which stays
in main.go as a process-global), this unblocks running multiple
Alertmanager instances in the same process without duplicate-
registration panics.

Behavioural notes:

  * prometheus.DefaultRegisterer is no longer referenced inside
    app.Run; the binary still passes it in via Options.Registerer
    so on-disk behaviour is identical.
  * app.Run defers srv.Shutdown(5s) on exit. Previously the
    deferred srv.Close lived inside the listen goroutine and never
    ran in practice because os.Exit killed the process first.
    Behaviour for the binary is unchanged; embedded callers now
    get clean HTTP teardown.
  * --cluster.listen-address default moved from a const in
    cmd/alertmanager to the exported app.DefaultClusterAddr.

Known follow-ups intentionally out of scope:

  * matcher/compat.InitFromFlags still mutates package-level
    state; multi-instance tests with different feature flags will
    collide.
  * Richer App lifecycle (New/Start/Addr/Reload/Stop) for tests
    that need :0-port discovery or programmatic reload.
  * Migrating the v2 acceptance harness to use app.Run directly
    instead of building and spawning the binary.

Verification: `go build ./...`, `go vet ./...`, and
`go test -count=1 ./...` all pass, including the existing
test/with_api_v2/acceptance suite which continues to build and
spawn the binary end-to-end.

Closes prometheus#406

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/app/lifecycle.go`:
- Around line 96-113: The Start method can deadlock because the registered
/-/reload handler blocks on sending to the unbuffered a.webReload channel (and
errors from web.ServeMultiple are sent to a.srvc) while the only consumer
(serveLoop, invoked by Run) may not be running for embedders; fix by ensuring
Start spawns the reload-and-error drain loop so a.webReload and a.opts.Reload
are drained even when Run/serveLoop is not used, or alternatively make the
reload handler do a non-blocking send/fail-fast: add a goroutine in Start that
runs the same logic as serveLoop (draining a.webReload, a.opts.Reload and
forwarding errors to a.reload/ reload handler) and ensure web.ServeMultiple
errors sent to a.srvc are observed (do not close a.srvc before draining), or
change the handler to select { case a.webReload <- errc: default: respond with
an immediate error } so the handler never blocks when the drain loop is absent;
update Start, the /-/reload handler, and any use of a.srvc/a.webReload
accordingly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6a09cb35-1507-4077-b2b1-721ea7afc541

📥 Commits

Reviewing files that changed from the base of the PR and between a59bd1f and 8741088.

📒 Files selected for processing (9)
  • cmd/alertmanager/main.go
  • internal/app/app.go
  • internal/app/cluster.go
  • internal/app/lifecycle.go
  • internal/app/lifecycle_test.go
  • internal/app/metrics.go
  • internal/app/options.go
  • internal/app/url.go
  • internal/app/url_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • internal/app/url_test.go
  • internal/app/options.go
  • internal/app/cluster.go
  • internal/app/metrics.go
  • internal/app/url.go
  • internal/app/lifecycle_test.go
  • internal/app/app.go

Comment thread internal/app/lifecycle.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/app/lifecycle.go (1)

53-55: ⚡ Quick win

Stale doc comments still point at serveLoop.

webReload is now consumed by reloadRouter, not serveLoop. The same staleness applies to the Reload docstring on Line 175 ("Safe to call concurrently with serveLoop"), since serveLoop no longer routes reloads. In code this deadlock-sensitive, accurate "who consumes this channel" comments matter for future maintainers.

📝 Suggested doc fixes
 	// webReload is the channel exposed by httpserver.Register for the
-	// /-/reload HTTP endpoint. We read from it in serveLoop.
+	// /-/reload HTTP endpoint. We read from it in reloadRouter.
 	webReload chan chan error
 // Reload triggers a configuration reload (the programmatic equivalent of
-// SIGHUP). Safe to call concurrently with serveLoop.
+// SIGHUP). Safe to call concurrently with the running App.
 func (a *App) Reload(_ context.Context) error {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/lifecycle.go` around lines 53 - 55, Doc comments are stale:
update the comment for the webReload channel and the Reload docstring (mentions
of "serveLoop") to reflect that reloads are now consumed by reloadRouter, not
serveLoop; locate the declaration webReload and the Reload method/docstring and
change references from serveLoop to reloadRouter and adjust wording about
concurrency to say "Safe to call concurrently with reloadRouter" (or similar) so
the consumer is accurate and deadlock-sensitive guidance is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/app/lifecycle_test.go`:
- Around line 161-168: The tests TestApp_EmbeddedReloadDoesNotDeadlock and
TestApp_New_SetupFailureDoesNotDeadlock use require.NoError/Equal/Error inside
spawned goroutines (the anonymous go func that closes done), which can call
t.FailNow from a child goroutine; change these to not call require from the
goroutine: either (A) replace require.* with assert.* inside the goroutine
(e.g., assert.NoError/assert.Equal/assert.Error) or (B) capture the goroutine
results by sending error/status values down a channel (use the existing done
channel or a new result channel) and perform require.* assertions on those
results in the main test goroutine after <-done; update the anonymous functions
and their callers (the POST to "/-/reload" and the setup-failure goroutine) to
use one of these patterns so all require.* calls run on the main test goroutine.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7df0d969-feab-4bf1-8cf6-684bb59bd394

📥 Commits

Reviewing files that changed from the base of the PR and between 8741088 and b273f3f.

📒 Files selected for processing (3)
  • internal/app/app.go
  • internal/app/lifecycle.go
  • internal/app/lifecycle_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/app/app.go

Comment thread internal/app/lifecycle_test.go
Introduce an App lifecycle so tests and embedders can drive
Alertmanager without OS signals or os/exec, and discover the bound
HTTP address even when listening on ":0".

API:

  New(opts) (*App, error)
  (*App).Start() error
  (*App).Addr() string             // first listener
  (*App).Addrs() []string          // all listeners
  (*App).Reload(ctx) error
  (*App).Stop(ctx) error

Run is preserved as a thin wrapper (New + Start + serveLoop + Stop)
with a deferred Stop on a fresh 30s context so cleanup also runs on
panic, matching the implicit panic-safety of the previous defer-
based implementation.

Internally, setup uses a cleanup stack (a.onStop) that Stop drains
in LIFO order, mirroring Go's defer semantics so the source order
of the old `defer X` lines in Run is preserved verbatim and the
shutdown ordering does not depend on hand-written reverse-order
code. Listeners are bound at New time via a new listenAll helper
that calls net.Listen directly (so Addr is meaningful before
Start); web.ServeMultiple is then invoked in Start. Systemd socket
activation is not supported when embedding and returns an explicit
error pointing callers back to cmd/alertmanager.

Stop honors its context parameter for the HTTP shutdown step,
capped at 5s, so callers passing a tighter deadline get faster
teardown and callers passing context.Background get the default.

Tests cover: single instance round-trip; two sequential instances
in the same process (guards the Phase A metrics-per-Registerer
fix against duplicate-registration panics); two concurrent
instances on distinct ephemeral ports; and the Run wrapper
end-to-end with ctx cancellation. All pass under -race.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
@siavashs siavashs marked this pull request as ready for review May 29, 2026 11:30
@siavashs siavashs requested a review from a team as a code owner May 29, 2026 11:30
Development

Successfully merging this pull request may close these issues.

Extract main function into package

1 participant