
ci(dispatch): cap agent overload retry at 2 attempts #40

Merged
mike-diff merged 1 commit into main from ci-dispatch-agent-retry
Jun 24, 2026


@mike-diff

Owner

Closes #none (CI tuning, no issue).

Problem

The agent retry (added to recover from transient 429s returned by the model service) would spend up to 4 attempts with escalating backoff (0+30+60+120s). Each attempt burns real GLM budget, and a 429 that does not clear within two tries likely indicates a sustained outage rather than a blip, so further retries just waste tokens. Run 28072226718 hit a 429, and the prior 4-attempt retry would have held the job for ~3.5 min (210s) of backoff before giving up.

Changes

  • .github/workflows/dispatch.yml: the retry in the "Run dispatch agent" step is capped at 2 attempts with a single 60s backoff between them. Safety properties are unchanged: only an explicit 429/overload/rate-limit marker is retried; exit 0/3/4 and any non-429 error remain final on the first attempt (a deterministic bug never wastes a second attempt).
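The classification rule can be sketched in Go, assuming the step sees the agent's exit code and its combined output. The `classify` helper, the `Outcome` type, and the marker strings are hypothetical stand-ins; the workflow's actual matching is done in dispatch.yml and may differ in detail. The key design choice it illustrates is that anything not explicitly marked as overload defaults to final.

```go
package main

import (
	"fmt"
	"strings"
)

// Outcome of one agent attempt, as the retry step would see it.
type Outcome int

const (
	Final     Outcome = iota // stop: success, terminal exit code, or non-429 error
	Retryable                // an explicit 429/overload/rate-limit marker was found
)

// finalExitCodes always end the step on the first attempt (0/3/4 per
// this PR; their individual meanings are not assumed here).
var finalExitCodes = map[int]bool{0: true, 3: true, 4: true}

// overloadMarkers are hypothetical substrings standing in for the
// "explicit 429/overload/rate-limit marker" the workflow matches.
var overloadMarkers = []string{"http 429", "overload", "rate limit"}

// classify decides whether an attempt may be retried. Unrecognized
// failures are treated as final, so a deterministic bug never burns a
// second attempt.
func classify(exitCode int, output string) Outcome {
	if finalExitCodes[exitCode] {
		return Final
	}
	lower := strings.ToLower(output)
	for _, m := range overloadMarkers {
		if strings.Contains(lower, m) {
			return Retryable
		}
	}
	return Final
}

func main() {
	fmt.Println(classify(0, "done") == Final)                           // true
	fmt.Println(classify(1, "HTTP 429 Too Many Requests") == Retryable) // true
	fmt.Println(classify(1, "panic: nil pointer dereference") == Final) // true
	fmt.Println(classify(3, "rate limit") == Final)                     // true: exit 3 is final
}
```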

How to test

Simulated five cases: (1) first-attempt success = 1 attempt; (2) 429 then success = recovers in 2; (3) persistent 429 = caps at 2; (4) non-429 error = 1 attempt, no retry; (5) 429 then non-429 = stops at 2. The budget-safety property (never retry a non-429 error) holds in every case.

Verification

go build ./... && go vet ./... pass (no Go code touched). YAML is valid (12 steps). Retry control flow simulated across the 5 cases above. No em dashes, per AGENTS.md.

The agent retry recovered transient 429s from the model service but would
spend up to 4 attempts (0+30+60+120s backoff) doing so. Each attempt burns
real GLM budget, and a 429 that does not clear in two tries likely indicates
a sustained outage rather than a blip, so further retries just waste tokens.

Cap at 2 attempts with a single 60s backoff. The safety properties are
unchanged: only an explicit 429/overload marker is retried; exit 0/3/4 and
any non-429 error remain final on the first attempt.

Verified by simulating five cases: first-attempt success (1 attempt);
429-then-success recovers (2 attempts); persistent 429 caps at 2; non-429
error does not retry (1 attempt); 429-then-non-429 stops at 2. build/vet
green; no em dashes per AGENTS.md.
mike-diff merged commit 30a3e43 into main on Jun 24, 2026
2 checks passed
mike-diff deleted the ci-dispatch-agent-retry branch on June 24, 2026 at 03:56