feat: add skip_permissions, continue command, and build retry for resilient opencode automation#2
feat: add skip_permissions, continue command, and build retry for resilient opencode automation#2jc01rho wants to merge 11 commits into
Conversation
Integration runs now launch opencode with OPENCODE_DISABLE_PROJECT_CONFIG=1 so a cloned repo's .opencode/opencode.json(c) can't fail the build with keys the installed OpenCode doesn't recognize (e.g. Unrecognized key: references).
When opencode run fails with retryable provider errors (Provider returned error, rate limits, 5xx, network timeouts), ORW now automatically retries using `opencode run --continue` to resume the interrupted session. New config fields: - retry_attempts: max attempts including first (default 3) - retry_delay_ms: delay between retries (default 10000ms) Retryable patterns detected from log output: - Provider returned error, rate limit, overloaded - HTTP 429/500/502/503 - Network errors (ECONNRESET, ETIMEDOUT, socket hang up)
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds configurable retry behavior for opencode run failures that appear to be transient provider errors, improving resilience of the check workflow.
Changes:
- Introduces
retry_attempts/retry_delay_msto config types, defaults, init config, and example config. - Wraps OpenCode execution in
runOpenCodeWithRetry()and retries withopencode run --continueon retryable errors. - Documents the new retry behavior and config options in README/help text.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/index.ts |
Adds retry config + new retry wrapper around opencode run and log-based retryable error detection. |
config.example.json |
Shows the new retry configuration keys and defaults. |
README.md |
Documents retry behavior and adds the new config keys to the example. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let lastErr: unknown; | ||
|
|
||
| for (let attempt = 1; attempt <= cfg.retry_attempts; attempt++) { | ||
| const isRetry = attempt > 1; | ||
| if (isRetry) { | ||
| out(`Retry attempt ${attempt}/${cfg.retry_attempts} after provider error (waiting ${cfg.retry_delay_ms / 1000}s)...`); | ||
| await note(log, `\n--- Retry attempt ${attempt}/${cfg.retry_attempts} ---\n`); | ||
| await sleep(cfg.retry_delay_ms); | ||
| } | ||
|
|
||
| const args = [cfg.opencode_bin, "run"]; | ||
| if (isRetry) args.push("--continue"); | ||
| args.push("--agent", cfg.agent, "--model", cfg.model, prompt); | ||
|
|
||
| try { | ||
| await run(args, { | ||
| cwd: cfg.work_repo, | ||
| log, | ||
| env: envWithFlag, | ||
| }); | ||
| return; | ||
| } catch (err) { | ||
| lastErr = err; | ||
| if (!isRetryableError(err, log)) { | ||
| throw err; | ||
| } | ||
| out(`Attempt ${attempt} failed with retryable provider error.`); | ||
| } | ||
| } | ||
|
|
||
| throw lastErr; |
|
|
||
| if (msg.includes("exited with")) { | ||
| try { | ||
| const logContent = readFileSync(log, "utf8").toLowerCase(); |
There was a problem hiding this comment.
1 issue found across 3 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
ualtinok
left a comment
There was a problem hiding this comment.
Thanks for the PR — the retry behavior is useful, but I don't think this is safe to merge yet.
Please address these before we reconsider:
-
Validate
retry_attempts. Right nowretry_attempts: 0skipsopencode runentirely and ends up throwingundefined. Since attempts include the initial run, values below 1 should either be rejected clearly or clamped to 1. -
Classify retryability from only the current attempt's output/log slice. The implementation scans the full accumulated log file, so after attempt 1 logs a retryable provider error, any later non-retryable failure can still be treated as retryable because the old message remains in the shared log. Capture a log offset before each attempt, or otherwise inspect only output from that attempt.
-
Tighten the HTTP matching. Bare
"429","502", and"503"can match unrelated numbers in logs. Please match provider/HTTP error context instead. -
Add focused regression coverage for the retry behavior: invalid/zero attempts, retry using
--continue, per-attempt retry classification, and a non-retryable second failure not being retried because of stale first-attempt log content.
Typecheck passes, and the feature direction makes sense, but these failure-mode issues need to be fixed before merge.
1. Clamp retry_attempts to minimum 1 to prevent undefined throw 2. Use per-attempt log slicing to avoid stale retryable errors 3. Tighten HTTP status code matching with context patterns 4. Add focused regression tests for retry behavior
There was a problem hiding this comment.
2 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index.ts">
<violation number="1" location="src/index.ts:447">
P1: Reading the log slice immediately after the process exits can miss provider-error text because `exec` appends log chunks asynchronously via unawaited `void write(chunk)`. Missing error text causes `isRetryableError` to return false, skipping a needed retry.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
2 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index.ts">
<violation number="1" location="src/index.ts:447">
P1: Reading the log slice immediately after the process exits can miss provider-error text because `exec` appends log chunks asynchronously via unawaited `void write(chunk)`. Missing error text causes `isRetryableError` to return false, skipping a needed retry.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="prompt.txt">
<violation number="1" location="prompt.txt:24">
P2: Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| 4. Build the host CLI in `packages/opencode` with `OPENCODE_CHANNEL={{channel}} OPENCODE_VERSION={{version}} bun run build -- --single`. | ||
| 5. Ensure the built CLI binary exists at `{{cli}}`. | ||
| 6. Run `{{cli}} --version` and verify the output is exactly `{{version}}`. | ||
| 5. Search for session ID encoding bugs in merged PRs before final build. Focus on: |
There was a problem hiding this comment.
P2: Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At prompt.txt, line 24:
<comment>Step 5's bug search runs after the build but before verification without an explicit rebuild step, creating ambiguity about whether fixes require recompilation. If bugs are fixed after step 4, the binary verified in steps 6-7 would be stale. The phrase 'before final build' implies a rebuild is needed but no such step is present.</comment>
<file context>
@@ -21,8 +21,13 @@ Exact tasks:
4. Build the host CLI in `packages/opencode` with `OPENCODE_CHANNEL={{channel}} OPENCODE_VERSION={{version}} bun run build -- --single`.
-5. Ensure the built CLI binary exists at `{{cli}}`.
-6. Run `{{cli}} --version` and verify the output is exactly `{{version}}`.
+5. Search for session ID encoding bugs in merged PRs before final build. Focus on:
+ - Auto-resume plugins (`opencode-auto-resume`) that may pass URL-encoded placeholder values like `%7Bid%7D` instead of real session IDs
+ - Session validation code that expects strings starting with `ses` but may receive encoded placeholders
</file context>
| async function continueRun(cfg: Cfg) { | ||
| const release = await latest(cfg); | ||
| const prev = await readState(cfg); | ||
| if (!prev.tag) throw new Error("No previous build state found. Run `orw check` first."); | ||
|
|
||
| const free = await hold(cfg, true); | ||
| try { | ||
| const log = prev.log ?? path.join( | ||
| logDir(cfg), | ||
| `${stamp()}-${release.tag_name.replaceAll("/", "-")}-continue.log`, | ||
| ); | ||
| await fs.mkdir(path.dirname(log), { recursive: true }); | ||
| await note(log, `\n--- Resuming: continue run for ${release.tag_name} ---\n`); | ||
|
|
||
| const env = releaseEnv(release); | ||
| const sources = resolveSources(cfg); | ||
| const prompt = await render(cfg, sources, release); | ||
|
|
||
| out(`Resuming last opencode session for ${release.tag_name}...`); | ||
| const next = await runOpenCodeWithBuildRetry(cfg, prompt, env, log, release); | ||
| await writeState(cfg, next); | ||
| out(`Continue completed for ${release.tag_name}`); | ||
| } finally { | ||
| await free(); | ||
| } | ||
| } |
There was a problem hiding this comment.
orw continue never actually resumes the interrupted session
continueRun delegates to runOpenCodeWithBuildRetry → runOpenCodeWithRetry, which only adds --continue when isRetry = attempt > 1. On the first attempt, isRetry is false, so opencode run is invoked without --continue — starting a brand-new session in the existing work directory rather than resuming the interrupted one. The flag only reaches opencode if that fresh run happens to fail with a retryable provider error and then retries.
To fix this, runOpenCodeWithRetry (and by extension runOpenCodeWithBuildRetry) needs an option like { forceContinue?: boolean } that continueRun can pass through to make --continue unconditional on every attempt.
1. Add build_retry_attempts/build_retry_delay_ms config fields 2. Implement isMissingArtifactError() helper to detect artifact completion 3. Add retry loop around verifyBuild that waits for background artifacts 4. Update help text and config.example.json with new options 5. Add focused regression tests for build retry behavior This enables ORW to automatically retry opencode builds when artifacts aren't immediately available, which handles the case where bun test background processes take longer than 8 seconds to complete. Fixes: build artifact retry for background processes
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
… runs - src/retry.ts: add 'aborted'/'abort' text patterns; treat empty or short (<200 char) log slices as retryable for non-zero exits so that opencode crashes that write nothing to the log (e.g. AbortSignal) are retried. - src/index.ts: add forceContinue param to runOpenCodeWithRetry / runOpenCodeWithBuildRetry. continueRun passes true so the first attempt uses --continue; checkRun leaves it false.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
ualtinok
left a comment
There was a problem hiding this comment.
Thanks for the updates — the retry core is in much better shape now. Per-attempt log slicing via byte offset, clamping attempts to a minimum of 1, the context-aware HTTP status matching, and the new retry.test.ts all address my earlier feedback.
The PR has picked up scope, though, and several issues block merge:
-
Committed artifacts that shouldn't be in the tree:
opencodeandopencode-shallow(gitlink/submodule entries).omo/run-continuation/ses_143629ecfffed1IDVHp4juhQLz.json(a stray session file)bun.lock(806 lines)
Please remove these and add ignores so they can't be re-committed.
-
Two runtime dependencies were added to
package.json(oh-my-openagent,oh-my-opencode-slim) that are never imported anywhere insrc. Please drop them — ORW is a release-automation tool and shouldn't pull in unused runtime packages. -
skip_permissionsdefaults totrue, which appends--dangerously-skip-permissionsto everyopencode run. That's a security-relevant default. Please default it tofalseand make enabling it opt-in. -
Smaller correctness issues in
src/index.ts:check()callsawait writeState(cfg, next)twice in a row.isRetryableErrornow treats bareabortand empty/very-short log slices as retryable, which will retry on failures that aren't transient provider errors. Please scope these to clear provider/transport signals.
Would you be able to split this into a focused retry PR (the retry/continue logic plus tests) and leave out the committed artifacts, the unused deps, and the permissions default? The retry work itself is close; it's the surrounding scope that's the problem.
|
Re wrote as #3 |
Adds automatic retry, permission bypass, and session resume to opencode automation.
Features
Config fields
Implementation
Commands
orw check # Build latest release with full retry
orw check --force # Force rebuild even if already processed
orw continue # Resume last interrupted opencode session
Greptile Summary
This PR adds resilient automation primitives to the
orwtool: provider-error retries (with per-attempt log slicing to avoid stale-log misclassification), build-artifact retries,--dangerously-skip-permissionspassthrough, and a neworw continuecommand that resumes an interrupted opencode session via--continue.runOpenCodeWithRetryhandles transient provider/network errors;runOpenCodeWithBuildRetrywraps it and retries when build artifacts are absent after a run.orw continuecommand: Resolves a freshlatestrelease, appends to the previous log, and delegates to the same retry stack withforceContinue = trueso--continueis passed on every attempt.skip_permissionsconfig field: Passes--dangerously-skip-permissionsto opencode; the documented default (true) and the code default (false) are inconsistent, which will block automation for users who omit the field.Confidence Score: 3/5
Two defects in the core retry path need fixing before this lands in production automation setups.
The
skip_permissionsfield defaults tofalsein code while the PR documentstrue; users who omit the field from their config will have their unattended automation blocked by interactive permission prompts. Separately,isBuildRetryableErrormatches build-error patterns against the thrown error message, which is always"opencode exited with N"— so thebuildErrorPatternsbranch never fires and build retries silently don't happen for compilation failures, only for a missing CLI binary.src/index.ts — the
isBuildRetryableErrorfunction and theskip_permissionsdefault inload()Important Files Changed
continuecommand;isBuildRetryableErrorchecks err.message instead of the log, so build-error patterns never fire;skip_permissionsdefault (false) contradicts the documented default (true)isRetryableError,readLogSlice, andfileSize; regex-gated HTTP status matching and per-attempt log slicing are correctly implementeddependenciesobject added; no functional impact.omo/,bun.lock,opencode, andopencode-shallowto the ignore list; straightforward housekeepingFlowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[orw check / orw continue] --> B[runOpenCodeWithBuildRetry\nbuild_retry_attempts] B --> C[runOpenCodeWithRetry\nretry_attempts] C --> D[opencode run\n--dangerously-skip-permissions?\n--continue if isRetry or forceContinue] D -->|exit 0| E[verifyBuild] E -->|CLI exists & version matches| F[writeState + notify success] E -->|ENOENT — CLI missing| G{isBuildRetryableError?} E -->|version mismatch / desktop missing| H[throw — no notification] D -->|exit non-zero| I[readLogSlice from beforeSize] I --> J{isRetryableError?\ncheck log slice} J -->|yes — provider/network pattern| K{attempts left?} K -->|yes| L[sleep retry_delay_ms\nnote log\nretry with --continue] L --> C K -->|no| M[throw lastErr] J -->|no — non-retryable| N[throw immediately] M --> G N --> G G -->|ENOENT & CLI missing = true| O{build attempts left?} G -->|buildErrorPatterns in err.message\nnever matches 'exited with N'| P[throw — no build retry] O -->|yes| Q[sleep build_retry_delay_ms\nretry build] Q --> C O -->|no| R[throw lastErr]%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[orw check / orw continue] --> B[runOpenCodeWithBuildRetry\nbuild_retry_attempts] B --> C[runOpenCodeWithRetry\nretry_attempts] C --> D[opencode run\n--dangerously-skip-permissions?\n--continue if isRetry or forceContinue] D -->|exit 0| E[verifyBuild] E -->|CLI exists & version matches| F[writeState + notify success] E -->|ENOENT — CLI missing| G{isBuildRetryableError?} E -->|version mismatch / desktop missing| H[throw — no notification] D -->|exit non-zero| I[readLogSlice from beforeSize] I --> J{isRetryableError?\ncheck log slice} J -->|yes — provider/network pattern| K{attempts left?} K -->|yes| L[sleep retry_delay_ms\nnote log\nretry with --continue] L --> C K -->|no| M[throw lastErr] J -->|no — non-retryable| N[throw immediately] M --> G N --> G G -->|ENOENT & CLI missing = true| O{build attempts left?} G -->|buildErrorPatterns in err.message\nnever matches 'exited with N'| P[throw — no build retry] O -->|yes| Q[sleep build_retry_delay_ms\nretry build] Q --> C O -->|no| R[throw lastErr]Comments Outside Diff (1)
src/index.ts, line 438-456 (link)buildErrorPatternsnever matches — build retry dead-code pathisBuildRetryableErrorcheckserr.messagefor patterns like"build failed"and"compilation failed", butrun()always throwsnew Error(\${cmd[0]} exited with ${code}`)(line 1168). The error message will always be"opencode exited with 1"— it never contains build error text, which lives only in the log file. As a result thesebuildErrorPatternscan never match, and build retries only fire whenfs.access(cli)` fails with ENOENT (the CLI binary is absent). Runs that exit non-zero due to a genuine compilation error in the integrated source silently skip the build retry entirely.Reviews (9): Last reviewed commit: "fix: clean up git history and fix retry ..." | Re-trigger Greptile