Retry mechanism for failed submission workflows #4631
Pull request overview
Adds a user-driven retry path for failed submission workflows so a submission can be re-run without losing the original Matrix room/workflow card (and without duplicating an already-created PrCard). This extends the existing bot-triggered PR creation pipeline with a dedicated pr-listing-retry event and shared workflow orchestration in the bot-runner.
Changes:
- Introduces `pr-listing-retry` support end-to-end (Matrix bot allowlist + host retry command + bot-runner handler/orchestrator).
- Persists `roomId` + `branchName` on the SubmissionWorkflowCard and adds a UI Retry button with in-flight state.
- Refactors bot-runner PR workflow into step methods with `failedStep` tagging and adds tests for retry + failure tagging.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/matrix/scripts/setup-submission-bot.ts | Registers pr-listing-retry in bot command allowlist. |
| packages/host/app/commands/retry-submission-workflow.ts | New host command to clear failure state and emit retry trigger into the original room. |
| packages/host/app/commands/index.ts | Wires the new retry command into shims and command class list. |
| packages/host/app/commands/create-submission-workflow.ts | Creates Matrix room earlier and persists roomId/branchName; passes branchName to bot trigger. |
| packages/catalog-realm/submission-workflow-card/submission-workflow-card.gts | Adds failedStep, step-specific failure text, Retry button + retry UX state; changes “View listing” navigation behavior. |
| packages/bot-runner/tests/command-runner-test.ts | Adds retry-with-existing-PrCard test and failedStep tagging test; updates happy-path patch expectations. |
| packages/bot-runner/lib/create-listing-pr-handler.ts | Prefers explicit input.branchName over recomputing from listing name. |
| packages/bot-runner/lib/command-runner.ts | Adds retry handler, shared orchestrator with step tagging, fetch-card-json support, and failure recording with failedStep. |
| packages/base/command.gts | Adds RetrySubmissionWorkflowInput type. |
Host Test Results: 1 files ±0, 1 suites ±0, 2h 1m 37s ⏱️ (+1m 7s). Results for commit d7f7456. ± Comparison against earlier commit 1c669a5.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 18m 53s ⏱️ (+1m 40s). Results for commit d7f7456. ± Comparison against earlier commit 1c669a5.
Force-pushed from 51b3229 to 67b7163.
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Force-pushed from 67b7163 to 479aba8.
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Force-pushed from 479aba8 to 12f6cb3.
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Force-pushed from 12f6cb3 to c6195cf.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6195cfd3b
    // Inline orchestrators for the listing-PR workflow. These are workarounds
    // until user-based proxy requests exist (would let users register auth
    // tokens for external API calls instead of routing through the bot).
    if (eventContent.type === PR_LISTING_CREATE) {
      return await this.handlePrListingCreate(runAs, eventContent);
    }
    if (eventContent.type === PR_LISTING_RETRY) {
      return await this.handlePrListingRetry(runAs, eventContent);
Why is this here and not in the individual command handler?
We really need to be careful that this module doesn't end up as a dumping ground for specific command logic. This module is meant to be command agnostic. Any logic that you have for a specific command should not live here.
@habdelra Agreed. This is pre-existing tech debt (there's already a `// TODO: inline handling for 'pr-listing-create' is a workaround` on main), but adding pr-listing-retry does make it worse. I'll handle it in this PR.
      '@cardstack/boxel-host/commands/fetch-card-json/default';

    const PR_LISTING_CREATE = 'pr-listing-create';
    const PR_LISTING_RETRY = 'pr-listing-retry';
Why are we hard coding knowledge about specific commands here?
We need to refactor this module and get rid of ALL these consts for specific commands.
Refactored in the latest commit, hope the approach matches the vision!
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
packages/host/app/commands/create-submission-workflow.ts:153
`SendBotTriggerEventCommand` can throw (e.g. missing Matrix userId, invalid payload, or `matrixService.sendEvent` failure). If this happens after the room + workflow card are created, the workflow card won't be marked failed and the user may be left with an orphaned workflow card/room and no in-card retry path. Consider wrapping the send in try/catch and patching the workflow card with `prCreationError` (and possibly `failedStep`) when the trigger dispatch fails.

    await new SendBotTriggerEventCommand(this.commandContext).execute({
      roomId,
      realm,
      type: 'pr-listing-create',
      input: {
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
packages/host/app/commands/create-submission-workflow.ts:153
- If SendBotTriggerEventCommand fails after the workflow card is created/opened, the card will remain without a recorded error state and (per the PR description) may not present a Retry path. Consider wrapping the bot-trigger send in a try/catch and patching the workflow card with a failure message (and/or failedStep) when the send fails so the UI can surface recovery.

    await new SendBotTriggerEventCommand(this.commandContext).execute({
      roomId,
      realm,
      type: 'pr-listing-create',
      input: {
    @@ -70,14 +92,12 @@ export class CommandRunner {
        return;
      }
      private async runFreshPrCardFlow(ctx: WorkflowContext): Promise<PrCardData> {
        let { textFiles, binaryFiles, totalCount } = await runStep(
          'collect-files',
          () => this.collectFiles(ctx),
        );
        await runStep('lint', () => this.applyLintSkip(ctx, totalCount));
        let { prCardResult, prCardUrl } = await runStep('create-pr-card', () =>
          this.createPrCard(ctx, textFiles, totalCount),
        );
        return { prCardResult, prCardUrl, binaryFiles };
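
The `runStep` wrapper called in this excerpt is not shown here. The following is a minimal sketch of how such a wrapper could tag a failure with its step name, not the PR's actual code. It assumes a `StepError` that carries the step, plus a hypothetical `toFailurePatch` helper that produces the `prCreationError` / `failedStep` attributes. It is written synchronously for brevity, whereas the calls in the diff are awaited.

```typescript
// Step names taken from the failedStep values listed in the PR description.
type StepName = 'collect-files' | 'lint' | 'create-pr-card' | 'github-pr';

// Assumed shape: an Error subclass that remembers which step threw.
class StepError extends Error {
  step: StepName;
  constructor(step: StepName, cause: unknown) {
    super(cause instanceof Error ? cause.message : String(cause));
    this.name = 'StepError';
    this.step = step;
  }
}

// Runs one step and rethrows any failure tagged with the step name.
function runStep<T>(step: StepName, fn: () => T): T {
  try {
    return fn();
  } catch (error) {
    throw new StepError(step, error);
  }
}

// Duck-typed guard so the check also works when classes are compiled to ES5.
function isStepError(error: unknown): error is StepError {
  return error instanceof Error && 'step' in error;
}

// Hypothetical helper: builds the attributes to patch onto the workflow card.
function toFailurePatch(error: unknown): {
  prCreationError: string;
  failedStep?: StepName;
} {
  return {
    prCreationError: error instanceof Error ? error.message : String(error),
    failedStep: isStepError(error) ? error.step : undefined,
  };
}
```

Tagging at the throw site keeps the orchestrator free of a mutable "current step" variable: whoever catches the error can read the step directly from it.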
Force-pushed from b6cd864 to 1c669a5.
Address review feedback that command-runner.ts is meant to be command-agnostic but had hardcoded knowledge of pr-listing-create and pr-listing-retry (constants, types, step methods, GitHub/lint dependencies).
- Add BotCommandHandler interface and makeEnqueueRunCommand factory to command-runner.ts; CommandRunner now takes handlers[] and dispatches to the first match (or falls back to generic enqueue).
- Move all listing-PR specifics (command strings, event types, WorkflowContext, StepError, step methods, helpers) into a new pr-listing-workflow-handler.ts implementing BotCommandHandler.
- Wire PrListingWorkflowHandler into timeline-handler.ts.
- Test setup goes through a small makeRunner() helper.
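
The first-match dispatch described in this commit message can be sketched roughly as below. This is an illustration only: the real `BotCommandHandler` interface is not shown in this excerpt, so the `matches` / `handle` method names, the `BotTriggerEvent` shape, and the string return values are assumptions, and the real runner is asynchronous.

```typescript
// Hypothetical event shape; only `type` matters for dispatch here.
interface BotTriggerEvent {
  type: string;
  input: Record<string, unknown>;
}

// Assumed handler contract: claim an event, then handle it.
interface BotCommandHandler {
  matches(event: BotTriggerEvent): boolean;
  handle(runAs: string, event: BotTriggerEvent): string;
}

type EnqueueFn = (runAs: string, event: BotTriggerEvent) => string;

class CommandRunner {
  private handlers: BotCommandHandler[];
  private enqueueGeneric: EnqueueFn;

  constructor(handlers: BotCommandHandler[], enqueueGeneric: EnqueueFn) {
    this.handlers = handlers;
    this.enqueueGeneric = enqueueGeneric;
  }

  // Dispatch to the first handler that claims the event, else fall back
  // to the generic enqueue path. No command names appear in this class.
  run(runAs: string, event: BotTriggerEvent): string {
    const handler = this.handlers.find((h) => h.matches(event));
    return handler
      ? handler.handle(runAs, event)
      : this.enqueueGeneric(runAs, event);
  }
}

// Command-specific knowledge lives in the handler, not in CommandRunner.
const prListingHandler: BotCommandHandler = {
  matches: (event) =>
    event.type === 'pr-listing-create' || event.type === 'pr-listing-retry',
  handle: (_runAs, event) => `pr-listing:${event.type}`,
};

const runner = new CommandRunner(
  [prListingHandler],
  (_runAs, event) => `generic:${event.type}`,
);
```

With this shape, removing a handler from the array changes behavior for its events only, and the runner itself needs no edits.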
Force-pushed from 1c669a5 to d7f7456.
Summary
Adds a user-driven Retry on the submission workflow card so failed submissions can be re-run without losing the original Matrix room, workflow card, or the already-created PrCard.
The trigger was the failure mode shown below: bot-runner timeout / fatal worker error during PR creation / PR creation error, which left the workflow card in a red error state with no recovery path.
Example

What changed
New event type & flow
- `pr-listing-retry` is a separate bot trigger event, dispatched to its own handler in `command-runner.ts`. The bot-runner reads the workflow card to recover roomId, listingId, the persisted `branchName`, and any already-linked PrCard.
- When a PrCard is already linked, the retry skips `collect-submission-files` + `create-pr-card`, fetches the existing PrCard via `fetch-card-json`, and resumes at the GitHub branch / commit / open-PR step. No duplicate PrCard, no expensive re-collect.
- Shared orchestrator (`runWorkflow`) with each step in its own method (`collectFiles`, `createPrCard`, `loadExistingPrCard`, `pushToGitHub`, `linkPrCardOnWorkflow`). `StepError` tags failures with the step that failed, replacing the previous mutable `failedStep` variable.
Workflow card
- The `failedStep` field surfaces which step failed (collect-files / lint / create-pr-card / github-pr), so the UI can show step-specific status text instead of a generic "PR creation failed".
- `roomId` and `branchName` are now persisted on the card at creation time so retry can re-emit the trigger event into the same Matrix room and target the same GitHub branch as the original attempt.
- The Retry button is gated on `prCreationError` / `failedStep` / `lintStatus === 'failed'` and the presence of `commandContext` + `roomId`. It optimistically clears the error attributes on click for instant UI feedback.
Host
- `RetrySubmissionWorkflowCommand`: reads the workflow card, clears `prCreationError` + `failedStep`, and sends a `pr-listing-retry` event into the original room.
- `create-submission-workflow.ts` reordered: the room is created before the workflow card so the room id can be persisted on the card (required for retry).
Bot setup
- `setup-submission-bot.ts` registers a new `pr-listing-retry` row in `bot_commands` so retry events pass the bot's content-type allowlist. Existing environments must re-run this script.
GitHub handler
- `getCreateListingPRContext` prefers an explicit `input.branchName` over recomputing from `listingName`. This guarantees that retry targets the same branch as the original create.
Cross-realm safety
- Relative `links.self` URLs (e.g. `"../CardListing/abc"`) are resolved against the workflow card URL when building the retry context, so listingId reaches `collect-submission-files` as an absolute URL.
Bot-runner refactor (review feedback)
Per @habdelra:
`command-runner.ts` is meant to be command-agnostic but had hardcoded knowledge of `pr-listing-*` events.
- `BotCommandHandler` interface added in `command-runner.ts`; `CommandRunner` now takes `handlers[]` at construction and dispatches to the first match.
- Listing-PR specifics moved into `PrListingWorkflowHandler` implementing `BotCommandHandler`.
- Listing-PR files live under `lib/pr-listing/` so the orchestrator + GitHub leaf are co-located.
- `timeline-handler.ts` wires the handler in via `makeEnqueueRunCommand(...)`.
- `command-runner.ts` is ~150 lines with zero references to any specific command. Deleting `lib/pr-listing/` tomorrow leaves it compiling unchanged.
Tests
- `pr-listing-retry with existing PrCard skips collect-files + create-pr-card`: verifies the skip-step path and asserts the GitHub branch matches the persisted `branchName` (regression for the title-vs-listingName branch-name bug).
- `pr-listing-create failure tags workflow card with failedStep`.
- The `pr-listing-create` happy-path assertion now expects the success patch to also clear `prCreationError` / `failedStep`.
All 38 bot-runner tests pass.
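
The skip-step path and branch-name preference that the first test exercises can be sketched as follows. This is a rough illustration under stated assumptions, not the handler's real code: `WorkflowCardState` is a hypothetical subset of the card's attributes, `prCardUrl` and the `load-existing-pr-card` label are invented names, and `canRetry` encodes one plausible reading of the Retry gating.

```typescript
// Hypothetical subset of the workflow card attributes named in the PR description.
interface WorkflowCardState {
  roomId?: string;
  listingName: string;
  branchName?: string;
  prCardUrl?: string; // assumed: set once a PrCard has been created and linked
  prCreationError?: string;
  failedStep?: string;
  lintStatus?: string;
}

type PlannedStep =
  | 'collect-files'
  | 'lint'
  | 'create-pr-card'
  | 'load-existing-pr-card'
  | 'github-pr';

// Retry is offered only for a failed card that can still reach its room.
function canRetry(card: WorkflowCardState, hasCommandContext: boolean): boolean {
  const failed =
    Boolean(card.prCreationError) ||
    Boolean(card.failedStep) ||
    card.lintStatus === 'failed';
  return failed && hasCommandContext && Boolean(card.roomId);
}

// With a linked PrCard, skip collection and card creation and resume at GitHub.
function planSteps(card: WorkflowCardState): PlannedStep[] {
  return card.prCardUrl
    ? ['load-existing-pr-card', 'github-pr']
    : ['collect-files', 'lint', 'create-pr-card', 'github-pr'];
}

// Prefer the persisted branch name; recompute only when none was stored.
function resolveBranchName(
  card: WorkflowCardState,
  computeFromListingName: (listingName: string) => string,
): string {
  return card.branchName ?? computeFromListingName(card.listingName);
}
```

Keeping the plan a pure function of the card state makes the skip decision easy to assert in a test without a Matrix room or GitHub client.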
Test plan
- Run `pnpm run setup-submission-bot` to register `pr-listing-retry`.
- Make `SUBMISSION_BOT_GITHUB_TOKEN` unavailable → submit → see workflow card fail at github-pr step with PrCard linked → restore token → click Retry → bot-runner logs `pr-listing-retry: reusing existing PrCard …` → PR opens.
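
For reference, the relative-URL handling noted under "Cross-realm safety" can be illustrated with the standard `URL` constructor. This is a minimal sketch: `resolveCardUrl` is a hypothetical helper name and the realm URLs are made up, so the PR's actual implementation may differ.

```typescript
// Hypothetical helper; the PR's own function name is not shown in this excerpt.
function resolveCardUrl(linkSelf: string, workflowCardUrl: string): string {
  // The WHATWG URL constructor resolves a relative reference against a base
  // and leaves an already-absolute URL untouched.
  return new URL(linkSelf, workflowCardUrl).href;
}

// Made-up URLs for illustration only.
const workflowCardUrl =
  'https://realm.example/submissions/SubmissionWorkflowCard/123';

const listingId = resolveCardUrl('../CardListing/abc', workflowCardUrl);
// listingId === 'https://realm.example/submissions/CardListing/abc'

const alreadyAbsolute = resolveCardUrl(
  'https://other.example/catalog/CardListing/xyz',
  workflowCardUrl,
);
// alreadyAbsolute is returned unchanged
```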