feat(execution): persist engine runs + tool calls via ExecutionStore by aryasaatvik · Pull Request #398 · RhysSullivan/executor

aryasaatvik · 2026-04-24T15:15:06Z

Stack

Depends on #396 — merge that first. Cross-fork GitHub PRs can't use a branch on a contributor fork as a base, so these five PRs display as independent in the UI. The real dependency chain is the commit graph.

| # | PR | Purpose |
|---|----|---------|
| 1 | #396 | `feat(sdk)`: ExecutionStore backed by DBAdapter |
| 2 | #398 ← you are here | `feat(execution)`: persist engine runs + tool calls |
| 3 | #399 | `feat(execution)`: trigger propagation (CLI/HTTP/MCP) |
| 4 | #400 | `feat(apps)`: execution tables in drizzle schemas |
| 5 | #401 | `feat(api)`: /executions list/get/tool-calls endpoints |

Until #396 lands, this diff includes its commits. After #396 merges, this diff shrinks to just the engine changes below.

Summary

Wires the ExecutionStore added in #396 into the Effect-native engine. Every execute() / executeWithPause() / resume() call now writes execution + tool-call + interaction rows through whichever DBAdapter backs the SDK — sqlite, postgres, memory, anything else that implements the contract gets history for free.

What ships in this PR (delta beyond #396)

Engine API:

- `ExecutionTrigger` type + new `trigger?` option on `execute` / `executeWithPause`. Callers attribute runs (cli, http, mcp, …); kind + optional meta blob persist on the row.
- Execution id is a `crypto.randomUUID()` minted at engine entry and reused as `PausedExecution.id`, so caller-visible ids and DB row ids are the same value.
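
To illustrate the two points above, here is a minimal sketch of how a trigger and a single minted id could flow into a row. The field names (`kind`, `meta`, `triggerKind`, `triggerMetaJson`) and the `beginExecution` helper are assumptions made for this example, not the engine's actual types.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape: the PR only states that a trigger has a kind and an optional meta blob.
interface ExecutionTrigger {
  readonly kind: string; // e.g. "cli", "http", "mcp"
  readonly meta?: Record<string, unknown>;
}

interface ExecuteOptions {
  readonly trigger?: ExecutionTrigger;
}

// Hypothetical row projection; the column names are illustrative.
interface ExecutionRow {
  readonly id: string;
  readonly triggerKind: string | null;
  readonly triggerMetaJson: string | null;
}

// Mint the id once at entry and hand the same value to both the row
// and the paused-execution handle, so callers and storage agree on it.
function beginExecution(options: ExecuteOptions = {}): { row: ExecutionRow; pausedId: string } {
  const id = randomUUID();
  const row: ExecutionRow = {
    id,
    triggerKind: options.trigger?.kind ?? null,
    triggerMetaJson: options.trigger?.meta ? JSON.stringify(options.trigger.meta) : null,
  };
  return { row, pausedId: id };
}
```

Because the id is created in one place, a caller holding a paused execution can look up its row without any extra mapping.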

Recording:

- `makeRecordingInvoker` wraps the `SandboxToolInvoker` passed to the code executor: each `invoke` writes a tool-call row (running → completed | failed) with `durationMs`. Storage failures are ignored so bookkeeping can never fail the tool call itself.
- `persistTerminalState` runs once on fiber success/failure and writes final status, `resultJson`, `errorText`, `logsJson`, `toolCallCount`, `completedAt`.
- Elicitation lifecycle (both inline + pausable paths): execution transitions to `waiting_for_interaction`, pending `execution_interaction` row created; on resume the row is resolved (or cancelled if `action === "cancel"`) before the fiber is unblocked.
- `toolCallCounters` keeps the same Ref across pause/resume so the final count is accurate even for multi-pause runs.
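
The recording wrapper described above can be sketched roughly as follows. This is a Promise-based simplification (the real engine is Effect-native), and the `ToolInvoker` and `ToolCallStore` interfaces, the `bestEffort` helper, and the plain counter object are all assumptions made for this example.

```typescript
import { randomUUID } from "node:crypto";

type ToolCallStatus = "running" | "completed" | "failed";

interface ToolCallRow {
  id: string;
  toolPath: string;
  status: ToolCallStatus;
  durationMs?: number;
  errorText?: string;
}

// Hypothetical minimal interfaces standing in for the store and SandboxToolInvoker.
interface ToolCallStore {
  create(row: ToolCallRow): Promise<void>;
  update(id: string, patch: Partial<ToolCallRow>): Promise<void>;
}

interface ToolInvoker {
  invoke(toolPath: string, args: unknown): Promise<unknown>;
}

// Run a bookkeeping write and swallow any failure, so storage
// problems never surface as tool-call errors.
const bestEffort = async (write: () => Promise<void>): Promise<void> => {
  try {
    await write();
  } catch {
    // intentionally ignored
  }
};

function makeRecordingInvoker(
  inner: ToolInvoker,
  store: ToolCallStore,
  counter: { count: number },
): ToolInvoker {
  return {
    async invoke(toolPath, args) {
      const id = randomUUID();
      const startedAt = Date.now();
      counter.count += 1;
      await bestEffort(() => store.create({ id, toolPath, status: "running" }));
      try {
        const result = await inner.invoke(toolPath, args);
        const durationMs = Date.now() - startedAt;
        await bestEffort(() => store.update(id, { status: "completed", durationMs }));
        return result;
      } catch (error) {
        const durationMs = Date.now() - startedAt;
        await bestEffort(() => store.update(id, { status: "failed", durationMs, errorText: String(error) }));
        throw error; // the tool's own failure still propagates to the caller
      }
    },
  };
}
```

Note the asymmetry: errors from the wrapped tool are rethrown unchanged, while errors from the store are dropped.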

Test plan

- `bun x vitest run` in `@executor/execution`: 15/15 tests pass (10 existing + 5 new in `engine-persistence.test.ts`).
- `bun x tsc --noEmit`: zero type errors.
- `bun x vitest run` in `@executor/sdk`: 97/97 tests still pass.

New test coverage (`engine-persistence.test.ts`)

- Completed run records status + toolCallCount + tool call rows.
- Errored result → status=failed, errorText captured.
- Interaction lifecycle wired cleanly.
- Trigger kind + meta persist on the row.
- Failed tool call records status=failed with errorText.

Adds execution history persistence to the core SDK surface, wiring three new tables (`execution`, `execution_interaction`, `execution_tool_call`) into `coreSchema` and exposing an `ExecutionStore` service on `executor.executions`.

Changes:

- `core-schema.ts`: three new tables with `scope_id` / `execution_id` / `tool_path` / `trigger_kind` / `created_at` indexes for the runs UI's faceting + timeline queries.
- `ids.ts`: branded `ExecutionId`, `ExecutionInteractionId`, `ExecutionToolCallId`.
- `executions.ts`: `Execution`, `ExecutionInteraction`, `ExecutionToolCall` Schema classes, status enums, create/update/filter/sort/meta input types, and the `ExecutionStore` Context.Tag.
- `execution-store.ts`: `makeExecutionStore(core)`, an adapter-backed `ExecutionStoreService` implementation. Wraps `typedAdapter<CoreSchema>` for CRUD, handles cursor-based pagination, filter predicates (status, trigger, tool-path glob, time range, code substring, hadElicitation), and builds list meta with facets + chart buckets.
- `cursor.ts`: base64url `{ createdAt, id }` pagination cursors.
- `executor.ts`: constructs the store once per executor, exposes via `executor.executions`.
- `executions.test.ts`: round-trip + lifecycle coverage against the in-memory adapter (no migrations needed).

Follow-up work (future PRs in the stack):

- wire the engine to record runs + tool calls through this store,
- add `/executions` API endpoints, and
- land the runs UI.
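
As an illustration of the base64url `{ createdAt, id }` cursor idea, the sketch below encodes and decodes such a cursor and shows one possible keyset predicate. It assumes `createdAt` is carried as epoch milliseconds and that rows are listed newest first; the function names are hypothetical and not taken from `cursor.ts`.

```typescript
import { Buffer } from "node:buffer";

// Assumption: createdAt travels as epoch milliseconds inside the cursor.
interface Cursor {
  readonly createdAt: number;
  readonly id: string;
}

// Serialize the cursor as JSON, then base64url so it is safe in a query string.
const encodeCursor = (cursor: Cursor): string =>
  Buffer.from(JSON.stringify({ createdAt: cursor.createdAt, id: cursor.id }), "utf8").toString("base64url");

// Decode defensively: any malformed input yields null instead of throwing.
const decodeCursor = (raw: string): Cursor | null => {
  try {
    const parsed: unknown = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
    if (typeof parsed === "object" && parsed !== null) {
      const { createdAt, id } = parsed as Record<string, unknown>;
      if (typeof createdAt === "number" && typeof id === "string") {
        return { createdAt, id };
      }
    }
    return null;
  } catch {
    return null;
  }
};

// Keyset predicate for a newest-first listing (an assumed sort order):
// a row belongs to the next page if it sorts strictly after the cursor,
// with id as the tie-breaker for equal timestamps.
const isAfterCursor = (row: Cursor, cursor: Cursor): boolean =>
  row.createdAt < cursor.createdAt ||
  (row.createdAt === cursor.createdAt && row.id < cursor.id);
```

Pairing the timestamp with the id keeps pagination stable when several rows share the same `createdAt`.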

Extends the existing `/executions` group with the three read endpoints the runs UI needs. Handlers delegate to `executor.executions.*` (added in RhysSullivan#396 / RhysSullivan#398) and scope each read to the innermost executor scope, the same rule the engine applies when writing.

**Endpoints:**

- `GET /executions`: list with filter + cursor + optional meta. Query params: `limit`, `cursor`, `status` (CSV), `trigger` (CSV), `tool` (CSV of paths/globs), `from`/`to` (epoch ms), `after`, `code` (substring), `sort` (`<field>,<dir>`), `elicitation` (`"true"` / `"false"`). Meta bundles facets + timeline buckets; the handler only asks for it when the request isn't paginated (no `cursor` / `after`), so cheap "first page, full facets" is the default call shape.
- `GET /executions/:id`: single execution detail + `pendingInteraction`. 404 on unknown id via `ExecutionNotFoundError` (already declared on the group).
- `GET /executions/:id/tool-calls`: tool-call timeline. 404 on unknown execution (guard rail so empty arrays don't mask typos).

**Response shape:** every `Date` is serialized to epoch ms at the handler edge (`.getTime()`) so the wire format stays numeric. The schemas in `api.ts` mirror the SDK's row projections one-to-one modulo that transform.

**CSV + enum handling:** `splitCsv`, `parseSortParam`, `parseElicitationParam` live in the handler file because they're edge concerns; the SDK takes typed arrays and enums. Invalid sort fields / directions drop back to defaults (no 400).

No new tests: the handlers are thin wrappers over the SDK store, which already has round-trip + filter + meta coverage in `packages/core/sdk/src/executions.test.ts`. The CSV/enum parsers are small enough to validate by inspection.
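
For readers who want a concrete picture of the edge parsers named above, here is one way they might look. The bodies, the allowed sort fields (`createdAt`, `durationMs`), the default sort, and the choice to fall back per component are all assumptions for this sketch; only the function names come from the commit message.

```typescript
// Split a comma-separated query value into trimmed, non-empty parts.
const splitCsv = (raw: string | undefined): string[] =>
  (raw ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

// Hypothetical sort vocabulary; the real field list is not given in the PR.
const SORT_FIELDS = ["createdAt", "durationMs"] as const;
type SortField = (typeof SORT_FIELDS)[number];
type SortDirection = "asc" | "desc";

interface SortSpec {
  readonly field: SortField;
  readonly direction: SortDirection;
}

const DEFAULT_SORT: SortSpec = { field: "createdAt", direction: "desc" };

const isSortField = (value: string | undefined): value is SortField =>
  value !== undefined && (SORT_FIELDS as readonly string[]).includes(value);

// Parse "<field>,<dir>"; anything unrecognized falls back to the default
// for that component instead of producing a 400.
const parseSortParam = (raw: string | undefined): SortSpec => {
  const [field, direction] = splitCsv(raw);
  return {
    field: isSortField(field) ? field : DEFAULT_SORT.field,
    direction: direction === "asc" || direction === "desc" ? direction : DEFAULT_SORT.direction,
  };
};

// Only the literal strings "true" / "false" select a filter; anything else means "no filter".
const parseElicitationParam = (raw: string | undefined): boolean | undefined =>
  raw === "true" ? true : raw === "false" ? false : undefined;
```

Keeping these at the handler edge means the store only ever sees typed arrays and enums, which matches the split described in the commit message.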

Wires `executor.executions` into the Effect-native engine so every `execute()` / `executeWithPause()` / `resume()` call writes an `execution` row and its associated tool-call + interaction rows to whichever `DBAdapter` backs the SDK.

Engine additions:

- `ExecutionTrigger` type + new `trigger?` option on `execute` and `executeWithPause`. Callers attribute runs ("cli", "http", "mcp", …); the kind + optional meta blob are persisted on the row.
- A stable `crypto.randomUUID()` execution id is minted at entry and reused as `PausedExecution.id`, so callers and the DB share the same identifier and counts line up across pause/resume.
- `makeRecordingInvoker` wraps the `SandboxToolInvoker` passed to the code executor; each `invoke` writes a tool-call row (running → completed|failed with duration). Storage errors are ignored so bookkeeping failures can never fail the tool call itself.
- `persistTerminalState` runs once on fiber success or failure and writes final status, result/error, logs, toolCallCount, completedAt.
- Pausable path: on elicitation, the execution transitions to `waiting_for_interaction` and a pending interaction row is created; `resume` resolves it (or cancels it if `action === "cancel"`) before unblocking the fiber. A `toolCallCounters` map keeps the same Ref across pause/resume so the final count is accurate.
- Inline path: wraps the caller-supplied `onElicitation` so every inline elicitation gets the same pending → resolved bookkeeping.

Tests (`engine-persistence.test.ts`, 5 cases) cover:

- completed run + tool call rows
- error result → status=failed, errorText captured
- toolCallCount rolls up correctly
- trigger kind + meta persist on the row
- failed tool call records status=failed with errorText
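
The pause/resume bookkeeping can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `RunLedger` and its methods are hypothetical, a plain object stands in for the Effect `Ref`, `"accept"` is a placeholder for any non-cancel action, and returning to `running` after resume is assumed rather than stated in the PR.

```typescript
type ExecutionStatus = "running" | "waiting_for_interaction" | "completed" | "failed";
type InteractionStatus = "pending" | "resolved" | "cancelled";
// Only "cancel" appears in the PR text; "accept" is a hypothetical stand-in.
type ResumeAction = "accept" | "cancel";

interface ExecutionState {
  status: ExecutionStatus;
  toolCallCount: number | null; // written only at the terminal state
}

class RunLedger {
  readonly executions = new Map<string, ExecutionState>();
  readonly interactions = new Map<string, InteractionStatus>();
  // One counter object per execution id, kept alive across pause/resume.
  private readonly toolCallCounters = new Map<string, { count: number }>();

  start(executionId: string): void {
    this.executions.set(executionId, { status: "running", toolCallCount: null });
    this.toolCallCounters.set(executionId, { count: 0 });
  }

  recordToolCall(executionId: string): void {
    const counter = this.toolCallCounters.get(executionId);
    if (counter) counter.count += 1;
  }

  pause(executionId: string, interactionId: string): void {
    this.setStatus(executionId, "waiting_for_interaction");
    this.interactions.set(interactionId, "pending");
  }

  resume(executionId: string, interactionId: string, action: ResumeAction): void {
    // Settle the interaction first, then let the run continue.
    this.interactions.set(interactionId, action === "cancel" ? "cancelled" : "resolved");
    this.setStatus(executionId, "running"); // assumed transition
  }

  finish(executionId: string, ok: boolean): void {
    const count = this.toolCallCounters.get(executionId)?.count ?? 0;
    this.executions.set(executionId, { status: ok ? "completed" : "failed", toolCallCount: count });
    this.toolCallCounters.delete(executionId);
  }

  private setStatus(executionId: string, status: ExecutionStatus): void {
    const current = this.executions.get(executionId);
    if (current) current.status = status;
  }
}
```

Because the counter is looked up by execution id rather than recreated on resume, tool calls made before and after a pause land in the same total.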


aryasaatvik force-pushed the feat/execution-engine-persistence branch from 70e493d to 3bc7760 on April 24, 2026 20:05

This was referenced Apr 24, 2026:

- feat(execution): propagate trigger context from CLI, HTTP, and MCP hosts aryasaatvik/executor#6 (Draft)
- feat(execution): persist engine runs + tool calls via ExecutionStore aryasaatvik/executor#7 (Draft)

aryasaatvik wants to merge 2 commits into RhysSullivan:main from aryasaatvik:feat/execution-engine-persistence
