
feat(execution): persist engine runs + tool calls via ExecutionStore#398

Draft
aryasaatvik wants to merge 2 commits into RhysSullivan:main from aryasaatvik:feat/execution-engine-persistence

@aryasaatvik commented Apr 24, 2026

Stack

Depends on #396 — merge that first. Cross-fork GitHub PRs can't use a branch on a contributor fork as a base, so these five PRs display as independent in the UI. The real dependency chain is the commit graph.

| # | PR | Purpose |
|---|----|---------|
| 1 | #396 | feat(sdk): ExecutionStore backed by DBAdapter |
| 2 | #398 ← you are here | feat(execution): persist engine runs + tool calls |
| 3 | #399 | feat(execution): trigger propagation (CLI/HTTP/MCP) |
| 4 | #400 | feat(apps): execution tables in drizzle schemas |
| 5 | #401 | feat(api): /executions list/get/tool-calls endpoints |

Until #396 lands, this diff includes its commits. After #396 merges, this diff shrinks to just the engine changes below.


Summary

Wires the ExecutionStore added in #396 into the Effect-native engine. Every execute() / executeWithPause() / resume() call now writes execution, tool-call, and interaction rows through whichever DBAdapter backs the SDK, so sqlite, postgres, memory, or any other adapter that implements the contract gets run history for free.

What ships in this PR (delta beyond #396)

Engine API:

  • ExecutionTrigger type + new trigger? option on execute / executeWithPause. Callers attribute runs (cli, http, mcp, …); the kind and optional meta blob are persisted on the row.
  • Execution id is a crypto.randomUUID() minted at engine entry and reused as PausedExecution.id, so caller-visible ids and DB row ids are the same value.
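A minimal sketch of the id rule above, assuming a plain `Map` stands in for the execution table. `beginExecution`, `ExecutionRow`, and the `triggerKind` / `triggerMetaJson` fields are hypothetical names, and the status union is only the subset mentioned in this PR. The point is that one id is generated at entry and the same value is handed back as `PausedExecution.id`.

```typescript
// Hypothetical shapes; the real types live in the engine and SDK packages.
type ExecutionTrigger = {
  readonly kind: string; // e.g. "cli", "http", "mcp"
  readonly meta?: Record<string, unknown>;
};

type ExecutionStatus = "running" | "waiting_for_interaction" | "completed" | "failed";

interface ExecutionRow {
  id: string;
  status: ExecutionStatus;
  triggerKind: string | null;
  triggerMetaJson: string | null;
}

interface PausedExecution {
  readonly id: string; // same value as the persisted ExecutionRow.id
}

// Mint the id once at engine entry and reuse it for the row and for the
// caller-visible handle, so there is never a second id to reconcile.
function beginExecution(
  rows: Map<string, ExecutionRow>,
  options: { trigger?: ExecutionTrigger } = {},
  newId: () => string = () => globalThis.crypto.randomUUID(),
): PausedExecution {
  const id = newId();
  const meta = options.trigger?.meta;
  rows.set(id, {
    id,
    status: "running",
    triggerKind: options.trigger?.kind ?? null,
    triggerMetaJson: meta === undefined ? null : JSON.stringify(meta),
  });
  return { id };
}

// Usage: attribute a run to MCP. A fixed id is injected for illustration;
// omit the third argument to get a random UUID.
const rows = new Map<string, ExecutionRow>();
const handle = beginExecution(rows, { trigger: { kind: "mcp", meta: { client: "example" } } }, () => "exec-1");
```

Injecting `newId` is only a convenience for tests; the PR itself mints the id with `crypto.randomUUID()`.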

Recording:

  • makeRecordingInvoker wraps the SandboxToolInvoker passed to the code executor: each invoke writes a tool-call row (running → completed | failed) with durationMs. Storage failures are ignored so bookkeeping can never fail the tool call itself.
  • persistTerminalState runs once on fiber success/failure and writes final status, resultJson, errorText, logsJson, toolCallCount, completedAt.
  • Elicitation lifecycle (both inline + pausable paths): the execution transitions to waiting_for_interaction and a pending execution_interaction row is created; on resume the row is resolved (or cancelled if action === "cancel") before the fiber is unblocked.
  • toolCallCounters keeps the same Ref across pause/resume so the final count is accurate even for multi-pause runs.
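To make the recording rules concrete, here is a plain-Promise analogue of `makeRecordingInvoker`, assuming a hypothetical `ToolCallStore` with `insert` / `update` methods (the real wrapper is Effect-native and writes through the `ExecutionStore`). It shows the running → completed | failed transition with `durationMs`, storage errors being swallowed while tool errors still propagate, and a counter object that a caller could keep per execution id so it survives pause/resume the way the `toolCallCounters` Ref does. The `executionId:n` tool-call id format is illustrative only.

```typescript
// Plain-Promise analogue of makeRecordingInvoker; the real one is Effect-native.
// ToolCallStore, ToolInvoker, and the row fields are hypothetical.
type ToolCallStatus = "running" | "completed" | "failed";

interface ToolCallRow {
  id: string;
  executionId: string;
  toolPath: string;
  status: ToolCallStatus;
  durationMs?: number;
  errorText?: string;
}

interface ToolCallStore {
  insert(row: ToolCallRow): Promise<void>;
  update(id: string, patch: Partial<ToolCallRow>): Promise<void>;
}

type ToolInvoker = (toolPath: string, args: unknown) => Promise<unknown>;

// Run a bookkeeping write, swallowing both rejections and synchronous throws.
async function safely(write: () => Promise<void>): Promise<void> {
  try {
    await write();
  } catch {
    // storage failures must never fail the tool call itself
  }
}

function makeRecordingInvoker(
  executionId: string,
  inner: ToolInvoker,
  store: ToolCallStore,
  counter: { value: number }, // one counter per execution, reused across pause/resume
): ToolInvoker {
  return async (toolPath, args) => {
    counter.value += 1;
    const id = `${executionId}:${counter.value}`;
    const started = Date.now();
    await safely(() => store.insert({ id, executionId, toolPath, status: "running" }));
    try {
      const result = await inner(toolPath, args);
      await safely(() => store.update(id, { status: "completed", durationMs: Date.now() - started }));
      return result;
    } catch (error) {
      const errorText = error instanceof Error ? error.message : String(error);
      await safely(() => store.update(id, { status: "failed", durationMs: Date.now() - started, errorText }));
      throw error; // the tool's own failure still reaches the sandboxed code
    }
  };
}
```

At the end of the run, `counter.value` is what a terminal write would store as `toolCallCount`.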

Test plan

  • bun x vitest run in @executor/execution — 15/15 tests pass (10 existing + 5 new in engine-persistence.test.ts).
  • bun x tsc --noEmit — zero type errors.
  • bun x vitest run in @executor/sdk — 97/97 tests still pass.

New test coverage (engine-persistence.test.ts)

  1. Completed run records status + toolCallCount + tool call rows.
  2. Errored result → status=failed, errorText captured.
  3. Interaction lifecycle wired cleanly.
  4. Trigger kind + meta persist on the row.
  5. Failed tool call records status=failed with errorText.

Adds execution history persistence to the core SDK surface, wiring
three new tables (`execution`, `execution_interaction`,
`execution_tool_call`) into `coreSchema` and exposing an
`ExecutionStore` service on `executor.executions`.

Changes:
- `core-schema.ts`: three new tables with `scope_id` / `execution_id`
  / `tool_path` / `trigger_kind` / `created_at` indexes for the runs
  UI's faceting + timeline queries.
- `ids.ts`: branded `ExecutionId`, `ExecutionInteractionId`,
  `ExecutionToolCallId`.
- `executions.ts`: `Execution`, `ExecutionInteraction`,
  `ExecutionToolCall` Schema classes, status enums,
  create/update/filter/sort/meta input types, and the
  `ExecutionStore` Context.Tag.
- `execution-store.ts`: `makeExecutionStore(core)` — an
  adapter-backed `ExecutionStoreService` implementation. Wraps
  `typedAdapter<CoreSchema>` for CRUD, handles cursor-based
  pagination, filter predicates (status, trigger, tool-path glob,
  time range, code substring, hadElicitation), and builds list meta
  with facets + chart buckets.
- `cursor.ts`: base64url `{ createdAt, id }` pagination cursors.
- `executor.ts`: constructs the store once per executor, exposes via
  `executor.executions`.
- `executions.test.ts`: round-trip + lifecycle coverage against the
  in-memory adapter (no migrations needed).
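A sketch of the `cursor.ts` idea, assuming ids are ASCII strings (so `btoa` / `atob` are safe) and assuming a newest-first default sort. `encodeCursor`, `decodeCursor`, and `isAfterCursor` are hypothetical names; the `id` tiebreak is the usual reason a `{ createdAt, id }` pair is used instead of `createdAt` alone, since it keeps rows that share a timestamp from being skipped or repeated across pages.

```typescript
// Hypothetical helpers: a cursor is the base64url encoding of { createdAt, id }
// taken from the last row of the previous page.
interface Cursor {
  readonly createdAt: number; // epoch ms
  readonly id: string;
}

function encodeCursor(cursor: Cursor): string {
  // btoa expects Latin-1 input; JSON of an epoch number and an ASCII id qualifies.
  return btoa(JSON.stringify(cursor)).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function decodeCursor(value: string): Cursor | undefined {
  try {
    const b64 = value.replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
    const parsed = JSON.parse(atob(padded)) as { createdAt?: unknown; id?: unknown } | null;
    if (parsed !== null && typeof parsed.createdAt === "number" && typeof parsed.id === "string") {
      return { createdAt: parsed.createdAt, id: parsed.id };
    }
  } catch {
    // malformed cursors are treated as absent
  }
  return undefined;
}

// Keyset predicate for a newest-first list: keep rows strictly after the cursor,
// breaking createdAt ties by id.
function isAfterCursor(row: Cursor, cursor: Cursor): boolean {
  return row.createdAt < cursor.createdAt || (row.createdAt === cursor.createdAt && row.id < cursor.id);
}
```

Whether the real store returns `undefined`, falls back to the first page, or fails on a malformed cursor is not stated in the PR; this sketch simply treats it as absent.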

Follow-up work (future PRs in the stack):
- wire the engine to record runs + tool calls through this store,
- add `/executions` API endpoints, and
- land the runs UI.
aryasaatvik added a commit to aryasaatvik/executor that referenced this pull request Apr 24, 2026
Extends the existing `/executions` group with the three read
endpoints the runs UI needs. Handlers delegate to
`executor.executions.*` (added in RhysSullivan#396 / RhysSullivan#398) and scope each read
to the innermost executor scope — same rule the engine applies
when writing.

**Endpoints:**
- `GET /executions` — list with filter + cursor + optional meta.
  Query params: `limit`, `cursor`, `status` (CSV), `trigger` (CSV),
  `tool` (CSV of paths/globs), `from`/`to` (epoch ms), `after`,
  `code` (substring), `sort` (`<field>,<dir>`), `elicitation`
  (`"true"` / `"false"`). Meta bundles facets + timeline buckets;
  the handler only requests it when the request isn't paginated
  (no `cursor` / `after`), so a single unpaginated call returns the
  first page plus full facets, and later pages skip the meta work.
- `GET /executions/:id` — single execution detail +
  `pendingInteraction`. 404 on unknown id via
  `ExecutionNotFoundError` (already declared on the group).
- `GET /executions/:id/tool-calls` — tool-call timeline. 404 on
  unknown execution (guard rail so empty arrays don't mask typos).
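The two handler rules above (meta only on an unpaginated request, and a 404 guard before listing tool calls) can be sketched in plain TypeScript. `ReadStore`, `getExecution`, `listToolCalls`, and this `ExecutionNotFoundError` class are stand-ins; the real handlers call `executor.executions.*` and use the error already declared on the HTTP group.

```typescript
// Hypothetical handler-side helpers; ReadStore and its method names are assumptions.
interface ListQuery {
  readonly cursor?: string;
  readonly after?: string;
}

// Facets and chart buckets are only computed for an unpaginated first page.
function shouldIncludeMeta(query: ListQuery): boolean {
  return query.cursor === undefined && query.after === undefined;
}

class ExecutionNotFoundError extends Error {
  readonly executionId: string;
  constructor(executionId: string) {
    super(`execution ${executionId} not found`);
    this.name = "ExecutionNotFoundError";
    this.executionId = executionId;
  }
}

interface ToolCallSummary {
  readonly id: string;
  readonly toolPath: string;
}

interface ReadStore {
  getExecution(id: string): Promise<{ readonly id: string } | undefined>;
  listToolCalls(executionId: string): Promise<ReadonlyArray<ToolCallSummary>>;
}

// Check that the execution exists first, so an unknown id surfaces as a 404
// rather than an empty (and misleading) tool-call list.
async function getToolCalls(store: ReadStore, executionId: string): Promise<ReadonlyArray<ToolCallSummary>> {
  const execution = await store.getExecution(executionId);
  if (execution === undefined) throw new ExecutionNotFoundError(executionId);
  return store.listToolCalls(executionId);
}
```

The extra lookup costs one read per request, which is the price of distinguishing "no tool calls yet" from "no such execution".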

**Response shape:** every `Date` is serialized to epoch ms at the
handler edge (`.getTime()`) so the wire format stays numeric. The
schemas in `api.ts` mirror the SDK's row projections one-to-one
modulo that transform.
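A minimal sketch of the edge transform, assuming an illustrative subset of execution fields; `ExecutionRecord`, `ExecutionWire`, and `toWire` are hypothetical names, and only the `Date` to epoch-ms conversion reflects the description above.

```typescript
// Hypothetical row/wire pair; the field list is illustrative, the transform is the point.
interface ExecutionRecord {
  readonly id: string;
  readonly status: string;
  readonly createdAt: Date;
  readonly completedAt: Date | null;
}

interface ExecutionWire {
  readonly id: string;
  readonly status: string;
  readonly createdAt: number; // epoch ms
  readonly completedAt: number | null; // epoch ms, null while the run is unfinished
}

// Convert at the handler edge so the SDK keeps Date values and the wire stays numeric.
function toWire(record: ExecutionRecord): ExecutionWire {
  return {
    id: record.id,
    status: record.status,
    createdAt: record.createdAt.getTime(),
    completedAt: record.completedAt === null ? null : record.completedAt.getTime(),
  };
}

// Usage
const wire = toWire({ id: "exec-1", status: "completed", createdAt: new Date(0), completedAt: new Date(1500) });
// wire.createdAt === 0, wire.completedAt === 1500
```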

**CSV + enum handling:** `splitCsv`, `parseSortParam`,
`parseElicitationParam` live in the handler file because they're
edge concerns — the SDK takes typed arrays and enums. Invalid sort
  fields / directions fall back to defaults (no 400).
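Hypothetical versions of the three edge parsers, to show the intended lenient behavior. The allowed sort fields (`createdAt`, `completedAt`) and the default sort (`createdAt,desc`) are assumptions, not taken from the PR, and falling back per part (field and direction independently) is one reasonable reading of "fall back to defaults".

```typescript
// Assumed sort fields and default; the real handler's lists may differ.
const SORT_FIELDS = ["createdAt", "completedAt"] as const;
type SortField = (typeof SORT_FIELDS)[number];
type SortDirection = "asc" | "desc";
interface Sort {
  readonly field: SortField;
  readonly direction: SortDirection;
}

const DEFAULT_SORT: Sort = { field: "createdAt", direction: "desc" };

// "a, b,,c" -> ["a", "b", "c"]; a missing or empty param yields undefined (no filter).
function splitCsv(value: string | undefined): string[] | undefined {
  if (value === undefined) return undefined;
  const parts = value.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  return parts.length > 0 ? parts : undefined;
}

// "<field>,<dir>"; unrecognized parts fall back to the default instead of a 400.
function parseSortParam(value: string | undefined): Sort {
  if (value === undefined) return DEFAULT_SORT;
  const [rawField = "", rawDirection = ""] = value.split(",").map((s) => s.trim());
  const field = (SORT_FIELDS as readonly string[]).includes(rawField) ? (rawField as SortField) : DEFAULT_SORT.field;
  const direction: SortDirection =
    rawDirection === "asc" || rawDirection === "desc" ? rawDirection : DEFAULT_SORT.direction;
  return { field, direction };
}

// "true" / "false" map to a boolean filter; anything else means "don't filter".
function parseElicitationParam(value: string | undefined): boolean | undefined {
  if (value === "true") return true;
  if (value === "false") return false;
  return undefined;
}
```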

No new tests — the handlers are thin wrappers over the SDK store,
which already has round-trip + filter + meta coverage in
`packages/core/sdk/src/executions.test.ts`. The CSV/enum parsers
are small enough to validate by inspection.

Wires `executor.executions` into the Effect-native engine so every
`execute()` / `executeWithPause()` / `resume()` call writes an
`execution` row and its associated tool-call + interaction rows to
whichever `DBAdapter` backs the SDK.

Engine additions:
- `ExecutionTrigger` type + new `trigger?` option on `execute` and
  `executeWithPause`. Callers attribute runs ("cli", "http", "mcp",
  …); the kind + optional meta blob are persisted on the row.
- A stable `crypto.randomUUID()` execution id is minted at entry and
  reused as `PausedExecution.id`, so callers and the DB share the
  same identifier and counts line up across pause/resume.
- `makeRecordingInvoker` wraps the `SandboxToolInvoker` passed to the
  code executor; each `invoke` writes a tool-call row (running →
  completed|failed with duration). Storage errors are ignored so
  bookkeeping failures can never fail the tool call itself.
- `persistTerminalState` runs once on fiber success or failure and
  writes final status, result/error, logs, toolCallCount, completedAt.
- Pausable path: on elicitation, the execution transitions to
  `waiting_for_interaction` and a pending interaction row is created;
  `resume` resolves it (or cancels it if action === "cancel") before
  unblocking the fiber. A `toolCallCounters` map keeps the same Ref
  across pause/resume so the final count is accurate.
- Inline path: wraps the caller-supplied `onElicitation` so every
  inline elicitation gets the same pending → resolved bookkeeping.
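A plain-Promise sketch of the inline-path wrapper, assuming a hypothetical `InteractionStore`. The `accept` / `decline` actions mirror MCP elicitation and are assumed here (the PR only names `"cancel"`). Two further assumptions are labeled in the comments: interaction bookkeeping swallows storage errors the same way tool-call recording does (the PR states that policy only for tool calls), and the execution goes back to `running` once the handler answers.

```typescript
// Hypothetical types; only the pending -> resolved/cancelled flow follows the PR text.
type ElicitationAction = "accept" | "decline" | "cancel";
interface ElicitationResponse {
  readonly action: ElicitationAction;
  readonly content?: Record<string, unknown>;
}
type OnElicitation = (request: { message: string }) => Promise<ElicitationResponse>;

interface InteractionRow {
  id: string;
  executionId: string;
  status: "pending" | "resolved" | "cancelled";
  responseJson?: string;
}

interface InteractionStore {
  createInteraction(row: InteractionRow): Promise<void>;
  updateInteraction(id: string, patch: Partial<InteractionRow>): Promise<void>;
  setExecutionStatus(executionId: string, status: "running" | "waiting_for_interaction"): Promise<void>;
}

// Assumption: bookkeeping failures are ignored here too.
async function safely(write: () => Promise<void>): Promise<void> {
  try {
    await write();
  } catch {
    // never let storage errors fail the elicitation itself
  }
}

function wrapOnElicitation(
  executionId: string,
  store: InteractionStore,
  inner: OnElicitation,
  newId: () => string,
): OnElicitation {
  return async (request) => {
    const id = newId();
    await safely(() => store.setExecutionStatus(executionId, "waiting_for_interaction"));
    await safely(() => store.createInteraction({ id, executionId, status: "pending" }));
    const response = await inner(request); // the caller-supplied handler
    await safely(() =>
      store.updateInteraction(id, {
        status: response.action === "cancel" ? "cancelled" : "resolved",
        responseJson: JSON.stringify(response),
      }),
    );
    // Assumption: the run is marked running again once the handler answers.
    await safely(() => store.setExecutionStatus(executionId, "running"));
    return response;
  };
}
```

The pausable path differs in where the response comes from (`resume` rather than an in-process handler), but the row transitions described above are the same.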

Tests (`engine-persistence.test.ts`, 5 cases) cover:
- completed run + tool call rows
- error result → status=failed, errorText captured
- toolCallCount rolls up correctly
- trigger kind + meta persist on the row
- failed tool call records status=failed with errorText
@aryasaatvik force-pushed the feat/execution-engine-persistence branch from 70e493d to 3bc7760 on April 24, 2026 20:05