Add native AXL test framework (POC)#1238

Open
thesayyn wants to merge 13 commits into main from claude/youthful-noether-lthnua
Conversation

@thesayyn

Member

What this is

A proof-of-concept for giving AXL a first-class, pytest-style testing story built into the engine — the result of a design discussion about a better testing story for AXL. It implements the load-bearing decisions end to end so we can review from a working artifact, then iterate.

Full design, roadmap, and a log of decisions I made without explicit sign-off are in docs/testing.md.

The shape

# lib/ci_test.axl  — a *_test.axl file gets the augmented test surface
load("./ci.axl", "detect_ci_host")

def test_github_actions_precedence(t):
    t.env.set("GITHUB_ACTIONS", "true")
    t.env.set("BUILDKITE", "true")
    expect.eq(detect_ci_host(t.ctx.std.env)["marker"], "GITHUB_ACTIONS")

No per-test wiring in config.axl, no pipeline.yaml list, no copied _eq, no _snapshot_env/_restore_env.

What's implemented (all in crates/axl-runtime)

  • Test-only globals. *_test.axl files evaluate against base AXL + a test-only expect namespace, selected by filename suffix in the loader (eval/load.rs, eval/api.rs::get_test_globals). The vocabulary exists only in test files — proven by a test that expect is absent from production globals.
  • Convention discovery. Tests are def test_*(t) functions; the runner enumerates test_* callables (mirrors FrozenTaskModuleLike::tasks()).
  • Bazel-free harness t. t.env (in-memory env overlay), t.std, and t.ctx — a real TaskContext, the same Rust type production uses.
  • Mock by backend-swap, not type masquerade. t.ctx.std.env is the genuine std.Env type; it reads the in-memory overlay only because the runner installs a test_env on eval.extra (engine/store.rs, engine/std/env.rs). Contract stays identical to reality, enforced by the type system; internal downcast_ref::<RealType>() keeps working.
  • Per-test isolation, pytest semantics. A failed assertion raises, is caught per-test, and the run continues.
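The filename-suffix split in the first bullet can be pictured with a small Rust sketch. This is not the code in `eval/load.rs` or `get_test_globals`; the function names and the stand-in list of base globals are assumptions made for illustration only.

```rust
/// Stand-in for the base AXL globals (illustrative list, not the real surface).
const BASE_GLOBALS: &[&str] = &["len", "print", "struct"];
/// Names that should exist only in `*_test.axl` files.
const TEST_ONLY_GLOBALS: &[&str] = &["expect"];

/// True when the file name (not the whole path) ends in `_test.axl`.
pub fn is_test_file(path: &str) -> bool {
    std::path::Path::new(path)
        .file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with("_test.axl"))
}

/// Hypothetical selector: base globals for every file, plus the test-only
/// namespace when the loader sees a test file.
pub fn globals_for(path: &str) -> Vec<&'static str> {
    let mut names = BASE_GLOBALS.to_vec();
    if is_test_file(path) {
        names.extend_from_slice(TEST_ONLY_GLOBALS);
    }
    names
}
```

Under these assumptions, `globals_for("lib/ci_test.axl")` contains `expect` while `globals_for("lib/ci.axl")` does not, which is the property the "absent from production globals" test checks.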

Verify

cargo test -p axl-runtime testing::

Three passing tests: discovery + isolation + failure capture; the test-only globals split; and "overlay never leaks into the real process env".
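As a rough model of what "discovery + isolation + failure capture" means, the sketch below treats a module as a list of named callables, runs only the `test_*` ones, and records a failure instead of aborting. All names here (`TestFn`, `Outcome`, `run_module`, `expect_eq`) are hypothetical; the real runner works on Starlark values, not Rust function pointers.

```rust
/// A test body: Ok(()) passes, Err(message) models a failed assertion.
pub type TestFn = fn() -> Result<(), String>;

#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub name: String,
    pub passed: bool,
    pub message: Option<String>,
}

/// Runs every callable whose name starts with `test_`. A failure is captured
/// in that test's outcome and the loop moves on to the next test.
pub fn run_module(module: &[(&str, TestFn)]) -> Vec<Outcome> {
    module
        .iter()
        .filter(|(name, _)| name.starts_with("test_"))
        .map(|(name, body)| {
            let result = body();
            Outcome {
                name: name.to_string(),
                passed: result.is_ok(),
                message: result.err(),
            }
        })
        .collect()
}

/// Minimal stand-in for an `expect.eq`-style assertion.
fn expect_eq(actual: &str, expected: &str) -> Result<(), String> {
    if actual == expected {
        Ok(())
    } else {
        Err(format!("expected {expected:?}, got {actual:?}"))
    }
}

fn helper() -> Result<(), String> { Ok(()) }
fn test_fails() -> Result<(), String> { expect_eq("BUILDKITE", "GITHUB_ACTIONS") }
fn test_passes() -> Result<(), String> { expect_eq("GITHUB_ACTIONS", "GITHUB_ACTIONS") }

/// `helper` is skipped by convention; the failing test does not stop the run.
pub fn demo() -> Vec<Outcome> {
    run_module(&[
        ("helper", helper as TestFn),
        ("test_fails", test_fails as TestFn),
        ("test_passes", test_passes as TestFn),
    ])
}
```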

Notable decisions made without sign-off (see docs for the full list)

  1. expect, not assert: assert is a reserved keyword in the dialect and won't parse as an identifier. Alternatives: check, or harness methods (t.assert_eq).
  2. Assertions are a global namespace (vs. methods on t).
  3. Only the env backend is mocked so far; io/fs/net/process/bazel follow the identical pattern (roadmap in docs) but aren't in this slice.
  4. No aspect test CLI task yet — the runner is a Rust function proven by tests; wiring it as a builtin AXL task (next to axl_add.axl) via a sandbox-run primitive is the next step.

Not caused by this PR

Two pre-existing axl-runtime test failures also occur on a clean checkout (verified by stashing these changes): the bug_1060_… timing test, and reports_still_poisoned_when_removal_fails, which expects a directory removal to fail, but the removal succeeds when running as root.

https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B


Generated by Claude Code

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@aspect-workflows

aspect-workflows Bot commented Jun 16, 2026


✨ Aspect Workflows Tasks

📅 Wed Jun 17 19:50:32 UTC 2026

⚠️ 2 flagged tasks

  • ⚠️ delivery (delivery-gha-debug) · ⏱ 24s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (1 delivered · 2 warn · 3 skipped)
  • ⚠️ delivery (delivery-gha) · ⏱ 38.6s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (1 delivered · 2 warn · 3 skipped)

✅ 26 successful tasks

  • ✅ build (build-gha-debug) · ⏱ 3m 30s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (170 built)
  • ✅ build (build-gha) · ⏱ 4m 4s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (170 built)
  • ✅ buildifier (buildifier-gha-debug) · ⏱ 1m 7s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ buildifier (buildifier-gha) · ⏱ 1m 57s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha-debug) · ⏱ 2m 3s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha) · ⏱ 1m 53s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ gazelle (gazelle-gha-debug) · ⏱ 1m 3s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-from-source-gha-debug) · ⏱ 2m 19s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-from-source-gha) · ⏱ 2m 10s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-gha) · ⏱ 51s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ build (init-cpp) · ⏱ 2m 4s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (12 built)
  • ✅ build (init-go) · ⏱ 3m · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (25 built)
  • ✅ build (init-java) · ⏱ 38.6s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (13 built)
  • ✅ build (init-js) · ⏱ 43.7s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (25 built)
  • ✅ build (init-kitchen-sink) · ⏱ 7m 36s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (84 built)
  • ✅ build (init-kotlin) · ⏱ 57.6s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (12 built)
  • ✅ build (init-minimal) · ⏱ 40.6s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (4 built)
  • ✅ build (init-py) · ⏱ 57s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (12 built)
  • ✅ build (init-ruby) · ⏱ 4m 32s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (10 built)
  • ✅ build (init-rust) · ⏱ 44.2s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (10 built)
  • ✅ build (init-scala) · ⏱ 6m 8s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (9 built)
  • ✅ build (init-shell) · ⏱ 1m 33s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (10 built)
  • ✅ lint (lint-gha-debug) · ⏱ 1m 23s · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ lint (lint-gha) · ⏱ 1m 17s · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ test (test-gha-debug) · ⏱ 4m 2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (27/27 passed · 25 cached)
  • ✅ test (test-gha) · ⏱ 4m 20s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (27/27 passed · 25 cached)

🔁 Reproduce

⚠️ delivery (delivery-gha-debug · delivery-gha)

# --mode=always --track-state=false for off-runner with no state backend.
aspect delivery \
  --commit-sha=e238c8d9e0e59a71751f04e41671db7fdbe27155 \
  --mode=always \
  --track-state=false \
  --dry-run=true

Install aspect: docs.aspect.build/cli/install


⏱ Last updated Wed Jun 17 20:00:20 UTC 2026 · 📊 GitHub API quota 2,464/15,000 (16% used, resets in 15m)
🚀 Powered by Aspect CLI (v0.0.0-dev)  |  Aspect Build · X · LinkedIn · YouTube

claude added 12 commits June 17, 2026 18:30
Proof-of-concept for a built-in, pytest-style testing story for AXL,
implementing the load-bearing pieces end to end in the engine:

- `*_test.axl` files are evaluated against an augmented globals surface
  (base AXL + a test-only `expect` namespace), selected by filename suffix
  in the loader. The test vocabulary exists only in test files and cannot
  leak into production config/builtins.
- Tests are `def test_*(t)` functions, discovered by convention.
- The harness `t` is bazel-free: `t.env` (in-memory env overlay), `t.std`,
  and `t.ctx` (a real TaskContext, the same Rust type production uses).
- Mocking is backend-swap, not type masquerade: `t.ctx.std.env` is the
  genuine `std.Env` type; it reads the in-memory overlay only because the
  runner installs a `test_env` on `eval.extra`. Contract stays identical to
  reality, enforced by the type system.
- Per-test isolation with pytest semantics: a failed assertion raises, is
  caught per-test, and the run continues.

Runner is exposed as a Rust function with three passing tests; design,
roadmap, and the decisions made without sign-off are in docs/testing.md.

Note: `assert` is a reserved keyword in the dialect, so the namespace is
named `expect` (flagged for review).

https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
assert is a reserved keyword in the dialect; the plural asserts parses
and reads almost exactly like assert.*.
Run discovered `test_*` functions across min(tests, cpus) worker threads,
each with its own Starlark heap (heaps are !Send), re-evaluating the
side-effect-free module body locally and merging outcomes back into
definition order for a deterministic report. Per-test state lives on the
test's own values (env overlay), never a process-global, so concurrent
workers share no mutable state.

- `run_test_source` keeps its signature; defaults jobs to min(tests, cpus).
- `run_test_source_with_jobs` exposes the explicit `--jobs`-style knob.
- New `runs_tests_in_parallel_shards` test forces 8 workers over 17 tests
  and asserts cross-test isolation holds concurrently + ordering is stable.
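The sharding and the merge back into definition order might look roughly like the following. It is a sketch with hypothetical names (`run_parallel`, `default_jobs`) that uses plain function pointers in place of Starlark tests, so it leaves out the per-worker heap and the module re-evaluation described above.

```rust
use std::thread;

pub type TestFn = fn() -> Result<(), String>;

/// Default worker count: min(tests, cpus), never below one.
pub fn default_jobs(test_count: usize) -> usize {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    test_count.min(cpus).max(1)
}

/// Runs tests on `jobs` workers and returns (name, passed) in definition order.
pub fn run_parallel(tests: &[(&'static str, TestFn)], jobs: usize) -> Vec<(String, bool)> {
    let jobs = jobs.max(1).min(tests.len().max(1));
    // One slot per test, indexed by definition order.
    let mut slots: Vec<Option<(String, bool)>> = vec![None; tests.len()];
    thread::scope(|scope| {
        let handles: Vec<_> = (0..jobs)
            .map(|worker| {
                scope.spawn(move || {
                    // Worker k takes tests k, k + jobs, k + 2 * jobs, ...
                    tests
                        .iter()
                        .enumerate()
                        .skip(worker)
                        .step_by(jobs)
                        .map(|(index, (name, body))| (index, name.to_string(), body().is_ok()))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Merge each shard back into its original position.
        for handle in handles {
            for (index, name, passed) in handle.join().expect("worker panicked") {
                slots[index] = Some((name, passed));
            }
        }
    });
    slots.into_iter().flatten().collect()
}

fn pass() -> Result<(), String> { Ok(()) }
fn fail() -> Result<(), String> { Err("boom".to_string()) }

/// Asks for more workers than tests; the report order is still a, b, c.
pub fn demo() -> Vec<(String, bool)> {
    let tests: Vec<(&'static str, TestFn)> = vec![
        ("test_a", pass as TestFn),
        ("test_b", fail as TestFn),
        ("test_c", pass as TestFn),
    ];
    run_parallel(&tests, 8)
}
```

Writing results into index-addressed slots is one simple way to keep the report deterministic regardless of which worker finishes first.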

docs/testing.md records the walkthrough decisions: asserts global namespace;
env as a value-carried overlay vs bazel as BazelBackend::{Real, Fake};
the bazel Fake design (generic fake-bazel process, socketpair control
channel, synthesized BES/execlog/stream surfaces, fork+exec not fork(),
reuse basil-core, no embedded-binary bloat); and the parallelism rules that
turn process-global shortcuts (BAZEL_REAL, fixed BES paths, the spawn
registry) into bugs to fix before the bazel Fake lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Move the in-memory env overlay off the production `RuntimeEnv`
(`store.rs`) and onto the harness-constructed Starlark values, so the
mock route is value-carried rather than ambient through
`eval.extra`/`from_eval`.

- `store.rs`: drop `Env::test_env` + `with_test_env`. `TestEnvMap` is now
  `Arc<Mutex<BTreeMap<…>>>` (was `Rc<RefCell<…>>`) so the values that
  carry it satisfy the `Send + Sync` bound frozen Starlark values
  require. `Env::from_eval` stays — production still reads cli_version /
  roots through it.
- `std/env.rs`: `std.Env` gains `Option<TestEnvMap>`. `var`/`set_var`/
  `remove_var`/`vars` read/write the overlay carried on `this` when
  present, else the real process env. No more `from_eval` overlay reads.
- `std/mod.rs`: `Std` gains `Option<TestEnvMap>` and mints its `std.env`
  carrying that handle. `Std::new()` (None) for production.
- `task_context.rs`: `TaskContext` gains `env_overlay`; `ctx.std` mints
  `Std` carrying it. `with_env_overlay` builder for the test runner.
  Frozen contexts (production-only) always hand out `Std::new()`.
- `testing.rs`: `Test`/`TestEnv` carry the overlay handle (the `Arc`-backed `TestEnvMap`). The runner mints
  `t.env`, `t.std`, and `t.ctx` from one shared handle, so all three
  observe the same map. `eval.extra` now carries only the production
  `base_env` (for cli_version/roots), not the overlay.

Per-test state lives on the value, never a process-global; each overlay
is touched only on its own worker thread, so the mutex is never
contended. `config_context`/`feature_context` updated to `Std::new()`.
docs/testing.md: decision 6 + roadmap 1c marked done.
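A minimal sketch of the value-carried overlay, assuming the shape described above: the env value holds an optional shared map and consults it instead of the process environment when present. The type alias mirrors the `TestEnvMap` named in this commit, but the struct and method names are illustrative, and the production write path is deliberately left out.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

pub type TestEnvMap = Arc<Mutex<BTreeMap<String, String>>>;

/// Stand-in for an env value that optionally carries a test overlay.
#[derive(Clone, Default)]
pub struct Env {
    overlay: Option<TestEnvMap>,
}

impl Env {
    /// Production shape: no overlay, reads hit the real process env.
    pub fn new() -> Self {
        Env { overlay: None }
    }

    /// Test shape: reads and writes go to the shared in-memory map.
    pub fn with_overlay(overlay: TestEnvMap) -> Self {
        Env { overlay: Some(overlay) }
    }

    pub fn var(&self, key: &str) -> Option<String> {
        match &self.overlay {
            Some(map) => map.lock().unwrap().get(key).cloned(),
            None => std::env::var(key).ok(),
        }
    }

    /// Writes to the overlay only; returns false when there is none.
    /// (The sketch omits writing to the real process env.)
    pub fn set_overlay_var(&self, key: &str, value: &str) -> bool {
        match &self.overlay {
            Some(map) => {
                map.lock().unwrap().insert(key.to_string(), value.to_string());
                true
            }
            None => false,
        }
    }
}

/// Two values minted from one handle observe the same map, and nothing
/// leaks into the real process environment.
pub fn demo() -> (Option<String>, bool) {
    let handle: TestEnvMap = Arc::default();
    let t_env = Env::with_overlay(handle.clone());
    let ctx_env = Env::with_overlay(handle);
    t_env.set_overlay_var("AXL_SKETCH_ONLY_VAR", "true");
    let seen_by_ctx = ctx_env.var("AXL_SKETCH_ONLY_VAR");
    let leaked = std::env::var("AXL_SKETCH_ONLY_VAR").is_ok();
    (seen_by_ctx, leaked)
}
```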

cargo test -p axl-runtime testing:: — 4 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Implements increment 2 of the AXL native testing POC: `ctx.bazel` can be
backed by a `Fake` bazel that synthesizes a real BES stream from a
declared, typed fixture — so tests exercise the production
`ctx.bazel.build` read path against a controllable bazel.

- `bazel/backend.rs` (new): `BazelBackend::{Real, Fake{fake_bin,
  expectation}}` carried ON the `Bazel` Starlark value (not `eval.extra`),
  read via `read_backend`, so it is per-value and parallel-safe. `Fake`
  builds the `Command` straight from the fake path — no `BAZEL_REAL`
  global. A per-invocation `socketpair` `ControlChannel` (behind a trait
  for a later Windows transport) inherits the read end into the child
  (CLOEXEC cleared in `pre_exec`) and ships the length-delimited
  `BazelExpectation` frame parent→child.
- `basil-core` (new crate): the reusable replay/synthesis guts extracted
  from `basil`. `BazelExpectation` (prost message) + `BuildResult` enum;
  `replay_expectation` synthesizes `BuildStarted` → `TargetComplete`* →
  `BuildFinished` + exit code onto the real `--build_event_binary_file`.
  Raw `events=` escape hatch passes pre-framed `BuildEvent`s through. The
  legacy named scenarios move here verbatim so existing build.rs tests
  keep passing.
- `basil`: now a thin argv/env front-end over `basil-core`. Generic
  fixture mode (control fd present) reads the expectation off
  `ASPECT_FAKE_BAZEL_FD`; named-scenario mode unchanged.
- `bazel/mod.rs` + `build.rs`: thread the backend through `Build::spawn`.
  `Fake` skips the live `server_info()` probe and uses the child pid as
  galvanize's `server_pid`. `multi_phase.rs`: production mints `Real`.
- `testing.rs`: `t.bazel.expect_build(*targets, result=, exit_code=)`
  declares the per-test fixture (its own `Arc<Mutex>` cell, never a
  global); `t.ctx` mints the `Fake` backend from it. New runner
  `run_test_source_with_fake_bazel`. End-to-end test proves a declared
  expectation flows over the socketpair and is read back through the real
  BES path (events + exit code).
- nix gains the "socket" feature; basil-core wired into the Cargo and
  bazel build graphs. docs/testing.md: decision 7 + roadmap item 2 done.
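The synthesized event order (`BuildStarted`, then one `TargetComplete` per target, then `BuildFinished` with the exit code) can be sketched with plain Rust types. These enums and fields are simplified stand-ins chosen for illustration; the real `BazelExpectation` is a prost message and the real events are Build Event Protocol messages.

```rust
/// Simplified stand-ins for the events the fake writes to the BES file.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    BuildStarted,
    TargetComplete { label: String, success: bool },
    BuildFinished { exit_code: i32 },
}

/// Hypothetical, flattened view of a declared expectation.
pub struct Expectation {
    pub targets: Vec<String>,
    pub success: bool,
    pub exit_code: i32,
}

/// Produces BuildStarted, one TargetComplete per target, then BuildFinished.
pub fn replay(expectation: &Expectation) -> Vec<Event> {
    let mut events = vec![Event::BuildStarted];
    events.extend(expectation.targets.iter().map(|label| Event::TargetComplete {
        label: label.clone(),
        success: expectation.success,
    }));
    events.push(Event::BuildFinished { exit_code: expectation.exit_code });
    events
}

pub fn demo() -> Vec<Event> {
    replay(&Expectation {
        targets: vec!["//app:bin".to_string(), "//lib:core".to_string()],
        success: true,
        exit_code: 0,
    })
}
```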

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
CI's bazel build rejected the previous commit: enabling the `nix`
"socket" feature changed the rules_rs crate-extension facts, making the
committed `MODULE.bazel.lock` stale (`--lockfile_mode=error`). Repinning
the lockfile isn't possible here, and the feature is avoidable: a Unix
`socketpair(AF_UNIX, SOCK_STREAM)` is exactly `std::os::unix::net::
UnixStream::pair()`, which needs no extra nix feature.

- `backend.rs`: build the control channel from `UnixStream::pair()`
  instead of `nix::sys::socket::socketpair`; hold the child end as a
  `UnixStream`. std sets `FD_CLOEXEC` on both ends, so the existing
  `pre_exec` fcntl that clears it on the inherited fd is now load-bearing
  (doc updated). `nix::libc` (re-exported unconditionally) still supplies
  the fcntl constants.
- `axl-runtime/Cargo.toml`: revert nix features to `["fs", "signal"]`.

No external crate is added or removed (Cargo.lock unchanged but for the
expected `basil-core` entry), so the committed MODULE.bazel.lock stays
valid. Verified locally: cargo check (offline) clean and
`cargo test -p axl-runtime testing::` — 5 passed, including the e2e
fake-bazel test (socketpair → fork+exec → synthesized BES read back).
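For readers unfamiliar with the std-only route, here is a small single-process sketch of a framed message crossing `UnixStream::pair()`. The 4-byte big-endian length prefix is an assumption picked for simplicity (prost's length-delimited encoding uses a varint prefix), and the sketch skips the fork+exec and the `FD_CLOEXEC` handling that the real control channel needs.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Writes one frame: 4-byte big-endian length, then the payload.
pub fn write_frame(writer: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)?;
    writer.flush()
}

/// Reads one frame written by `write_frame`.
pub fn read_frame(reader: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    reader.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len) as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Parent end writes, "child" end reads; both live in this process here.
pub fn demo() -> io::Result<Vec<u8>> {
    let (mut parent_end, mut child_end) = UnixStream::pair()?;
    write_frame(&mut parent_end, b"expect //app:bin success")?;
    drop(parent_end); // parent is done writing
    read_frame(&mut child_end)
}
```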

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
The bazel analysis of //:cli failed: `//crates/basil-core` depends on
`//crates/axl-proto`, but axl-proto's curated visibility list didn't
include basil-core. Add `//crates/basil-core:__pkg__` alongside the
existing basil / axl-runtime / build-event-stream entries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Consolidate onto one fake-bazel mechanism. The `engine::bazel::build`
tests drove basil via the `BAZEL_REAL` process-global + `--scenario=<name>`
argv; now they mint `ctx.bazel` with a `BazelBackend::Fake` carrying a
typed `BazelExpectation` (same path the AXL test harness uses), so the
named-scenario table and the global env var are no longer needed.

- `MultiPhaseEval::with_bazel_backend`: optional backend override for the
  contexts it mints (production stays `Real`; tests pass `Fake`).
- `crate::test`: `.with_fake_bazel()` / `.with_fake_bazel_expectation(...)`
  carry an expectation; `run_task` builds the `Fake` backend and threads
  it through `MultiPhaseEval`. Removed `install_basil()` and the
  `BAZEL_REAL` global (resolves the decision-8 parallelism hazard for
  these tests). build.rs snippets drop their `--scenario=` flags; the
  cache-evicted (bug-1060) test declares `BuildResult::CacheEvicted`.
- `basil-core`: deleted the dead named-scenario surface (`scenario`,
  `write_scenario`, `Scenario`, `ExitBehavior`); the generic
  `BazelExpectation` synthesis path is the only one left.
- `basil`: now a thin generic-only front-end (reads the expectation off
  the control fd, replays, exits). Dropped the `info`/`--scenario` verbs,
  `BASIL_SERVER_PID`, and its now-unused axl-proto/prost deps.

Verified offline: cargo check clean; `engine::bazel::build::tests`
(19) and `engine::testing` (5) pass. (bug_1060 is the pre-existing
parallel-cold-build flake — deterministic green in isolation and once
basil is built; CI pre-builds basil as a data dep.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
The `info::server_info()` / `client_pid` / `is_server_busy` /
`server_pid_nonblocking` free functions all hardcoded `bazel_command()`
(the real bazel) and couldn't see the backend, so every info-shaped call
silently bypassed `BazelBackend::Fake`. Consolidate the fork onto the
backend: a single `base_command(startup_flags)` primitive plus typed verb
methods (`info`, `server_info`, `client_pid`, `is_server_busy`,
`server_pid_nonblocking`). `info` is now the general key→value method and
`server_info` is `info(["server_pid","release"])` + parse, dropping a
duplicate parse loop.

`info.rs` shrinks to pure parsing helpers (`parse_release`,
`parse_info_map`). The version probe in `resolve_flags_for_running_bazel`,
the public `ctx.bazel.info()`, `query`, `health_check`, and
`cancel_invocation` now route through the backend (threaded through
`Query` and `Cancellation`), so Real/Fake is decided in one place. The
one genuine topology fact — `server_pid = child.id()` for the fake, which
has no separate daemon — stays inside `Build::spawn`.

Behavior-preserving: the fake path is unchanged (still build/test-only)
until basil learns the new verbs.
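The parsing half of this refactor could look something like the sketch below. The names `parse_info_map` and `parse_release` come from the commit message, but their signatures and bodies here are guesses; the sketch assumes the `key: value` lines that `bazel info` prints when several keys are requested.

```rust
use std::collections::BTreeMap;

/// Parses `key: value` lines into a map; lines without `": "` are ignored.
pub fn parse_info_map(stdout: &str) -> BTreeMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| line.split_once(": "))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Strips the leading `release ` from a value such as `release 7.4.1`.
pub fn parse_release(raw: &str) -> Option<&str> {
    raw.strip_prefix("release ")
}

/// Hypothetical `server_info`: a two-key info call followed by a parse.
/// Returns None when `server_pid` is missing or not a number.
pub fn server_info(stdout: &str) -> Option<(u32, Option<String>)> {
    let map = parse_info_map(stdout);
    let pid = map.get("server_pid")?.parse().ok()?;
    let release = map
        .get("release")
        .and_then(|raw| parse_release(raw))
        .map(str::to_string);
    Some((pid, release))
}
```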

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Replace the two `is_fake()` checks in `Build::spawn` with backend methods:
`build_server_info` (Real probes the daemon; Fake returns `(0, None)` with
no probe) and `bes_server_pid` (Real → daemon pid; Fake → the child it just
spawned, since the fake has no separate daemon). The galvanize liveness
topology now lives on the backend instead of leaking into the spawn path,
and `is_fake()` is gone.
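One way to picture moving the two checks onto the backend is a pair of methods that match on the variant. The method names follow the commit message; everything else is an assumption, including the `Real` variant carrying already-probed values (the real backend probes the daemon rather than storing them).

```rust
/// Simplified backend: `Real` carries values a live probe would return.
#[derive(Debug, Clone, PartialEq)]
pub enum Backend {
    Real { daemon_pid: u32, release: Option<String> },
    Fake,
}

impl Backend {
    /// Real reports the daemon; Fake returns `(0, None)` without probing.
    pub fn build_server_info(&self) -> (u32, Option<String>) {
        match self {
            Backend::Real { daemon_pid, release } => (*daemon_pid, release.clone()),
            Backend::Fake => (0, None),
        }
    }

    /// Real maps to the daemon pid; Fake maps to the child it just spawned,
    /// since the fake has no separate daemon.
    pub fn bes_server_pid(&self, child_pid: u32) -> u32 {
        match self {
            Backend::Real { daemon_pid, .. } => *daemon_pid,
            Backend::Fake => child_pid,
        }
    }
}
```

With this shape the spawn path calls the two methods unconditionally and never needs to ask which variant it holds.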

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Drop internal jargon ("decision 6", eval.extra) from the Bazel.backend
doc comment left over from conflict resolution; the field is carried on
the value and untraced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
@thesayyn thesayyn force-pushed the claude/youthful-noether-lthnua branch from 5fc2c6b to 514ccb0 Compare June 17, 2026 18:41
Expose the built-in parallel `*_test.axl` runner to Starlark and wire up an
`aspect axl test` command that drives it.

- Grant the first-party `@aspect` module the existing `#_is_std#` std-context
  privilege (alongside `@std`/`@bazel`) so its files may reach `__builtins__`.
  Third-party modules stay unprivileged.
- Add `__builtins__.testing().run(source)`, gated at the accessor (matching
  `hash()`/`time()`), returning a summary dict
  `{error, passed, failed, outcomes:[{name, passed, message}]}`. A module-level
  parse/load failure is surfaced as `error` rather than raising, so one bad
  file never aborts a run.
- Add the `aspect axl test` task: discovers `*_test.axl` files under the given
  paths (defaulting to the workspace root; skips hidden dirs and does not
  follow directory symlinks), runs each through the runner, reports per-test
  results, and exits non-zero on any failure or file error.

The runner remains loader-free for now: a test file that `load(...)`s other
modules is reported as a file error.
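The discovery rules in the third bullet (match `*_test.axl`, skip hidden directories, do not follow directory symlinks) can be sketched with the standard library alone. The `discover` function is hypothetical and written in Rust for illustration; the actual task is an AXL task and may differ, for example in how it treats symlinked files, which this sketch skips.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Walks `root` and returns every `*_test.axl` file in sorted order.
pub fn discover(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name().to_string_lossy().into_owned();
            // `DirEntry::file_type` does not follow symlinks, so a symlinked
            // directory is neither `is_dir` nor `is_file` and is skipped.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                if !name.starts_with('.') {
                    pending.push(entry.path());
                }
            } else if file_type.is_file() && name.ends_with("_test.axl") {
                found.push(entry.path());
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Sorting the result keeps the file order stable across runs, which fits the deterministic reporting the runner already aims for.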

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
3 participants