Add native AXL test framework (POC) by thesayyn · Pull Request #1238 · aspect-build/aspect-cli

thesayyn · 2026-06-16T04:15:24Z

What this is

A proof-of-concept that gives AXL a first-class, pytest-style testing story built into the engine, the outcome of a design discussion about how AXL code should be tested. It implements the load-bearing decisions end to end so we can review a working artifact, then iterate.

Full design, roadmap, and a log of decisions I made without explicit sign-off are in docs/testing.md.

The shape

```starlark
# lib/ci_test.axl: a *_test.axl file gets the augmented test surface
load("./ci.axl", "detect_ci_host")

def test_github_actions_precedence(t):
    t.env.set("GITHUB_ACTIONS", "true")
    t.env.set("BUILDKITE", "true")
    expect.eq(detect_ci_host(t.ctx.std.env)["marker"], "GITHUB_ACTIONS")
```

No per-test wiring in config.axl, no pipeline.yaml list, no copied _eq, no _snapshot_env/_restore_env.

What's implemented (all in `crates/axl-runtime`)

- Test-only globals. `*_test.axl` files evaluate against base AXL plus a test-only `expect` namespace, selected by filename suffix in the loader (`eval/load.rs`, `eval/api.rs::get_test_globals`). The vocabulary exists only in test files; a test asserts that `expect` is absent from the production globals.
- Convention discovery. Tests are `def test_*(t)` functions; the runner enumerates `test_*` callables (mirroring `FrozenTaskModuleLike::tasks()`).
- Bazel-free harness `t`. `t.env` (an in-memory env overlay), `t.std`, and `t.ctx`, which is a real `TaskContext`, the same Rust type production uses.
- Mock by backend-swap, not type masquerade. `t.ctx.std.env` is the genuine `std.Env` type; it reads the in-memory overlay only because the runner installs a `test_env` on `eval.extra` (`engine/store.rs`, `engine/std/env.rs`). The contract stays identical to reality and is enforced by the type system; internal `downcast_ref::<RealType>()` calls keep working.
- Per-test isolation, pytest semantics. A failed assertion raises, is caught per test, and the run continues.
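The discovery-plus-isolation loop described above can be modeled in a few lines of plain Rust. This is a minimal sketch, not the `axl-runtime` runner: Starlark functions become Rust `fn` pointers, a raised assertion becomes an `Err`, and `Harness`, `TestFn`, `discover`, and `run_all` are hypothetical names. It shows the contract only: just `test_*` names run, each gets a fresh harness, and one failure does not stop the rest.

```rust
use std::collections::BTreeMap;

/// One test's result: name, pass/fail, and the failure message if any.
#[derive(Debug, Clone, PartialEq)]
pub struct Outcome {
    pub name: String,
    pub passed: bool,
    pub message: Option<String>,
}

/// Hypothetical stand-in for the harness `t`; each test gets its own overlay.
#[derive(Default)]
pub struct Harness {
    pub env: BTreeMap<String, String>,
}

/// A test body; `Err` models a raised assertion.
pub type TestFn = fn(&mut Harness) -> Result<(), String>;

/// Convention discovery: keep only `test_*` callables, in definition order.
pub fn discover(module: &[(&'static str, TestFn)]) -> Vec<(&'static str, TestFn)> {
    module
        .iter()
        .copied()
        .filter(|(name, _)| name.starts_with("test_"))
        .collect()
}

/// Run every discovered test with a fresh harness; a failure is recorded
/// and the loop moves on to the next test.
pub fn run_all(module: &[(&'static str, TestFn)]) -> Vec<Outcome> {
    discover(module)
        .into_iter()
        .map(|(name, test)| {
            let mut t = Harness::default(); // no state shared between tests
            let result = test(&mut t);
            Outcome {
                name: name.to_string(),
                passed: result.is_ok(),
                message: result.err(),
            }
        })
        .collect()
}
```

A test that sets `CI` in its overlay cannot make a later test see it, because each call builds a new `Harness`.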

Verify

```sh
cargo test -p axl-runtime testing::
```

Three tests pass: discovery, isolation, and failure capture; the test-only globals split; and "overlay never leaks into the real process env".

Notable decisions made without sign-off (see docs for the full list)

- `expect`, not `assert`: `assert` is a reserved keyword in the dialect and won't parse as an identifier. Alternatives: `check`, or harness methods (`t.assert_eq`).
- Assertions are a global namespace (vs. methods on `t`).
- Only the env backend is mocked so far; io/fs/net/process/bazel would follow the identical pattern (roadmap in docs) but aren't in this slice.
- No `aspect test` CLI task yet: the runner is a Rust function covered by tests; wiring it up as a builtin AXL task (next to `axl_add.axl`) via a sandbox-run primitive is the next step.

Not caused by this PR

Two axl-runtime tests already fail on a clean checkout: the `bug_1060_…` timing test, and `reports_still_poisoned_when_removal_fails`, which expects a directory removal to fail, but the removal succeeds when running as root. I verified this by stashing these changes.

https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

Generated by Claude Code

CLAassistant · 2026-06-16T04:15:34Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

aspect-workflows · 2026-06-16T04:19:31Z

✨ Aspect Workflows Tasks

📅 Wed Jun 17 19:50:32 UTC 2026

⚠️ 2 flagged tasks

⚠️ delivery (delivery-gha-debug) · ⏱ 24s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Delivery complete (1 delivered · 2 warn · 3 skipped)
⚠️ delivery (delivery-gha) · ⏱ 38.6s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Delivery complete (1 delivered · 2 warn · 3 skipped)

✅ 26 successful tasks

✅ build (build-gha-debug) · ⏱ 3m 30s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (170 built)
✅ build (build-gha) · ⏱ 4m 4s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (170 built)
✅ buildifier (buildifier-gha-debug) · ⏱ 1m 7s · 🐙 GitHub Actions · ☑️ Check
💬 Format complete (clean)
✅ buildifier (buildifier-gha) · ⏱ 1m 57s · 🐙 GitHub Actions · ☑️ Check
💬 Format complete (clean)
✅ format (format-gha-debug) · ⏱ 2m 3s · 🐙 GitHub Actions · ☑️ Check
💬 Format complete (clean)
✅ format (format-gha) · ⏱ 1m 53s · 🐙 GitHub Actions · ☑️ Check
💬 Format complete (clean)
✅ gazelle (gazelle-gha-debug) · ⏱ 1m 3s · 🐙 GitHub Actions · ☑️ Check
💬 Gazelle complete (clean)
✅ gazelle (gazelle-from-source-gha-debug) · ⏱ 2m 19s · 🐙 GitHub Actions · ☑️ Check
💬 Gazelle complete (clean)
✅ gazelle (gazelle-from-source-gha) · ⏱ 2m 10s · 🐙 GitHub Actions · ☑️ Check
💬 Gazelle complete (clean)
✅ gazelle (gazelle-gha) · ⏱ 51s · 🐙 GitHub Actions · ☑️ Check
💬 Gazelle complete (clean)
✅ build (init-cpp) · ⏱ 2m 4s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (12 built)
✅ build (init-go) · ⏱ 3m · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (25 built)
✅ build (init-java) · ⏱ 38.6s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (13 built)
✅ build (init-js) · ⏱ 43.7s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (25 built)
✅ build (init-kitchen-sink) · ⏱ 7m 36s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (84 built)
✅ build (init-kotlin) · ⏱ 57.6s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (12 built)
✅ build (init-minimal) · ⏱ 40.6s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (4 built)
✅ build (init-py) · ⏱ 57s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (12 built)
✅ build (init-ruby) · ⏱ 4m 32s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (10 built)
✅ build (init-rust) · ⏱ 44.2s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (10 built)
✅ build (init-scala) · ⏱ 6m 8s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (9 built)
✅ build (init-shell) · ⏱ 1m 33s · 🐙 GitHub Actions · ☑️ Check
💬 Bazel build complete (10 built)
✅ lint (lint-gha-debug) · ⏱ 1m 23s · 🐙 GitHub Actions · ☑️ Check
💬 Lint complete (clean)
✅ lint (lint-gha) · ⏱ 1m 17s · 🐙 GitHub Actions · ☑️ Check
💬 Lint complete (clean)
✅ test (test-gha-debug) · ⏱ 4m 2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Bazel test complete (27/27 passed · 25 cached)
✅ test (test-gha) · ⏱ 4m 20s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
💬 Bazel test complete (27/27 passed · 25 cached)

🔁 Reproduce

⚠️ delivery (delivery-gha-debug · delivery-gha)

```sh
# --mode=always --track-state=false for off-runner with no state backend.
aspect delivery \
  --commit-sha=e238c8d9e0e59a71751f04e41671db7fdbe27155 \
  --mode=always \
  --track-state=false \
  --dry-run=true
```

_{Install aspect: docs.aspect.build/cli/install}

_{⏱ Last updated Wed Jun 17 20:00:20 UTC 2026 · 📊 GitHub API quota 2,464/15,000 (16% used, resets in 15m)}
_{🚀 Powered by Aspect CLI (v0.0.0-dev) | Aspect Build}

Proof-of-concept for a built-in, pytest-style testing story for AXL, implementing the load-bearing pieces end to end in the engine:

- `*_test.axl` files are evaluated against an augmented globals surface (base AXL + a test-only `expect` namespace), selected by filename suffix in the loader. The test vocabulary exists only in test files and cannot leak into production config/builtins.
- Tests are `def test_*(t)` functions, discovered by convention.
- The harness `t` is bazel-free: `t.env` (in-memory env overlay), `t.std`, and `t.ctx` (a real TaskContext, the same Rust type production uses).
- Mocking is backend-swap, not type masquerade: `t.ctx.std.env` is the genuine `std.Env` type; it reads the in-memory overlay only because the runner installs a `test_env` on `eval.extra`. The contract stays identical to reality, enforced by the type system.
- Per-test isolation with pytest semantics: a failed assertion raises, is caught per-test, and the run continues.

The runner is exposed as a Rust function with three passing tests; design, roadmap, and the decisions made without sign-off are in docs/testing.md.

Note: `assert` is a reserved keyword in the dialect, so the namespace is named `expect` (flagged for review).

https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

assert is a reserved keyword in the dialect; the plural asserts parses and reads almost exactly like assert.*.

Run discovered `test_*` functions across min(tests, cpus) worker threads, each with its own Starlark heap (heaps are `!Send`), re-evaluating the side-effect-free module body locally and merging outcomes back into definition order for a deterministic report. Per-test state lives on the test's own values (env overlay), never a process-global, so concurrent workers share no mutable state.

- `run_test_source` keeps its signature; defaults jobs to min(tests, cpus).
- `run_test_source_with_jobs` exposes the explicit `--jobs`-style knob.
- New `runs_tests_in_parallel_shards` test forces 8 workers over 17 tests and asserts that cross-test isolation holds concurrently and that ordering is stable.

docs/testing.md records the walkthrough decisions: `asserts` as a global namespace; env as a value-carried overlay vs bazel as `BazelBackend::{Real, Fake}`; the bazel Fake design (generic fake-bazel process, socketpair control channel, synthesized BES/execlog/stream surfaces, fork+exec not fork(), reuse basil-core, no embedded-binary bloat); and the parallelism rules that turn process-global shortcuts (`BAZEL_REAL`, fixed BES paths, the spawn registry) into bugs to fix before the bazel Fake lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
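The sharding and reordering described in this commit can be sketched with `std::thread::scope`. Assumptions: tests are dealt to workers round-robin by index (the commit does not say how work is split), each test is reduced to a closure returning pass/fail rather than a per-worker Starlark heap, and `default_jobs`/`run_sharded` are hypothetical names. Sorting by original index afterwards is what keeps the report deterministic no matter which worker finishes first.

```rust
use std::thread;

/// Default worker count: min(tests, cpus), but at least one.
pub fn default_jobs(n_tests: usize) -> usize {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    n_tests.min(cpus).max(1)
}

/// Run `tests` on `jobs` scoped threads, then restore definition order.
pub fn run_sharded<F>(tests: &[&str], jobs: usize, run_one: F) -> Vec<(String, bool)>
where
    F: Fn(&str) -> bool + Sync,
{
    // Never spawn more workers than tests, and always at least one.
    let jobs = jobs.clamp(1, tests.len().max(1));
    let run_one = &run_one;
    let mut results: Vec<(usize, String, bool)> = thread::scope(|s| {
        let handles: Vec<_> = (0..jobs)
            .map(|w| {
                // Worker `w` takes every test whose index is congruent to w mod jobs.
                s.spawn(move || {
                    tests
                        .iter()
                        .enumerate()
                        .filter(|(i, _)| i % jobs == w)
                        .map(|(i, name)| (i, name.to_string(), run_one(*name)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    // Merge back into definition order for a stable report.
    results.sort_by_key(|(i, _, _)| *i);
    results.into_iter().map(|(_, name, ok)| (name, ok)).collect()
}
```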

Move the in-memory env overlay off the production `RuntimeEnv` (`store.rs`) and onto the harness-constructed Starlark values, so the mock route is value-carried rather than ambient through `eval.extra`/`from_eval`.

- `store.rs`: drop `Env::test_env` + `with_test_env`. `TestEnvMap` is now `Arc<Mutex<BTreeMap<…>>>` (was `Rc<RefCell<…>>`) so the values that carry it satisfy the `Send + Sync` bound frozen Starlark values require. `Env::from_eval` stays; production still reads cli_version / roots through it.
- `std/env.rs`: `std.Env` gains `Option<TestEnvMap>`. `var`/`set_var`/`remove_var`/`vars` read/write the overlay carried on `this` when present, else the real process env. No more `from_eval` overlay reads.
- `std/mod.rs`: `Std` gains `Option<TestEnvMap>` and mints its `std.env` carrying that handle. `Std::new()` (None) for production.
- `task_context.rs`: `TaskContext` gains `env_overlay`; `ctx.std` mints `Std` carrying it. `with_env_overlay` builder for the test runner. Frozen contexts (production-only) always hand out `Std::new()`.
- `testing.rs`: `Test`/`TestEnv` carry the overlay `Rc`. The runner mints `t.env`, `t.std`, and `t.ctx` from one shared handle, so all three observe the same map. `eval.extra` now carries only the production `base_env` (for cli_version/roots), not the overlay.

Per-test state lives on the value, never a process-global; each overlay is touched only on its own worker thread, so the mutex is never contended. `config_context`/`feature_context` updated to `Std::new()`.

docs/testing.md: decision 6 + roadmap 1c marked done.

`cargo test -p axl-runtime testing::`: 4 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
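The value-carried overlay reduces to a small pattern: the env value holds an optional shared map and consults it before the process environment. The sketch below is a simplification of `std.Env` under stated assumptions (no Starlark, only `var`/`set_var`); `Env::with_overlay` and `Env::production` are hypothetical constructors. Every value minted from one `Arc` handle sees the same map, which is how `t.env`, `t.std`, and `t.ctx` can stay consistent without a process-global.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Shared overlay handle; `Arc<Mutex<..>>` so values carrying it are Send + Sync.
pub type TestEnvMap = Arc<Mutex<BTreeMap<String, String>>>;

/// Simplified stand-in for `std.Env`.
#[derive(Clone, Default)]
pub struct Env {
    overlay: Option<TestEnvMap>,
}

impl Env {
    /// Production: no overlay, reads the real process environment.
    pub fn production() -> Self {
        Env { overlay: None }
    }

    /// Test: reads and writes go to the shared in-memory map only.
    pub fn with_overlay(map: TestEnvMap) -> Self {
        Env { overlay: Some(map) }
    }

    pub fn var(&self, key: &str) -> Option<String> {
        match &self.overlay {
            Some(map) => map.lock().unwrap().get(key).cloned(),
            None => std::env::var(key).ok(),
        }
    }

    #[allow(unused_unsafe)]
    pub fn set_var(&self, key: &str, value: &str) {
        match &self.overlay {
            Some(map) => {
                map.lock().unwrap().insert(key.to_string(), value.to_string());
            }
            // `std::env::set_var` is `unsafe` in the 2024 edition, safe before it.
            None => unsafe { std::env::set_var(key, value) },
        }
    }
}
```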

Implements increment 2 of the AXL native testing POC: `ctx.bazel` can be backed by a `Fake` bazel that synthesizes a real BES stream from a declared, typed fixture, so tests exercise the production `ctx.bazel.build` read path against a controllable bazel.

- `bazel/backend.rs` (new): `BazelBackend::{Real, Fake{fake_bin, expectation}}` carried ON the `Bazel` Starlark value (not `eval.extra`), read via `read_backend`, so it is per-value and parallel-safe. `Fake` builds the `Command` straight from the fake path; there is no `BAZEL_REAL` global. A per-invocation `socketpair` `ControlChannel` (behind a trait for a later Windows transport) inherits the read end into the child (CLOEXEC cleared in `pre_exec`) and ships the length-delimited `BazelExpectation` frame parent→child.
- `basil-core` (new crate): the reusable replay/synthesis guts extracted from `basil`. `BazelExpectation` (prost message) + `BuildResult` enum; `replay_expectation` synthesizes `BuildStarted` → `TargetComplete`* → `BuildFinished` + exit code onto the real `--build_event_binary_file`. A raw `events=` escape hatch passes pre-framed `BuildEvent`s through. The legacy named scenarios move here verbatim so existing build.rs tests keep passing.
- `basil`: now a thin argv/env front-end over `basil-core`. Generic fixture mode (control fd present) reads the expectation off `ASPECT_FAKE_BAZEL_FD`; named-scenario mode is unchanged.
- `bazel/mod.rs` + `build.rs`: thread the backend through `Build::spawn`. `Fake` skips the live `server_info()` probe and uses the child pid as galvanize's `server_pid`. `multi_phase.rs`: production mints `Real`.
- `testing.rs`: `t.bazel.expect_build(*targets, result=, exit_code=)` declares the per-test fixture (its own `Arc<Mutex>` cell, never a global); `t.ctx` mints the `Fake` backend from it. New runner `run_test_source_with_fake_bazel`. An end-to-end test proves a declared expectation flows over the socketpair and is read back through the real BES path (events + exit code).
- nix gains the "socket" feature; basil-core is wired into the Cargo and bazel build graphs.

docs/testing.md: decision 7 + roadmap item 2 done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
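A simplified model of what `replay_expectation` is described as producing: `BuildStarted`, one `TargetComplete` per declared target, then `BuildFinished`, plus the exit code. This is a sketch only. The real function writes framed BES protos to `--build_event_binary_file`, whereas this version returns a plain `Vec`; the `Event` enum, the field names, and the two-variant `BuildResult` are assumptions (the real enum also has variants such as `CacheEvicted`, whose semantics are not modeled here).

```rust
/// Hypothetical two-variant result; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BuildResult {
    Success,
    Failure,
}

/// Simplified typed fixture: which targets to report, the overall result, the exit code.
#[derive(Debug, Clone)]
pub struct BazelExpectation {
    pub targets: Vec<String>,
    pub result: BuildResult,
    pub exit_code: i32,
}

/// Stand-in for the BES events the fake emits.
#[derive(Debug, PartialEq)]
pub enum Event {
    BuildStarted,
    TargetComplete { label: String, success: bool },
    BuildFinished { exit_code: i32 },
}

/// BuildStarted -> TargetComplete* -> BuildFinished, plus the process exit code.
pub fn replay_expectation(exp: &BazelExpectation) -> (Vec<Event>, i32) {
    let success = exp.result == BuildResult::Success;
    let mut events = vec![Event::BuildStarted];
    events.extend(exp.targets.iter().map(|label| Event::TargetComplete {
        label: label.clone(),
        success,
    }));
    events.push(Event::BuildFinished { exit_code: exp.exit_code });
    (events, exp.exit_code)
}
```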

CI's bazel build rejected the previous commit: enabling the `nix` "socket" feature changed the rules_rs crate-extension facts, making the committed `MODULE.bazel.lock` stale (`--lockfile_mode=error`). Repinning the lockfile isn't possible here, and the feature is avoidable: a Unix `socketpair(AF_UNIX, SOCK_STREAM)` is exactly `std::os::unix::net::UnixStream::pair()`, which needs no extra nix feature.

- `backend.rs`: build the control channel from `UnixStream::pair()` instead of `nix::sys::socket::socketpair`; hold the child end as a `UnixStream`. std sets `FD_CLOEXEC` on both ends, so the existing `pre_exec` fcntl that clears it on the inherited fd is now load-bearing (doc updated). `nix::libc` (re-exported unconditionally) still supplies the fcntl constants.
- `axl-runtime/Cargo.toml`: revert nix features to `["fs", "signal"]`.

No external crate is added or removed (Cargo.lock is unchanged but for the expected `basil-core` entry), so the committed MODULE.bazel.lock stays valid.

Verified locally: `cargo check` (offline) clean and `cargo test -p axl-runtime testing::`: 5 passed, including the e2e fake-bazel test (socketpair → fork+exec → synthesized BES read back).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
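The parent→child handoff over `UnixStream::pair()` can be illustrated with a length-delimited frame. This sketch assumes protobuf-style framing (a base-128 varint length followed by the payload, the layout length-delimited protobuf messages use) and sends raw bytes rather than an encoded `BazelExpectation`; `send_frame`/`recv_frame` are hypothetical names, and the fork+exec and `pre_exec` CLOEXEC handling are out of scope. Unix only.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Write `n` as a base-128 varint: 7 bits per byte, high bit = "more follows".
fn write_varint(w: &mut impl Write, mut n: u64) -> io::Result<()> {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            return w.write_all(&[byte]);
        }
        w.write_all(&[byte | 0x80])?;
    }
}

/// Read a base-128 varint (at most 10 bytes for a u64).
fn read_varint(r: &mut impl Read) -> io::Result<u64> {
    let mut n = 0u64;
    for shift in (0..64).step_by(7) {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        n |= u64::from(b[0] & 0x7f) << shift;
        if b[0] & 0x80 == 0 {
            return Ok(n);
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"))
}

/// Parent side: length prefix, then payload.
pub fn send_frame(stream: &mut UnixStream, payload: &[u8]) -> io::Result<()> {
    write_varint(stream, payload.len() as u64)?;
    stream.write_all(payload)
}

/// Child side: read the prefix, then exactly that many bytes.
pub fn recv_frame(stream: &mut UnixStream) -> io::Result<Vec<u8>> {
    let len = read_varint(stream)? as usize;
    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}
```

Framing matters even on a stream socket: it lets the child know where the expectation ends without waiting for the parent to close its end.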

The bazel analysis of //:cli failed: `//crates/basil-core` depends on `//crates/axl-proto`, but axl-proto's curated visibility list didn't include basil-core. Add `//crates/basil-core:__pkg__` alongside the existing basil / axl-runtime / build-event-stream entries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

Consolidate onto one fake-bazel mechanism. The `engine::bazel::build` tests drove basil via the `BAZEL_REAL` process-global + `--scenario=<name>` argv; now they mint `ctx.bazel` with a `BazelBackend::Fake` carrying a typed `BazelExpectation` (the same path the AXL test harness uses), so the named-scenario table and the global env var are no longer needed.

- `MultiPhaseEval::with_bazel_backend`: optional backend override for the contexts it mints (production stays `Real`; tests pass `Fake`).
- `crate::test`: `.with_fake_bazel()` / `.with_fake_bazel_expectation(...)` carry an expectation; `run_task` builds the `Fake` backend and threads it through `MultiPhaseEval`. Removed `install_basil()` and the `BAZEL_REAL` global (resolves the decision-8 parallelism hazard for these tests). build.rs snippets drop their `--scenario=` flags; the cache-evicted (bug-1060) test declares `BuildResult::CacheEvicted`.
- `basil-core`: deleted the dead named-scenario surface (`scenario`, `write_scenario`, `Scenario`, `ExitBehavior`); the generic `BazelExpectation` synthesis path is the only one left.
- `basil`: now a thin generic-only front-end (reads the expectation off the control fd, replays, exits). Dropped the `info`/`--scenario` verbs, `BASIL_SERVER_PID`, and its now-unused axl-proto/prost deps.

Verified offline: `cargo check` clean; `engine::bazel::build::tests` (19) and `engine::testing` (5) pass. (bug_1060 is the pre-existing parallel-cold-build flake; it is deterministically green in isolation and once basil is built, and CI pre-builds basil as a data dep.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

The `info::server_info()` / `client_pid` / `is_server_busy` / `server_pid_nonblocking` free functions all hardcoded `bazel_command()` (the real bazel) and couldn't see the backend, so every info-shaped call silently bypassed `BazelBackend::Fake`.

Consolidate the fork onto the backend: a single `base_command(startup_flags)` primitive plus typed verb methods (`info`, `server_info`, `client_pid`, `is_server_busy`, `server_pid_nonblocking`). `info` is now the general key→value method and `server_info` is `info(["server_pid","release"])` + parse, dropping a duplicate parse loop. `info.rs` shrinks to pure parsing helpers (`parse_release`, `parse_info_map`).

The version probe in `resolve_flags_for_running_bazel`, the public `ctx.bazel.info()`, `query`, `health_check`, and `cancel_invocation` now route through the backend (threaded through `Query` and `Cancellation`), so Real/Fake is decided in one place. The one genuine topology fact stays inside `Build::spawn`: for the fake, which has no separate daemon, `server_pid = child.id()`.

Behavior-preserving: the fake path is unchanged (still build/test-only) until basil learns the new verbs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
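For the parsing helpers, here is a sketch of what `parse_info_map`/`parse_release` might look like, assuming the multi-key `bazel info` output format of one `key: value` line per key, with `release` reported as `release <version>`. The signatures, and the `(pid, Option<release>)` return shape of `server_info`, are assumptions rather than the crate's actual API.

```rust
use std::collections::HashMap;

/// Parse `key: value` lines (multi-key `bazel info` output) into a map.
pub fn parse_info_map(stdout: &str) -> HashMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| line.split_once(": "))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// "release 7.4.1" -> Some("7.4.1"); anything else -> None.
pub fn parse_release(value: &str) -> Option<&str> {
    value.strip_prefix("release ").map(str::trim)
}

/// `info(["server_pid", "release"])` + parse: the pid is required, the release optional.
pub fn server_info(stdout: &str) -> Option<(u32, Option<String>)> {
    let map = parse_info_map(stdout);
    let pid = map.get("server_pid")?.parse::<u32>().ok()?;
    let release = map
        .get("release")
        .and_then(|r| parse_release(r))
        .map(str::to_string);
    Some((pid, release))
}
```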

Replace the two `is_fake()` checks in `Build::spawn` with backend methods: `build_server_info` (Real probes the daemon; Fake returns `(0, None)` with no probe) and `bes_server_pid` (Real → daemon pid; Fake → the child it just spawned, since the fake has no separate daemon). The galvanize liveness topology now lives on the backend instead of leaking into the spawn path, and `is_fake()` is gone.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
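Moving these topology decisions onto the backend is ordinary enum dispatch. In this sketch `BazelBackend` drops the `Fake` payload (fake binary path, expectation), and the daemon probe is passed in as a closure so the example runs without Bazel; both simplifications are assumptions. The point is that a caller such as `Build::spawn` asks the backend instead of branching on `is_fake()`.

```rust
/// Simplified: the real `Fake` variant also carries `fake_bin` and the expectation.
pub enum BazelBackend {
    Real,
    Fake,
}

impl BazelBackend {
    /// Real probes the running daemon; Fake has no daemon, so it skips the probe.
    pub fn build_server_info(
        &self,
        probe: impl FnOnce() -> (u32, Option<String>),
    ) -> (u32, Option<String>) {
        match self {
            BazelBackend::Real => probe(),
            BazelBackend::Fake => (0, None),
        }
    }

    /// The pid galvanize should watch for liveness.
    pub fn bes_server_pid(&self, daemon_pid: u32, child_pid: u32) -> u32 {
        match self {
            BazelBackend::Real => daemon_pid,
            BazelBackend::Fake => child_pid, // the fake process is its own "server"
        }
    }
}
```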

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

Drop internal jargon ("decision 6", `eval.extra`) from the `Bazel.backend` doc comment left over from conflict resolution; the field is carried on the value and untraced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B

Expose the built-in parallel `*_test.axl` runner to Starlark and wire up an `aspect axl test` command that drives it.

- Grant the first-party `@aspect` module the existing `#_is_std#` std-context privilege (alongside `@std`/`@bazel`) so its files may reach `__builtins__`. Third-party modules stay unprivileged.
- Add `__builtins__.testing().run(source)`, gated at the accessor (matching `hash()`/`time()`), returning a summary dict `{error, passed, failed, outcomes:[{name, passed, message}]}`. A module-level parse/load failure is surfaced as `error` rather than raising, so one bad file never aborts a run.
- Add the `aspect axl test` task: it discovers `*_test.axl` files under the given paths (defaulting to the workspace root; it skips hidden dirs and does not follow directory symlinks), runs each through the runner, reports per-test results, and exits non-zero on any failure or file error.

The runner remains loader-free for now: a test file that `load(...)`s other modules is reported as a file error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
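The discovery rules for `aspect axl test` (walk the given paths, skip hidden directories, don't follow directory symlinks) map directly onto `std::fs`. In this hedged sketch `discover_test_files` is a hypothetical name, results are sorted for stable output, and symlinked files are skipped as well, which the commit does not specify. It relies on `DirEntry::file_type()`, which does not traverse symlinks, so a symlinked directory is reported as a symlink rather than a directory and is never descended into.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `*_test.axl` file under `root`, sorted for a stable report.
pub fn discover_test_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    walk(root, &mut found)?;
    found.sort();
    Ok(found)
}

fn walk(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let name = name.to_string_lossy();
        // `file_type()` does not follow symlinks: a symlinked dir is neither
        // `is_dir()` nor `is_file()`, so it is skipped (as are symlinked files).
        let ty = entry.file_type()?;
        if ty.is_dir() {
            if !name.starts_with('.') {
                walk(&entry.path(), found)?; // skip hidden dirs like .git
            }
        } else if ty.is_file() && name.ends_with("_test.axl") {
            found.push(entry.path());
        }
    }
    Ok(())
}
```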

claude added 12 commits June 17, 2026 18:30

Rename test assertion namespace expect -> asserts

45add0c


Apply rustfmt to cancel.rs and health_check.rs

8011238


thesayyn force-pushed the claude/youthful-noether-lthnua branch from 5fc2c6b to 514ccb0 Compare June 17, 2026 18:41

thesayyn wants to merge 13 commits into main from claude/youthful-noether-lthnua · 3 participants

Conversation

thesayyn commented Jun 16, 2026

What this is

The shape

What's implemented (all in crates/axl-runtime)

Verify

Notable decisions made without sign-off (see docs for the full list)

Not caused by this PR

Uh oh!

CLAassistant commented Jun 16, 2026

Uh oh!

aspect-workflows Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✨ Aspect Workflows Tasks

⚠️ 2 flagged tasks

✅ 26 successful tasks

🔁 Reproduce

⚠️ delivery (delivery-gha-debug · delivery-gha)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

What's implemented (all in `crates/axl-runtime`)

aspect-workflows Bot commented Jun 16, 2026 •

edited

Loading