Add native AXL test framework (POC) #1238
Open
thesayyn wants to merge 13 commits into
✨ Aspect Workflows Tasks📅 Wed Jun 17 19:50:32 UTC 2026
Proof-of-concept for a built-in, pytest-style testing story for AXL, implementing the load-bearing pieces end to end in the engine:
- `*_test.axl` files are evaluated against an augmented globals surface (base AXL + a test-only `expect` namespace), selected by filename suffix in the loader. The test vocabulary exists only in test files and cannot leak into production config/builtins.
- Tests are `def test_*(t)` functions, discovered by convention.
- The harness `t` is bazel-free: `t.env` (in-memory env overlay), `t.std`, and `t.ctx` (a real TaskContext, the same Rust type production uses).
- Mocking is backend-swap, not type masquerade: `t.ctx.std.env` is the genuine `std.Env` type; it reads the in-memory overlay only because the runner installs a `test_env` on `eval.extra`. Contract stays identical to reality, enforced by the type system.
- Per-test isolation with pytest semantics: a failed assertion raises, is caught per-test, and the run continues.
Runner is exposed as a Rust function with three passing tests; design, roadmap, and the decisions made without sign-off are in docs/testing.md.
Note: `assert` is a reserved keyword in the dialect, so the namespace is named `expect` (flagged for review).
https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
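The per-test isolation rule (a failed assertion is caught per test and the run continues) can be sketched in plain Rust. This is a minimal model, not the engine's runner: `TestFn`, `Outcome`, `expect_eq`, and `run_all` are hypothetical names, and a failed assertion is modeled as an `Err` rather than a raised Starlark error.

```rust
/// Hypothetical model of a test: a function that returns Err on a failed assertion.
pub type TestFn = fn() -> Result<(), String>;

/// One per-test result (name, pass/fail, optional failure message).
#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub name: String,
    pub passed: bool,
    pub message: Option<String>,
}

/// Hypothetical assertion helper: reports a mismatch as Err instead of aborting.
pub fn expect_eq(actual: i64, expected: i64) -> Result<(), String> {
    if actual == expected {
        Ok(())
    } else {
        Err(format!("expected {expected}, got {actual}"))
    }
}

/// Run every candidate whose name starts with `test_`, in definition order.
/// A failure is recorded on that test's outcome and the loop keeps going.
pub fn run_all(candidates: &[(&str, TestFn)]) -> Vec<Outcome> {
    let mut outcomes = Vec::new();
    for &(name, test) in candidates {
        if !name.starts_with("test_") {
            continue; // discovery by convention: helpers are skipped
        }
        let (passed, message) = match test() {
            Ok(()) => (true, None),
            Err(msg) => (false, Some(msg)),
        };
        outcomes.push(Outcome { name: name.to_string(), passed, message });
    }
    outcomes
}
```

In this sketch a failing first test still leaves later tests free to run, which is the pytest-style behavior the commit describes.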
`assert` is a reserved keyword in the dialect; the plural `asserts` parses and reads almost exactly like `assert.*`.
Run discovered `test_*` functions across min(tests, cpus) worker threads,
each with its own Starlark heap (heaps are !Send), re-evaluating the
side-effect-free module body locally and merging outcomes back into
definition order for a deterministic report. Per-test state lives on the
test's own values (env overlay), never a process-global, so concurrent
workers share no mutable state.
- `run_test_source` keeps its signature; defaults jobs to min(tests, cpus).
- `run_test_source_with_jobs` exposes the explicit `--jobs`-style knob.
- New `runs_tests_in_parallel_shards` test forces 8 workers over 17 tests
and asserts cross-test isolation holds concurrently + ordering is stable.
docs/testing.md records the walkthrough decisions: asserts global namespace;
env as a value-carried overlay vs bazel as BazelBackend::{Real, Fake};
the bazel Fake design (generic fake-bazel process, socketpair control
channel, synthesized BES/execlog/stream surfaces, fork+exec not fork(),
reuse basil-core, no embedded-binary bloat); and the parallelism rules that
turn process-global shortcuts (BAZEL_REAL, fixed BES paths, the spawn
registry) into bugs to fix before the bazel Fake lands.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
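The sharding scheme above (a bounded number of workers, each owning its own state, with outcomes merged back into definition order) can be sketched with scoped threads. This is an illustration under assumptions: `default_jobs` and `run_sharded` are hypothetical names, the per-test work is a plain closure instead of a Starlark evaluation, and the extra floor of one worker is added so an empty run does not ask for zero threads.

```rust
use std::thread;

/// min(tests, cpus), floored at 1 (the floor is this sketch's addition).
pub fn default_jobs(test_count: usize) -> usize {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    test_count.min(cpus).max(1)
}

/// Run `run_one` for every name across `jobs` workers and return the
/// results in the same order as `names`, regardless of completion order.
pub fn run_sharded<F>(names: &[&str], jobs: usize, run_one: F) -> Vec<(String, bool)>
where
    F: Fn(&str) -> bool + Sync,
{
    let jobs = jobs.max(1).min(names.len().max(1));
    let run_one = &run_one;
    let mut slots: Vec<Option<bool>> = vec![None; names.len()];

    thread::scope(|scope| {
        let handles: Vec<_> = (0..jobs)
            .map(|worker| {
                scope.spawn(move || {
                    // Worker `w` takes indices w, w + jobs, w + 2*jobs, ...
                    // It returns its results by value; no mutable state is shared.
                    (worker..names.len())
                        .step_by(jobs)
                        .map(|i| (i, run_one(names[i])))
                        .collect::<Vec<(usize, bool)>>()
                })
            })
            .collect();

        // Merge on the parent thread, keyed by definition index.
        for handle in handles {
            for (i, passed) in handle.join().expect("worker panicked") {
                slots[i] = Some(passed);
            }
        }
    });

    names
        .iter()
        .zip(slots)
        .map(|(name, passed)| {
            (name.to_string(), passed.expect("every index belongs to one shard"))
        })
        .collect()
}
```

Because each result is written into the slot for its definition index, the report order does not depend on which worker finishes first.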
Move the in-memory env overlay off the production `RuntimeEnv` (`store.rs`) and onto the harness-constructed Starlark values, so the mock route is value-carried rather than ambient through `eval.extra`/`from_eval`.
- `store.rs`: drop `Env::test_env` + `with_test_env`. `TestEnvMap` is now `Arc<Mutex<BTreeMap<…>>>` (was `Rc<RefCell<…>>`) so the values that carry it satisfy the `Send + Sync` bound frozen Starlark values require. `Env::from_eval` stays: production still reads cli_version / roots through it.
- `std/env.rs`: `std.Env` gains `Option<TestEnvMap>`. `var`/`set_var`/`remove_var`/`vars` read/write the overlay carried on `this` when present, else the real process env. No more `from_eval` overlay reads.
- `std/mod.rs`: `Std` gains `Option<TestEnvMap>` and mints its `std.env` carrying that handle. `Std::new()` (None) for production.
- `task_context.rs`: `TaskContext` gains `env_overlay`; `ctx.std` mints `Std` carrying it. `with_env_overlay` builder for the test runner. Frozen contexts (production-only) always hand out `Std::new()`.
- `testing.rs`: `Test`/`TestEnv` carry the overlay handle. The runner mints `t.env`, `t.std`, and `t.ctx` from one shared handle, so all three observe the same map. `eval.extra` now carries only the production `base_env` (for cli_version/roots), not the overlay.
Per-test state lives on the value, never a process-global; each overlay is touched only on its own worker thread, so the mutex is never contended.
`config_context`/`feature_context` updated to `Std::new()`.
docs/testing.md: decision 6 + roadmap 1c marked done.
cargo test -p axl-runtime testing:: (4 passed).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
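A value-carried overlay can be sketched as an `Env` that holds an optional shared map and falls back to the process environment when none is present. This is a simplified model: the `String` to `String` map type is an assumption (the commit elides the type parameters), `with_overlay` is a hypothetical constructor, and the production write path is deliberately left out of the sketch.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Assumed shape of the overlay handle; the real key/value types are not shown in the commit.
pub type TestEnvMap = Arc<Mutex<BTreeMap<String, String>>>;

/// Simplified stand-in for `std.Env`: the overlay travels on the value itself.
#[derive(Clone, Default)]
pub struct Env {
    overlay: Option<TestEnvMap>,
}

impl Env {
    /// Production construction: no overlay, reads hit the real process env.
    pub fn new() -> Self {
        Env { overlay: None }
    }

    /// Test construction (hypothetical name): reads and writes hit the shared map.
    pub fn with_overlay(overlay: TestEnvMap) -> Self {
        Env { overlay: Some(overlay) }
    }

    /// Read from the overlay when present, else from the process env.
    pub fn var(&self, key: &str) -> Option<String> {
        match &self.overlay {
            Some(map) => map.lock().unwrap().get(key).cloned(),
            None => std::env::var(key).ok(),
        }
    }

    /// Write to the overlay. The real-process write is omitted from this sketch,
    /// so calling this without an overlay returns an error.
    pub fn set_var(&self, key: &str, value: &str) -> Result<(), &'static str> {
        match &self.overlay {
            Some(map) => {
                map.lock().unwrap().insert(key.to_string(), value.to_string());
                Ok(())
            }
            None => Err("sketch only: process-env writes are not modeled"),
        }
    }
}
```

Two `Env` values built from clones of the same handle observe each other's writes, while the process environment stays untouched, which mirrors how `t.env`, `t.std`, and `t.ctx` are described as sharing one map.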
Implements increment 2 of the AXL native testing POC: `ctx.bazel` can be
backed by a `Fake` bazel that synthesizes a real BES stream from a
declared, typed fixture — so tests exercise the production
`ctx.bazel.build` read path against a controllable bazel.
- `bazel/backend.rs` (new): `BazelBackend::{Real, Fake{fake_bin,
expectation}}` carried ON the `Bazel` Starlark value (not `eval.extra`),
read via `read_backend`, so it is per-value and parallel-safe. `Fake`
builds the `Command` straight from the fake path — no `BAZEL_REAL`
global. A per-invocation `socketpair` `ControlChannel` (behind a trait
for a later Windows transport) inherits the read end into the child
(CLOEXEC cleared in `pre_exec`) and ships the length-delimited
`BazelExpectation` frame parent→child.
- `basil-core` (new crate): the reusable replay/synthesis guts extracted
from `basil`. `BazelExpectation` (prost message) + `BuildResult` enum;
`replay_expectation` synthesizes `BuildStarted` → `TargetComplete`* →
`BuildFinished` + exit code onto the real `--build_event_binary_file`.
Raw `events=` escape hatch passes pre-framed `BuildEvent`s through. The
legacy named scenarios move here verbatim so existing build.rs tests
keep passing.
- `basil`: now a thin argv/env front-end over `basil-core`. Generic
fixture mode (control fd present) reads the expectation off
`ASPECT_FAKE_BAZEL_FD`; named-scenario mode unchanged.
- `bazel/mod.rs` + `build.rs`: thread the backend through `Build::spawn`.
`Fake` skips the live `server_info()` probe and uses the child pid as
galvanize's `server_pid`. `multi_phase.rs`: production mints `Real`.
- `testing.rs`: `t.bazel.expect_build(*targets, result=, exit_code=)`
declares the per-test fixture (its own `Arc<Mutex>` cell, never a
global); `t.ctx` mints the `Fake` backend from it. New runner
`run_test_source_with_fake_bazel`. End-to-end test proves a declared
expectation flows over the socketpair and is read back through the real
BES path (events + exit code).
- nix gains the "socket" feature; basil-core wired into the Cargo and
bazel build graphs. docs/testing.md: decision 7 + roadmap item 2 done.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
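The replay order named above (`BuildStarted`, then one `TargetComplete` per target, then `BuildFinished` with the exit code) can be illustrated with plain enums. This is only a sketch of the ordering: the real `BazelExpectation` is a prost message and the real events are written to the BES binary file, whereas the types and the `replay` function below are hypothetical, and the `Success`/`Failure` variants are assumed for illustration.

```rust
/// Assumed result variants, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildResult {
    Success,
    Failure,
}

/// Hypothetical, simplified fixture: what a test declares up front.
#[derive(Debug, Clone)]
pub struct Expectation {
    pub targets: Vec<String>,
    pub result: BuildResult,
    pub exit_code: i32,
}

/// Hypothetical, simplified stand-ins for the synthesized build events.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    BuildStarted,
    TargetComplete { label: String, success: bool },
    BuildFinished { exit_code: i32 },
}

/// Turn a declared expectation into the ordered event sequence:
/// BuildStarted, one TargetComplete per target, then BuildFinished.
pub fn replay(expectation: &Expectation) -> Vec<Event> {
    let mut events = vec![Event::BuildStarted];
    for label in &expectation.targets {
        events.push(Event::TargetComplete {
            label: label.clone(),
            success: expectation.result == BuildResult::Success,
        });
    }
    events.push(Event::BuildFinished { exit_code: expectation.exit_code });
    events
}
```

Keeping the fixture declarative means a test states the outcome it wants and the fake derives the whole stream from it, rather than the test hand-writing events.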
CI's bazel build rejected the previous commit: enabling the `nix` "socket" feature changed the rules_rs crate-extension facts, making the committed `MODULE.bazel.lock` stale (`--lockfile_mode=error`). Repinning the lockfile isn't possible here, and the feature is avoidable: a Unix `socketpair(AF_UNIX, SOCK_STREAM)` is exactly `std::os::unix::net::UnixStream::pair()`, which needs no extra nix feature.
- `backend.rs`: build the control channel from `UnixStream::pair()` instead of `nix::sys::socket::socketpair`; hold the child end as a `UnixStream`. std sets `FD_CLOEXEC` on both ends, so the existing `pre_exec` fcntl that clears it on the inherited fd is now load-bearing (doc updated). `nix::libc` (re-exported unconditionally) still supplies the fcntl constants.
- `axl-runtime/Cargo.toml`: revert nix features to `["fs", "signal"]`.
No external crate is added or removed (Cargo.lock unchanged but for the expected `basil-core` entry), so the committed MODULE.bazel.lock stays valid.
Verified locally: cargo check (offline) clean and `cargo test -p axl-runtime testing::` (5 passed, including the e2e fake-bazel test: socketpair → fork+exec → synthesized BES read back).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
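Shipping one length-prefixed frame across a `UnixStream::pair()` can be sketched with the standard library alone. Several things here are assumptions: the fixed 4-byte big-endian length prefix is a simplification (the PR's actual framing is not shown), `write_frame`, `read_frame`, and `round_trip` are hypothetical names, and both ends stay in one process instead of the read end being inherited by a child. The sketch is Unix-only.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Write one frame: a 4-byte big-endian length, then the payload.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)?;
    writer.flush()
}

/// Read one frame written by `write_frame`.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}

/// Send `payload` parent -> child over a connected socket pair and read it back.
/// Intended for small payloads that fit in the socket buffer, since the write
/// completes before the read starts on the same thread.
pub fn round_trip(payload: &[u8]) -> io::Result<Vec<u8>> {
    let (mut parent_end, mut child_end) = UnixStream::pair()?;
    write_frame(&mut parent_end, payload)?;
    drop(parent_end); // closing the write side lets the reader see EOF after the frame
    read_frame(&mut child_end)
}
```

In the commit's design the child end would instead be inherited across fork+exec, which is why clearing `FD_CLOEXEC` in `pre_exec` matters there; that step needs `libc` and is not part of this sketch.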
The bazel analysis of //:cli failed: `//crates/basil-core` depends on `//crates/axl-proto`, but axl-proto's curated visibility list didn't include basil-core. Add `//crates/basil-core:__pkg__` alongside the existing basil / axl-runtime / build-event-stream entries.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Consolidate onto one fake-bazel mechanism. The `engine::bazel::build` tests drove basil via the `BAZEL_REAL` process-global + `--scenario=<name>` argv; now they mint `ctx.bazel` with a `BazelBackend::Fake` carrying a typed `BazelExpectation` (same path the AXL test harness uses), so the named-scenario table and the global env var are no longer needed.
- `MultiPhaseEval::with_bazel_backend`: optional backend override for the contexts it mints (production stays `Real`; tests pass `Fake`).
- `crate::test`: `.with_fake_bazel()` / `.with_fake_bazel_expectation(...)` carry an expectation; `run_task` builds the `Fake` backend and threads it through `MultiPhaseEval`. Removed `install_basil()` and the `BAZEL_REAL` global (resolves the decision-8 parallelism hazard for these tests). build.rs snippets drop their `--scenario=` flags; the cache-evicted (bug-1060) test declares `BuildResult::CacheEvicted`.
- `basil-core`: deleted the dead named-scenario surface (`scenario`, `write_scenario`, `Scenario`, `ExitBehavior`); the generic `BazelExpectation` synthesis path is the only one left.
- `basil`: now a thin generic-only front-end (reads the expectation off the control fd, replays, exits). Dropped the `info`/`--scenario` verbs, `BASIL_SERVER_PID`, and its now-unused axl-proto/prost deps.
Verified offline: cargo check clean; `engine::bazel::build::tests` (19) and `engine::testing` (5) pass. (bug_1060 is the pre-existing parallel-cold-build flake: deterministic green in isolation and once basil is built; CI pre-builds basil as a data dep.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
The `info::server_info()` / `client_pid` / `is_server_busy` / `server_pid_nonblocking` free functions all hardcoded `bazel_command()` (the real bazel) and couldn't see the backend, so every info-shaped call silently bypassed `BazelBackend::Fake`.
Consolidate the fork onto the backend: a single `base_command(startup_flags)` primitive plus typed verb methods (`info`, `server_info`, `client_pid`, `is_server_busy`, `server_pid_nonblocking`). `info` is now the general key→value method and `server_info` is `info(["server_pid","release"])` + parse, dropping a duplicate parse loop. `info.rs` shrinks to pure parsing helpers (`parse_release`, `parse_info_map`).
The version probe in `resolve_flags_for_running_bazel`, the public `ctx.bazel.info()`, `query`, `health_check`, and `cancel_invocation` now route through the backend (threaded through `Query` and `Cancellation`), so Real/Fake is decided in one place. The one genuine topology fact (`server_pid = child.id()` for the fake, which has no separate daemon) stays inside `Build::spawn`.
Behavior-preserving: the fake path is unchanged (still build/test-only) until basil learns the new verbs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
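The "general key→value `info` plus a thin `server_info` on top" split can be sketched as pure parsing helpers. The helper names come from the commit, but the signatures and bodies below are assumptions, as is the input format: the sketch assumes multi-key `bazel info` output of the form `key: value`, with the release value prefixed by `release `. The version string in any example input is illustrative.

```rust
use std::collections::BTreeMap;

/// Parse `key: value` lines into a map; lines without a colon are skipped.
/// (Assumed signature; only the helper's name appears in the commit.)
pub fn parse_info_map(output: &str) -> BTreeMap<String, String> {
    output
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            Some((key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Strip the assumed `release ` prefix from a release value, if present.
pub fn parse_release(raw: &str) -> &str {
    let raw = raw.trim();
    raw.strip_prefix("release ").unwrap_or(raw)
}

/// `server_info` as "one info call, then parse": both keys must be present
/// and the pid must be numeric, otherwise the result is None.
pub fn server_info(output: &str) -> Option<(u32, String)> {
    let map = parse_info_map(output);
    let pid: u32 = map.get("server_pid")?.parse().ok()?;
    let release = parse_release(map.get("release")?).to_string();
    Some((pid, release))
}
```

With the parsing kept free of any process spawning, the same helpers serve whichever backend produced the text.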
Replace the two `is_fake()` checks in `Build::spawn` with backend methods: `build_server_info` (Real probes the daemon; Fake returns `(0, None)` with no probe) and `bes_server_pid` (Real → daemon pid; Fake → the child it just spawned, since the fake has no separate daemon). The galvanize liveness topology now lives on the backend instead of leaking into the spawn path, and `is_fake()` is gone.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
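Moving the branch onto the backend, so the spawn path never asks "is this fake?", can be sketched with a two-variant enum. The method names follow the commit; everything else is an assumption: the `Backend` type is a simplified stand-in for `BazelBackend`, the probe is passed in as a closure, and the meaning of the tuple fields (a pid and an optional release string) is guessed for illustration.

```rust
/// Simplified stand-in for `BazelBackend` (the real Fake variant carries data).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Real,
    Fake,
}

impl Backend {
    /// Real runs the (possibly expensive) daemon probe; Fake returns
    /// `(0, None)` without ever calling it.
    pub fn build_server_info<F>(&self, probe: F) -> (u32, Option<String>)
    where
        F: FnOnce() -> (u32, Option<String>),
    {
        match self {
            Backend::Real => probe(),
            Backend::Fake => (0, None),
        }
    }

    /// Which pid liveness checks should watch: the daemon for Real,
    /// the just-spawned child for Fake (it has no separate daemon).
    pub fn bes_server_pid(&self, daemon_pid: u32, child_pid: u32) -> u32 {
        match self {
            Backend::Real => daemon_pid,
            Backend::Fake => child_pid,
        }
    }
}
```

The caller invokes both methods unconditionally, so adding a third backend later would touch the `match` arms rather than every call site.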
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Drop internal jargon ("decision 6", eval.extra) from the Bazel.backend
doc comment left over from conflict resolution; the field is carried on
the value and untraced.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
5fc2c6b to 514ccb0
Expose the built-in parallel `*_test.axl` runner to Starlark and wire up an
`aspect axl test` command that drives it.
- Grant the first-party `@aspect` module the existing `#_is_std#` std-context
privilege (alongside `@std`/`@bazel`) so its files may reach `__builtins__`.
Third-party modules stay unprivileged.
- Add `__builtins__.testing().run(source)`, gated at the accessor (matching
`hash()`/`time()`), returning a summary dict
`{error, passed, failed, outcomes:[{name, passed, message}]}`. A module-level
parse/load failure is surfaced as `error` rather than raising, so one bad
file never aborts a run.
- Add the `aspect axl test` task: discovers `*_test.axl` files under the given
paths (defaulting to the workspace root; skips hidden dirs and does not
follow directory symlinks), runs each through the runner, reports per-test
results, and exits non-zero on any failure or file error.
The runner remains loader-free for now: a test file that `load(...)`s other
modules is reported as a file error.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
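The discovery rule described above (collect `*_test.axl` files, skip hidden directories, do not follow directory symlinks) can be sketched with `std::fs`. This is an illustration, not the task's implementation: `discover` is a hypothetical name, results are sorted only to make the output deterministic, and treating symlinked files as ordinary files is this sketch's own choice.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect `*_test.axl` files under `root`.
pub fn discover(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name();
            let name = name.to_string_lossy();
            // `DirEntry::file_type` does not traverse symlinks, so a symlink
            // to a directory is not reported as a directory and is never
            // descended into.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                if !name.starts_with('.') {
                    pending.push(entry.path());
                }
            } else if name.ends_with("_test.axl") && entry.path().is_file() {
                found.push(entry.path());
            }
        }
    }

    found.sort();
    Ok(found)
}
```

A caller would then run each discovered file through the runner and exit non-zero if any summary reports a failure or a file error.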
What this is
A proof-of-concept for giving AXL a first-class, pytest-style testing story built into the engine, the result of a design discussion about a better testing story for AXL. It implements the load-bearing decisions end to end so we can review from a working artifact, then iterate.
Full design, roadmap, and a log of decisions I made without explicit sign-off are in `docs/testing.md`.
The shape
No per-test wiring in `config.axl`, no `pipeline.yaml` list, no copied `_eq`, no `_snapshot_env`/`_restore_env`.
What's implemented (all in `crates/axl-runtime`)
- `*_test.axl` files evaluate against base AXL + a test-only `expect` namespace, selected by filename suffix in the loader (`eval/load.rs`, `eval/api.rs::get_test_globals`). The vocabulary exists only in test files, proven by a test that `expect` is absent from production globals.
- Tests are `def test_*(t)` functions; the runner enumerates `test_*` callables (mirrors `FrozenTaskModuleLike::tasks()`).
- The harness `t` is bazel-free: `t.env` (in-memory env overlay), `t.std`, and `t.ctx`, a real `TaskContext`, the same Rust type production uses.
- `t.ctx.std.env` is the genuine `std.Env` type; it reads the in-memory overlay only because the runner installs a `test_env` on `eval.extra` (`engine/store.rs`, `engine/std/env.rs`). Contract stays identical to reality, enforced by the type system; internal `downcast_ref::<RealType>()` keeps working.
Verify
`cargo test -p axl-runtime testing::`
Three passing tests: discovery + isolation + failure capture; the test-only globals split; and "overlay never leaks into the real process env".
Notable decisions made without sign-off (see docs for the full list)
- `expect`, not `assert`: `assert` is a reserved keyword in the dialect and won't parse as an identifier. Alternatives: `check`, or harness methods (`t.assert_eq`).
- […] `t`).
- Only the `env` backend is mocked so far; `io`/`fs`/`net`/`process`/`bazel` follow the identical pattern (roadmap in docs) but aren't in this slice.
- No `aspect test` CLI task yet: the runner is a Rust function proven by tests; wiring it as a builtin AXL task (next to `axl_add.axl`) via a sandbox-run primitive is the next step.
Not caused by this PR
Two pre-existing `axl-runtime` test failures (`bug_1060_…` timing test; `reports_still_poisoned_when_removal_fails`, which expects a dir removal to fail but succeeds when running as root) fail on a clean checkout too, verified by stashing these changes.
https://claude.ai/code/session_018yR9Wr4VoKxAawyKnhtP6B
Generated by Claude Code