Skip to content

feat(server): load_project_stat_klasses + project_root on /load_expr#784

Open
paddymul wants to merge 2 commits into
mainfrom
feat/project-stats-load-expr
Open

feat(server): load_project_stat_klasses + project_root on /load_expr#784
paddymul wants to merge 2 commits into
mainfrom
feat/project-stats-load-expr

Conversation

@paddymul
Copy link
Copy Markdown
Collaborator

@paddymul paddymul commented May 20, 2026

Summary

Adds an optional project_root field to POST /load_expr. When set, the server scans <project_root>/stats/*.py for compute(col) -> ibis_expr functions, decorates each with @stat() keyed by the filename stem, and folds them into the session's analysis_klasses alongside XORQ_STATS_V2. No new wire shape, no separate registration call — host-authored summary stats appear in merged_sd on the next session load.

Surfaced by pydata-app — design plan: plans/llm-summary-stats.md. The LLM writes one xorq expression per stat into the project directory; this PR lets the buckaroo server pick them up at session-load time.

Sits on top of #776 (/load_expr endpoint). Independent of #780 / #781 / #783 — the scoped-summary-stats and reentry-guard work doesn't affect the project-stats loader's surface.

Wire surface

 POST /load_expr
 {
   "build_dir": "...",
   "session": "...",
   "no_browser": true,
+  "project_root": "/abs/path/to/project"   // optional
 }

When unset, behaviour is unchanged (built-in stats only).

How it works

xorq_loading.load_project_stat_klasses(project_root):

  • Walks <project_root>/stats/*.py (_-prefixed files skipped as a parking convention).
  • For each file, execs the source in a restricted-globals namespace (__builtins__ reduced to a 25-name allowlist; ibis / xorq.api exposed for expression construction). No __import__, no open, no eval. Not a real sandbox — the function can still walk col.__class__.__mro__ etc. — but it closes the obvious escape vectors and the project owner is trusted (the buckaroo subprocess only scans paths a host explicitly passed).
  • Pulls the compute function out of the resulting namespace, injects a XorqColumn annotation on its single parameter so the @stat() decorator marks it as needing raw column data, renames it to the file stem, decorates it.
  • Errors in any one file are logged and the file is skipped. One bad stat doesn't block the rest.

XorqServerDataflow gains an extra_klasses kwarg that overrides analysis_klasses at the per-instance level (class-level _XORQ_ANALYSIS_KLASSES untouched, so other sessions / direct widget usage don't inherit one project's stats).

LoadExprHandler.post reads project_root from the body, calls the helper, passes the result through as extra_klasses.

Why this shape

  • No new endpoint. The agent that authors a stat writes a file to disk under the project directory. /load_expr is the only path that needs to know about it. Hot-reload of running sessions is explicitly V2.
  • No DSL. compute(col) returns an ibis expression. The contract is identical to the built-in XORQ_STATS_V2 entries, just with the function body coming from a project file instead of a buckaroo module.
  • Push-down preserved. XorqStatPipeline doesn't distinguish project stats from built-ins; compile_batch_expr folds everything into one aggregate query.
  • Built-ins win on collision. They're appended first; pipeline-side dedupe resolves to the first occurrence.

Test plan

  • 9 unit tests in tests/unit/server/test_project_stats.py covering: empty/missing dir, filename-stem keying, no-compute rejection, import os rejection (restricted builtins), _-prefix skip, real ibis column execution, per-instance extra_klasses extension, and the no-extra_klasses fallback.
  • Full tests/unit/server/ + basic_widget_test.py suite green locally on main (79 passed, 1 skipped).
  • End-to-end through /load_expr body (left for pydata-app's integration test — the wire field is a single body.get("project_root") so unit tests on the helper + the dataflow are the load-bearing pieces).

🤖 Generated with Claude Code

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a952e7327

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +166 to +169
ns: dict = {}
exec(compile(source, str(path), "exec"), globs, ns)

compute = ns.get("compute")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Execute stat modules in a single namespace

Running exec with separate globs and ns makes compute capture globs as its global scope, so names defined in the same stat file (e.g., THRESHOLD = ... or def helper(...)) are stored in ns and are not visible when compute runs. In practice, files that define compute(col) plus any module-level helper/constant will load successfully here but then fail with NameError during stat execution for every column. Use one shared namespace (or merge locals back into globals) so project stats behave like normal Python modules.

Useful? React with 👍 / 👎.

paddymul added 2 commits May 20, 2026 18:10
7 tests pinning the project-stats loader contract:
- empty / missing stats dir -> []
- one file -> one wrapped @stat func keyed by filename
- file without compute() skipped (not raised)
- ``import os`` blocked by restricted globals (file skipped, others load)
- _-prefixed files ignored (parked / disabled stats)
- the wrapped func executes against an ibis column and returns a
  scalar expression — the contract every XORQ_STATS_V2 entry satisfies

Collection fails today because the helper doesn't exist — TDD bait.
Lets hosts (e.g. pydata-app's MCP server) hand buckaroo runtime-authored
summary stats without restarting the server or adding new wire shapes.

Wire surface — one new optional field on /load_expr's body:

    POST /load_expr
    {
      "build_dir": "...",
      "project_root": "<abs path>",   # NEW, optional
      ...
    }

Server side, when `project_root` is set, scans `<project_root>/stats/*.py`
for files defining `compute(col) -> ibis_expr`, exec's each in a
restricted-globals namespace (no `__import__`, no `open`, no `eval`),
renames the function to the file's stem, and decorates it with `@stat()`.
The result is a list of `XORQ_STATS_V2`-shaped wrapped functions that
get folded into `XorqServerDataflow.analysis_klasses` at the per-instance
level (class-level list untouched).

The restricted globals close the function body's reachable surface
without a real sandbox — acceptable because the project owner is
trusted (only directories a host explicitly passes get scanned).

XorqStatPipeline doesn't distinguish project stats from built-ins, so
they batch into the same aggregate query via `compile_batch_expr` and
preserve push-down. Built-ins are kept first; if a project stat ever
collides on a key, the built-in wins.

Files whose name starts with `_` are skipped (convention for parking
disabled stats without removing them from the project tree). Errors
in any one file are logged and that file is skipped — one bad stat
shouldn't block the rest.

9 unit tests in tests/unit/server/test_project_stats.py exercise:
the empty/missing dir paths, filename-stem keying, the "no compute"
and "tried to import" rejection branches, the underscore-prefix skip,
an actual column-expression execution, and the per-instance
extra_klasses extension on XorqServerDataflow.
@paddymul paddymul changed the base branch from smorgasbord/scoped-sd to main May 20, 2026 22:12
@paddymul paddymul force-pushed the feat/project-stats-load-expr branch from 1a952e7 to 8071cfb Compare May 20, 2026 22:12
@github-actions
Copy link
Copy Markdown
Contributor

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26192985207

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.14.3.dev26192985207

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.14.3.dev26192985207" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant