Merged
16 commits
341976a
feat(mcp/autovisualiser): harden the pipeline and add 24 new visualiz…
Broccolito Jun 19, 2026
03ba5a2
feat(mcp): add Agent Drafter built-in extension, plus bundled working…
Broccolito Jun 19, 2026
52bba11
fix(mcp/autovisualiser): accept stringified `data` args, fix map sizi…
Broccolito Jun 19, 2026
efa8a46
fix(developer): accept 'file_path' as alias for text_editor 'path'
Broccolito Jun 19, 2026
4abb47d
fix(providers): deeper retry budget for transient rate-limit (429) er…
Broccolito Jun 19, 2026
a2566d7
feat(developer,hooks): git context in the developer extension + verif…
Broccolito Jun 19, 2026
ffa4332
fix(cli): graceful --resume fallback + readable tool-call paths
Broccolito Jun 19, 2026
d75abbc
fix(agent): make the per-turn action-limit stop explicit and quantified
Broccolito Jun 19, 2026
2f56a3d
qa: import biorouter-testing-apps QA suite into the project
Broccolito Jun 19, 2026
46a006d
qa: repoint build harness ROOT to in-project biorouter-testing-apps (…
Broccolito Jun 19, 2026
4428934
qa: snapshot apps 13-15 (phylo, variant-caller, kmer) + round-3 report
Broccolito Jun 19, 2026
4191c54
qa: snapshot bioinformatics batch apps 16-20 + round-4 report
Broccolito Jun 20, 2026
851b662
qa: snapshot med batch apps 21-25 + round-5 report
Broccolito Jun 20, 2026
ec10519
qa: complete biomedical batch (apps 26-30) + round-6 report + harness…
Broccolito Jun 20, 2026
5cbb45c
chore(release): bump to 1.85.4; bundle agent-drafter UI, ACP WebSocke…
Broccolito Jun 20, 2026
09d61e5
qa: statistics batch apps 31-37 + round-7 report (loop paused at user…
Broccolito Jun 20, 2026
36 changes: 36 additions & 0 deletions CLAUDE.md
@@ -230,6 +230,42 @@ settings provider grid, `biorouter configure`).
(real server + tiny Qwen3.5 0.8B, ~0.5 GB one-time download):
`BIOROUTER_LLAMACPP_BIN=ui/desktop/src/bin/llamacpp/llama-server cargo test -p biorouter --test llamacpp_integration -- --ignored --test-threads=1`

### Auto Visualiser feature

The Auto Visualiser (`autovisualiser`) built-in MCP server turns structured data
into self-contained interactive HTML figures, returned as `ui://…` resources and
rendered inline in chat (sandboxed iframe via `@mcp-ui` + the `/mcp-ui-proxy`).

- **Module:** `crates/biorouter-mcp/src/autovisualiser/` - `mod.rs` (router +
the 8 original tools), `common.rs` (shared infra), `tools_extra.rs` (Mermaid
wrappers), `tools_charts.rs` (Chart.js), `tools_d3.rs` (D3), `tools_geo.rs`
(Leaflet), `tests.rs` + `tests_extra.rs`. The `tools_*.rs` files are
`include!`d into `mod.rs`; each defines a `#[tool_router(router = …)]` impl
block, combined in `new()` via `ToolRouter` `+`.
- **Shared pipeline (`common.rs`):** validate → JSON-encode safely (`js_data`
neutralises `</script>` breakout) → `assemble` template with `{{ASSETS}}` +
`{{COMMON}}` (the shared `templates/_common.js`: theme, palette, auto-resize,
global error card) → base64 `ui://` blob (`finish`). Every tool also enforces
size limits + semantic checks and returns a friendly `INVALID_PARAMS` message
instead of producing a broken figure.
- **Tools (33):** charts (`show_chart`, `render_histogram`, `render_boxplot`,
`render_bubble`, `render_area`, `render_radar`, `render_donut`, `render_gauge`);
scientific (`render_volcano`, `render_manhattan`, `render_kaplan_meier`,
`render_forest`); relationships/hierarchies (`render_network`, `render_sankey`,
`render_chord`, `render_heatmap`, `render_treemap`, `render_sunburst`,
`render_dendrogram`, `render_wordcloud`, `render_calendar_heatmap`); diagrams
(`render_mermaid` + typed wrappers `render_flowchart`/`gantt`/`sequence`/
`mindmap`/`timeline`/`er_diagram`/`state_diagram`/`class_diagram`); geo
(`render_map`, `render_choropleth`).
- **Assets:** libraries (D3, Chart.js, Leaflet, Mermaid) are inlined by default
for offline use. `BIOROUTER_AUTOVIS_CDN=1` switches to pinned CDN tags, which
shrinks the persisted/reloaded blob from megabytes to a few KB (recommended if
large Mermaid diagrams fail to re-render on chat reopen).
`BIOROUTER_AUTOVIS_DEBUG=1` (or debug builds) dumps generated HTML to the app
cache dir (`<cache>/autovisualiser/<name>-<pid>.html`).
- **Tests:** `cargo test -p biorouter-mcp --lib autovisualiser` (happy paths,
edge cases, escaping, lenient enum parsing).
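
The first stages of the shared pipeline described above (validate, escape, assemble) might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in `common.rs`: the function signatures, the `{{DATA}}` placeholder, and the `MAX_JSON_BYTES` limit are hypothetical, and only the names `js_data`, `assemble`, `{{ASSETS}}` and `{{COMMON}}` come from the description. The final base64 `ui://` step (`finish`) is left out because its exact format is not stated here.

```rust
/// Hypothetical size cap; the real limits are not stated in the notes above.
const MAX_JSON_BYTES: usize = 1_000_000;

/// Escape the characters that could close an inline <script> element.
/// The input must already be valid JSON text, so `<`, `>` and `&` can only
/// occur inside string literals, where a \uXXXX escape decodes back to the
/// same character when the browser parses the JSON.
fn js_data(json: &str) -> String {
    json.replace('<', "\\u003c")
        .replace('>', "\\u003e")
        .replace('&', "\\u0026")
}

/// Reject empty or oversized payloads before any HTML is generated.
fn validate(json: &str) -> Result<(), String> {
    if json.trim().is_empty() {
        return Err("data must not be empty".to_string());
    }
    if json.len() > MAX_JSON_BYTES {
        return Err(format!(
            "data is {} bytes; the limit is {}",
            json.len(),
            MAX_JSON_BYTES
        ));
    }
    Ok(())
}

/// Fill the template placeholders. The data goes in last so that text inside
/// the payload is never rescanned for `{{ASSETS}}` or `{{COMMON}}`.
fn assemble(template: &str, assets: &str, common: &str, json: &str) -> Result<String, String> {
    validate(json)?;
    Ok(template
        .replace("{{ASSETS}}", assets)
        .replace("{{COMMON}}", common)
        .replace("{{DATA}}", &js_data(json)))
}

fn main() {
    let template = "<html><head>{{ASSETS}}<script>{{COMMON}}</script></head>\
<body><script>const data = {{DATA}};</script></body></html>";
    // A label that tries to break out of the inline script.
    let json = r#"{"label":"</script><script>alert(1)</script>"}"#;
    let html = assemble(template, "<!-- assets -->", "/* common */", json)
        .expect("payload is small and non-empty");
    // The payload can no longer terminate the surrounding <script> element.
    assert!(!html.contains("</script><script>alert"));
    println!("{}", html);
}
```

Escaping at the JSON-text level keeps the payload byte-for-byte recoverable in the browser while ensuring the HTML parser never sees a literal `</script>` inside the data.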

### Communication Flow

```
25 changes: 18 additions & 7 deletions Cargo.lock


4 changes: 2 additions & 2 deletions Cargo.toml
@@ -1,10 +1,10 @@
[workspace]
-members = ["crates/*"]
+members = ["biorouter-testing-apps/bio-blast-lite-rs","crates/*"]
resolver = "2"

[workspace.package]
edition = "2021"
-version = "1.85.3"
+version = "1.85.4"
authors = ["Block <ai-oss-tools@block.xyz>"]
license = "Apache-2.0"
repository = "https://github.com/BaranziniLab/BioRouter"
9 changes: 9 additions & 0 deletions biorouter-testing-apps/.gitignore
@@ -0,0 +1,9 @@
# Regenerable build artifacts from the per-app builds
*/target/
*/build/
*/.venv/
**/__pycache__/
*.log
.DS_Store
# User's separate dataset that happened to live here - not part of the QA apps
autovis-phase3/
157 changes: 157 additions & 0 deletions biorouter-testing-apps/CHECKLIST.md
@@ -0,0 +1,157 @@
# BioRouter Build-100 Test Checklist

Goal: drive the **BioRouter CLI** (Xiaomi MiMo, `mimo-v2.5-pro`, developer + todo
only) to build 100 substantial software artifacts - each in its own git repo
under this directory β€” as a comprehensive end-to-end test of the agent system.

Scale target: each item should be a real artifact (multiple files, hundreds–
thousands of LOC), not a one-file script. Every repo is `git init`'d and commits
are tracked.

Status legend: ☐ todo · ◐ in progress · ☑ done · ✗ blocked (see FAILURE_LOG.md)

## Batch 1 - Algorithms & data structures (1–10)
1. ☐ `algo-pathfinding-rs` - A*/Dijkstra/BFS pathfinding lib + CLI maze solver (Rust)
2. ☐ `algo-sorting-visualizer-py` - sorting algorithms + animated terminal visualizer (Python)
3. ☐ `algo-bst-avl-redblack-cpp` - balanced BST family with tests (C++)
4. ☐ `algo-graph-toolkit-rs` - graph algorithms (SCC, MST, max-flow, topo) (Rust)
5. ☐ `algo-string-matching-py` - KMP/Boyer-Moore/Rabin-Karp/suffix-array (Python)
6. ☐ `algo-dynamic-programming-cpp` - classic DP problem set + benchmark harness (C++)
7. ☐ `algo-hash-table-impl-rs` - open-addressing + chaining hash maps w/ bench (Rust)
8. ☐ `algo-compression-lz77-huffman-py` - LZ77 + Huffman codec (Python)
9. ☐ `algo-bignum-arbitrary-precision-cpp` - arbitrary-precision integer library (C++)
10. ☐ `algo-bloom-cuckoo-filters-rs` - probabilistic filters with FPR analysis (Rust)

## Batch 2 - Bioinformatics (11–20)
11. ☐ `bio-seq-alignment-py` - Needleman-Wunsch + Smith-Waterman aligner (Python)
12. ☐ `bio-fasta-fastq-toolkit-rs` - FASTA/FASTQ parser, stats, QC tool (Rust)
13. ☐ `bio-phylo-tree-builder-py` - neighbor-joining / UPGMA phylogenetics (Python)
14. ☐ `bio-variant-caller-pipeline-py` - pileup → variant calling pipeline (Python)
15. ☐ `bio-kmer-counter-cpp` - k-mer counting + de Bruijn graph (C++)
16. ☐ `bio-gene-expression-r` - RNA-seq differential expression analysis (R)
17. ☐ `bio-protein-structure-py` - PDB parser + secondary-structure metrics (Python)
18. ☐ `bio-blast-lite-rs` - seed-and-extend local alignment search (Rust)
19. ☐ `bio-genome-assembly-py` - overlap-layout-consensus mini-assembler (Python)
20. ☐ `bio-motif-finder-py` - Gibbs sampling / MEME-style motif discovery (Python)

## Batch 3 - Biomedical informatics (21–30)
21. ☐ `med-ehr-fhir-parser-py` - FHIR resource parser + patient timeline (Python)
22. ☐ `med-icd-snomed-mapper-py` - clinical terminology crosswalk service (Python)
23. ☐ `med-survival-analysis-r` - Kaplan-Meier + Cox PH modeling (R)
24. ☐ `med-clinical-trial-sim-py` - adaptive trial design simulator (Python)
25. ☐ `med-drug-interaction-graph-rs` - drug-drug interaction graph engine (Rust)
26. ☐ `med-dicom-image-tool-py` - DICOM reader + windowing/segmentation (Python)
27. ☐ `med-risk-score-calculator-py` - composable clinical risk scores API (Python)
28. ☐ `med-cohort-builder-sql-py` - cohort query builder over synthetic EHR (Python)
29. ☐ `med-biomarker-discovery-r` - feature selection for biomarker panels (R)
30. ☐ `med-epidemic-seir-model-py` - SEIR/agent-based epidemic simulator (Python)

## Batch 4 - Statistics & data analysis (31–45)
31. ☐ `stat-bayesian-mcmc-py` - Metropolis-Hastings / Gibbs sampler library (Python)
32. ☐ `stat-glm-from-scratch-r` - generalized linear models implementation (R)
33. ☐ `stat-timeseries-arima-py` - ARIMA/Holt-Winters forecasting toolkit (Python)
34. ☐ `stat-hypothesis-testing-suite-r` - comprehensive test battery + reporting (R)
35. ☐ `stat-bootstrap-resampling-py` - bootstrap/jackknife/permutation engine (Python)
36. ☐ `stat-pca-dimreduction-cpp` - PCA/t-SNE/UMAP-lite numerics (C++)
37. ☐ `data-etl-pipeline-py` - configurable ETL pipeline w/ validation (Python)
38. ☐ `data-csv-query-engine-rs` - columnar CSV query engine (Rust)
39. ☐ `data-dashboard-generator-py` - static analytics dashboard builder (Python)
40. ☐ `data-stream-aggregator-rs` - streaming windowed aggregations (Rust)
41. ☐ `stat-survival-power-r` - power analysis + sample size calculator (R)
42. ☐ `stat-mixed-models-r` - linear mixed-effects modeling (R)
43. ☐ `data-anomaly-detection-py` - multivariate anomaly detection toolkit (Python)
44. ☐ `data-feature-store-py` - feature engineering + store with lineage (Python)
45. ☐ `stat-causal-inference-py` - propensity scoring / IPW / DiD (Python)

## Batch 5 - Machine learning & numerical (46–55)
46. ☐ `ml-neural-net-from-scratch-py` - MLP w/ autograd, no frameworks (Python)
47. ☐ `ml-decision-tree-forest-rs` - decision tree + random forest (Rust)
48. ☐ `ml-linear-models-cpp` - linear/logistic regression w/ SGD (C++)
49. ☐ `ml-kmeans-clustering-py` - clustering suite (k-means/DBSCAN/hierarchical) (Python)
50. ☐ `ml-recommender-system-py` - collaborative filtering + matrix factorization (Python)
51. ☐ `ml-gradient-boosting-py` - gradient-boosted trees implementation (Python)
52. ☐ `ml-nlp-text-classifier-py` - TF-IDF + naive Bayes/SVM pipeline (Python)
53. ☐ `num-linear-algebra-rs` - matrix ops, LU/QR/SVD decompositions (Rust)
54. ☐ `num-ode-solver-cpp` - Runge-Kutta/adaptive ODE integrators (C++)
55. ☐ `num-fft-signal-py` - FFT + DSP filtering toolkit (Python)

## Batch 6 - Games (56–65)
56. ☐ `game-snake-rs` - terminal Snake with AI autoplayer (Rust)
57. ☐ `game-snake-py` - pygame Snake variant + level editor (Python)
58. ☐ `game-tetris-cpp` - terminal Tetris with scoring/levels (C++)
59. ☐ `game-2048-rs` - 2048 with solver + undo (Rust)
60. ☐ `game-conway-life-py` - Game of Life w/ patterns + RLE loader (Python)
61. ☐ `game-chess-engine-cpp` - chess engine w/ minimax + alpha-beta (C++)
62. ☐ `game-minesweeper-py` - Minesweeper w/ solver/probability hints (Python)
63. ☐ `game-roguelike-rs` - procedural dungeon roguelike (Rust)
64. ☐ `game-sudoku-solver-generator-py` - Sudoku generator + backtracking solver (Python)
65. ☐ `game-pong-ai-py` - Pong with reinforcement-learning paddle (Python)

## Batch 7 - Complex software engineering (66–80)
66. ☐ `swe-key-value-store-rs` - LSM-tree embedded KV store w/ WAL (Rust)
67. ☐ `swe-http-server-cpp` - epoll/kqueue HTTP/1.1 server (C++)
68. ☐ `swe-json-parser-rs` - spec-compliant JSON parser + serializer (Rust)
69. ☐ `swe-regex-engine-py` - NFA/DFA regex engine (Python)
70. ☐ `swe-task-queue-py` - distributed task queue w/ workers (Python)
71. ☐ `swe-mini-interpreter-rs` - Lox-like scripting language interpreter (Rust)
72. ☐ `swe-orm-lite-py` - lightweight ORM over SQLite (Python)
73. ☐ `swe-template-engine-py` - Jinja-like template engine (Python)
74. ☐ `swe-rpc-framework-rs` - length-prefixed RPC framework (Rust)
75. ☐ `swe-static-site-generator-py` - Markdown static site generator (Python)
76. ☐ `swe-bytecode-vm-cpp` - stack-based bytecode VM (C++)
77. ☐ `swe-graphql-server-py` - schema-driven GraphQL server (Python)
78. ☐ `swe-build-system-rs` - dependency-graph build tool (Rust)
79. ☐ `swe-container-runtime-py` - namespace/cgroup mini container runtime (Python)
80. ☐ `swe-distributed-kv-raft-rs` - Raft consensus KV cluster (Rust)

## Batch 8 - Large/multi-module projects (81–90)
81. ☐ `proj-markdown-ide-py` - full markdown editor TUI w/ plugins (Python)
82. ☐ `proj-data-viz-library-py` - plotting library w/ multiple backends (Python)
83. ☐ `proj-web-crawler-rs` - concurrent crawler + indexer (Rust)
84. ☐ `proj-time-series-db-rs` - embeddable time-series database (Rust)
85. ☐ `proj-spreadsheet-engine-cpp` - formula-evaluating spreadsheet engine (C++)
86. ☐ `proj-package-manager-py` - dependency resolver + package manager (Python)
87. ☐ `proj-ci-runner-py` - YAML-driven CI pipeline runner (Python)
88. ☐ `proj-genomics-workflow-py` - multi-stage genomics workflow engine (Python)
89. ☐ `proj-text-search-engine-rs` - inverted-index full-text search w/ BM25 (Rust)
90. ☐ `proj-trading-backtester-py` - event-driven strategy backtester (Python)

## Batch 9 - Mixed advanced / cross-domain (91–100)
91. ☐ `adv-image-processing-cpp` - convolution/edge/morphology image lib (C++)
92. ☐ `adv-ray-tracer-rs` - path-tracing renderer (Rust)
93. ☐ `adv-physics-engine-py` - 2D rigid-body physics engine (Python)
94. ☐ `adv-audio-synth-py` - modular audio synthesizer + sequencer (Python)
95. ☐ `adv-network-protocol-rs` - reliable protocol over UDP (Rust)
96. ☐ `adv-compiler-frontend-cpp` - lexer/parser/AST/typechecker for a C subset (C++)
97. ☐ `adv-blockchain-py` - proof-of-work blockchain + P2P mempool (Python)
98. ☐ `adv-graph-database-rs` - property graph DB w/ traversal query lang (Rust)
99. ☐ `adv-scientific-pipeline-r` - reproducible multi-stage analysis (R)
100. ☐ `adv-quantum-circuit-sim-py` - quantum circuit state-vector simulator (Python)

---
Languages covered: Rust (26), Python (52), C++ (14), R (8); every batch mixes languages.
Each build is driven through `biorouter run`/`session` (Xiaomi MiMo) and committed to git.

## Interaction Protocol (each app is INTERACTIVE, not one-shot)

Every app goes through an **initial build** (`build_app.sh`, named resumable
session) followed by **2–4 follow-up refinement turns** (`interact.sh --resume`)
in which the Claude harness drives the BioRouter agent like a real user iterating
on their project. Each app draws its follow-ups from this menu (varied across the
100 so every interaction style is exercised):

- **A. Add a feature** - "now add <capability X> and wire it into the CLI/tests."
- **B. Change a requirement mid-stream** - "actually the input format should be Y, refactor accordingly."
- **C. Fix / debug** - "running `<cmd>` gives `<error>`; diagnose and fix it." (sometimes inject a real bug first)
- **D. Refactor / restructure** - "split module Z, extract a trait/interface, reduce duplication."
- **E. Improve output aesthetics** - "make the CLI output prettier: colors, aligned tables, a summary line."
- **F. Add tests / coverage** - "add edge-case tests for <component> and make them pass."
- **G. Add docs / examples** - "write a usage example and expand the README with a diagram."
- **H. Performance** - "benchmark and optimize the hot path; report before/after."
- **I. Productionize** - "add error handling, input validation, and a config file."
- **J. Explain & verify** - "summarize the architecture and prove the tests cover the main paths."

Each turn is committed separately so the iteration history is visible in git.
Both *functional* outcomes (did it work?) and *experiential* ones (how did the
CLI handle the request, call tools, and present results?) are scored in
`UX_BENCHMARK.md`.