Merged
24 commits
0e2c412
Local-VLM visual review + large-scale performance sweep (#50)
mvalancy Jun 14, 2026
5dbe765
Self-heal the dev DB around heavy test suites (clean systems) (#51)
mvalancy Jun 14, 2026
3ade162
Add graph-geometry diagnostic (baseline for the node/edge/label overh…
mvalancy Jun 14, 2026
2f1673e
Edges attach to node borders + edge-label width is a hard minimum edg…
mvalancy Jun 14, 2026
22800a1
Drag-time clamp: a node can't be dragged closer than its edge-label m…
mvalancy Jun 14, 2026
51945a5
Fix CodeCaptcha refresh button not vertically centered in math mode (…
mvalancy Jun 14, 2026
a86b374
Tighten min edge length to label width + small margin (no oversized b…
mvalancy Jun 14, 2026
6666acd
Fix relationship type/flip live update + orthogonal Welcome graph; ad…
mvalancy Jun 14, 2026
780f71d
Integrate insecure-connection (HTTP) warning as a clean top strip, no…
mvalancy Jun 14, 2026
583ea3f
Add WorkItem->Graph drill-in relationship (hierarchical graphs, PR1) …
mvalancy Jun 14, 2026
cfec582
Hierarchical graphs PR2+3: guest-visible demo hierarchy + drill-in na…
mvalancy Jun 14, 2026
042de18
Visual treatment for sheet-symbol nodes (hierarchical graphs, PR4) (#61)
mvalancy Jun 14, 2026
63f3bb5
Graph PR-A: one-shot physics (settle then stop), spread-start layout,…
mvalancy Jun 14, 2026
ee97656
Seed demo hierarchy with clean (non-overlapping) spacing + force rese…
mvalancy Jun 14, 2026
6a255ec
Minimap wheel + pinch zoom (PR-C) (#64)
mvalancy Jun 14, 2026
5eea7cb
Expandable project-explorer graph tree (PR-D) (#65)
mvalancy Jun 14, 2026
89b0a3a
Add NodeContentRenderer: readable markdown + syntax-highlighted code …
mvalancy Jun 14, 2026
2ed0012
Docked node inspector: Card / Contents / Diagram, decoupled from zoom…
mvalancy Jun 14, 2026
dfd1d62
Perf S1: size-gate continuous node/edge effects on dense graphs (idle…
mvalancy Jun 15, 2026
19e7bec
Perf S2: viewport culling (zoomed-in) + throttled minimap (drag FPS) …
mvalancy Jun 15, 2026
b872df3
Perf S3: simplified-node LOD when zoomed out (whole-graph zoom FPS 3.…
mvalancy Jun 15, 2026
4623f23
Perf S5: extreme-zoom "dot mode" — hide edges, halve painted elements…
mvalancy Jun 15, 2026
aad5179
Fix core interaction bugs: orphan/duplicate arrows, drag-follow edit …
mvalancy Jun 16, 2026
0cd009e
Fix #30: node type change now updates the card in graph view (color/b…
mvalancy Jun 16, 2026
42 changes: 42 additions & 0 deletions .env.test.example
@@ -0,0 +1,42 @@
# GraphDone test-pipeline configuration — LOCAL ONLY.
#
# Copy this file to `.env.test.local` (which is gitignored) and fill in your
# own values. NEVER put real hostnames, IPs, or keys in this committed example
# or anywhere else in the repo — the local VLM boxes (GPU workstations) must
# stay out of version control. The test harness auto-loads `.env.test.local`.
#
# cp .env.test.example .env.test.local # then edit .env.test.local

# --- Local Vision-Language-Model (VLM) endpoints ---------------------------
# Comma-separated base URLs of your local VLM server(s). Requests are
# round-robined across them so visual evaluation is spread over every GPU.
# Leave blank to skip all VLM-driven suites (they no-op cleanly in CI).
# Example shape (use your OWN hosts in .env.test.local, never here):
# VLM_ENDPOINTS=http://<gpu-host-a>:<port>,http://<gpu-host-b>:<port>
VLM_ENDPOINTS=

# Model id/tag to request (e.g. a llava / qwen2-vl / llama-3.2-vision build).
VLM_MODEL=

# Optional bearer key for OpenAI-compatible servers that require one.
VLM_API_KEY=

# Wire protocol: auto (default) | openai | ollama.
# auto — probe each endpoint: /v1/models => OpenAI-compatible, else Ollama.
# openai — POST /v1/chat/completions (vLLM, LM Studio, llama.cpp, Ollama compat)
# ollama — POST /api/chat with an images[] array
VLM_PROTOCOL=auto

# Max concurrent VLM requests across all endpoints (default 3).
VLM_MAX_CONCURRENCY=3

# Per-request timeout in ms — VLMs can be slow on large images (default 120000).
VLM_TIMEOUT_MS=120000

# --- Large-scale performance sweep -----------------------------------------
# Node counts to sweep, comma-separated. Leave blank to use the built-in
# default (small in CI, large locally). Example: 50,200,500,1000,2000
SCALE_SWEEP_SIZES=

# Quality tiers to sweep per size (subset of LOW,MEDIUM,HIGH,ULTRA).
SCALE_SWEEP_QUALITIES=HIGH,ULTRA
2 changes: 2 additions & 0 deletions docs/SYSTEMS.md
@@ -19,6 +19,8 @@
| Lint | `npm run lint` | 0 errors (warnings allowed) |
| Build | `npm run build` | Production build succeeds |
| Showcase report | `TEST_URL=http://localhost:3127 npm run report:showcase` | Records .webm video + screenshots of every mode at all 5 resolutions → `test-artifacts/showcase/index.html` (also an every-PR CI artifact). |
| Large-scale perf sweep | `TEST_URL=http://localhost:3127 npm run test:perf:scale` | Seeds graphs of increasing size (50→2000+ nodes) and records `window.__graphPerf` (settle, tick, fps, drift, query p95) across size × quality → `test-artifacts/scale-sweep/index.html`. Report-only; sizes/qualities via `.env.test.local`. See [docs/testing/local-vlm-and-scale.md](./testing/local-vlm-and-scale.md). |
| Local VLM visual review | `TEST_URL=http://localhost:3127 npm run test:vlm` | A locally-hosted vision model judges captured states from 4 perspectives (visual defects, new-user clarity, accessibility, living-graph aliveness) → `test-artifacts/vlm/index.html`. **Skips unless `VLM_ENDPOINTS` is set in the gitignored `.env.test.local`** (CI can't reach local GPUs). Report-only. |

**Why THE GATE exists:** a real incident — orphaned `Edge` records made the
edges query 500 and the UI showed "Error" with zero edges, while every unit
113 changes: 113 additions & 0 deletions docs/testing/local-vlm-and-scale.md
@@ -0,0 +1,113 @@
# Local VLM visual review & large-scale performance sweeps

Two heavier, report-only suites that exercise GraphDone from realistic user
perspectives and at scale. Both are **opt-in and run locally** (or on a
self-hosted runner), chiefly because the vision models live on your own GPU
boxes, whose hostnames must never enter the repo.

## TL;DR

```bash
cp .env.test.example .env.test.local # gitignored — put your real values here
# edit .env.test.local: VLM_ENDPOINTS, VLM_MODEL, (optional) sweep sizes

./start dev # or have the stack running on :3127

TEST_URL=http://localhost:3127 npm run test:perf:scale # → test-artifacts/scale-sweep/index.html
TEST_URL=http://localhost:3127 npm run test:vlm # → test-artifacts/vlm/index.html
```

If `VLM_ENDPOINTS` is unset, `test:vlm` **skips cleanly** — so CI and other
developers are never blocked by hardware they don't have.
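A minimal sketch of that skip guard, assuming the endpoints arrive as the raw comma-separated `VLM_ENDPOINTS` string. `parseVlmEndpoints` and `shouldSkipVlm` are hypothetical helpers, not necessarily what `tests/helpers/vlm.ts` does: they trim whitespace, drop empty entries and trailing slashes, and treat an empty result as "skip".

```typescript
// Hypothetical helper: normalize the raw VLM_ENDPOINTS value into a list of
// base URLs. An empty list is the signal for the VLM suite to skip.
function parseVlmEndpoints(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((s) => s.trim().replace(/\/+$/, "")) // drop whitespace and trailing slashes
    .filter((s) => s.length > 0);
}

// True when no usable endpoint is configured (unset, blank, or only commas).
function shouldSkipVlm(raw: string | undefined): boolean {
  return parseVlmEndpoints(raw).length === 0;
}
```

In a Playwright spec the result could feed `test.skip(shouldSkipVlm(process.env.VLM_ENDPOINTS), 'VLM_ENDPOINTS not set')`, so CI reports skipped tests rather than failures.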

## Keeping hostnames out of the repo

- **Never** commit hostnames, IPs, or keys. The GPU boxes (e.g. an RTX 4090
workstation and Grace-Blackwell nodes) are referenced only by env vars.
- `.env.test.local` is gitignored (see `.gitignore`). It is the *only* place
your real endpoints live.
- `.env.test.example` is committed and documents the variable **names** with
  placeholder hosts (`http://<gpu-host-a>:<port>`). Copy it to `.env.test.local`
  and fill in your real values.
- The harness auto-loads `.env.test.local` via `tests/helpers/testEnv.ts`.

```bash
# .env.test.local (NOT committed)
VLM_ENDPOINTS=http://<host-a>:<port>,http://<host-b>:<port>,http://<host-c>:<port>
VLM_MODEL=<your-vision-model-tag>
VLM_PROTOCOL=auto # auto | openai | ollama
VLM_MAX_CONCURRENCY=3
```
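A minimal sketch of what auto-loading an env file like the one above involves. It is a hypothetical stand-in, not the contents of `tests/helpers/testEnv.ts`: it skips blank and comment lines, strips a trailing ` # ...` comment (as on the `VLM_PROTOCOL` line above), and assumes variables already set in the real environment should win, so a one-off `VLM_MODEL=... npm run test:vlm` still overrides the file. Quoted values are not handled.

```typescript
type EnvMap = Record<string, string | undefined>;

// Parse KEY=value lines; ignore blank lines and full-line comments.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    // Strip an inline comment: whitespace followed by '#'. A '#' inside a URL
    // (no preceding whitespace) is kept.
    const value = line.slice(eq + 1).replace(/\s+#.*$/, "").trim();
    out[key] = value;
  }
  return out;
}

// Copy parsed values into env without clobbering anything already set.
function applyEnv(parsed: Record<string, string>, env: EnvMap): void {
  for (const key of Object.keys(parsed)) {
    if (env[key] === undefined) env[key] = parsed[key];
  }
}
```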

Multiple endpoints are **round-robined**, so visual evaluation spreads across
every GPU you list.
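A minimal sketch of round-robin dispatch under the global `VLM_MAX_CONCURRENCY` cap. `RoundRobin` and `Limiter` are hypothetical names, and one shared cap (rather than a per-endpoint one) is an assumption based on the variable's description in `.env.test.example`.

```typescript
// Hypothetical: cycle through endpoints so consecutive requests hit different GPUs.
class RoundRobin<T> {
  private readonly items: readonly T[];
  private index = 0;

  constructor(items: readonly T[]) {
    if (items.length === 0) throw new Error("RoundRobin needs at least one item");
    this.items = items;
  }

  next(): T {
    const item = this.items[this.index];
    this.index = (this.index + 1) % this.items.length;
    return item;
  }
}

// Hypothetical: allow at most `max` tasks in flight; extra callers queue FIFO.
class Limiter {
  private readonly max: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(max: number) {
    this.max = Math.max(1, max);
  }

  async run<R>(task: () => Promise<R>): Promise<R> {
    if (this.active < this.max) {
      this.active++;
    } else {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const nextWaiter = this.waiting.shift();
      if (nextWaiter) nextWaiter(); // slot passes directly; `active` is unchanged
      else this.active--;
    }
  }
}
```

A call site would look like `limiter.run(() => callVlm(rr.next(), screenshot))`, where `callVlm` stands in for the real request function; picking the endpoint inside the task means it is chosen only once a slot is free.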

## VLM protocol support

`tests/helpers/vlm.ts` is protocol-agnostic and auto-detects per endpoint:

| Protocol | Detected via | Request |
|----------|--------------|---------|
| OpenAI-compatible | `GET /v1/models` | `POST /v1/chat/completions` with an `image_url` data URI (vLLM, LM Studio, llama.cpp server, Ollama's `/v1` shim) |
| Ollama native | `GET /api/tags` | `POST /api/chat` with a base64 `images[]` array |
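A minimal sketch of the detection order and both request shapes in the table above. `fetchFn` is injected so the probe can be exercised without a network (in Node 18+ or a browser, the global `fetch` fits this shape). The payload fields follow the public OpenAI chat-completions and Ollama `/api/chat` formats; the prompt, the PNG image type and the error when neither probe answers are assumptions, not necessarily what `tests/helpers/vlm.ts` does.

```typescript
type Protocol = "openai" | "ollama";
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

// Probe order mirrors the table: an OpenAI-compatible /v1/models wins,
// otherwise try Ollama's native /api/tags.
async function detectProtocol(base: string, fetchFn: Fetcher): Promise<Protocol> {
  const probe = async (path: string): Promise<boolean> => {
    try {
      return (await fetchFn(base + path)).ok;
    } catch {
      return false; // connection refused, DNS failure, etc.
    }
  };
  if (await probe("/v1/models")) return "openai";
  if (await probe("/api/tags")) return "ollama";
  throw new Error(`No VLM API answered at ${base}`);
}

// Build the path and JSON body for one screenshot in either wire format.
function buildVlmRequest(
  protocol: Protocol,
  model: string,
  prompt: string,
  pngBase64: string,
): { path: string; body: unknown } {
  if (protocol === "openai") {
    return {
      path: "/v1/chat/completions",
      body: {
        model,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              { type: "image_url", image_url: { url: `data:image/png;base64,${pngBase64}` } },
            ],
          },
        ],
      },
    };
  }
  return {
    path: "/api/chat",
    body: {
      model,
      stream: false, // one JSON reply instead of a token stream
      messages: [{ role: "user", content: prompt, images: [pngBase64] }],
    },
  };
}
```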

Force one with `VLM_PROTOCOL=openai` or `VLM_PROTOCOL=ollama`. Each model call
asks for a strict JSON verdict `{pass, score, issues[], summary}`, parsed leniently.
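Models often wrap their JSON in prose or a fenced block, which is what lenient parsing has to absorb. A minimal sketch, assuming a `Verdict` type that mirrors `{pass, score, issues[], summary}` (the real parser's fallbacks may differ): take the span from the first `{` to the last `}`, coerce each field, and return `null` when nothing parses so the caller can tell "model answered" apart from "client broke".

```typescript
interface Verdict {
  pass: boolean;
  score: number;
  issues: string[];
  summary: string;
}

function parseVerdict(reply: string): Verdict | null {
  // Take the outermost {...} span, ignoring any prose or fences around it.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let raw: unknown;
  try {
    raw = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const score = Number(o.score); // accepts 7 or "7"
  return {
    // Accept a JSON boolean or the string "true".
    pass: o.pass === true || String(o.pass).toLowerCase() === "true",
    score: Number.isFinite(score) ? score : 0,
    issues: Array.isArray(o.issues) ? o.issues.map(String) : [],
    summary: typeof o.summary === "string" ? o.summary : "",
  };
}
```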

### Personas

Each captured screenshot is judged from several perspectives (see `PERSONAS`
in `tests/helpers/vlm.ts`):

- **Visual defects** — overlapping/cut-off nodes, unreadable labels, broken
layout, missing edges, error chrome.
- **New-user clarity** — is the screen legible and inviting to a newcomer?
- **Accessibility** — contrast, text size, color-only signals, target size.
- **Living-graph aliveness** — do glow/breathe/flow status cues read clearly?
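A minimal sketch of how a persona list could become per-screenshot prompts. The `Persona` shape, ids and wording are hypothetical paraphrases of the four bullets above, not the real `PERSONAS` table; the design point is that every persona shares one JSON contract, so a single verdict parser serves all of them.

```typescript
interface Persona {
  id: string;
  focus: string; // what this reviewer looks for
}

// Hypothetical ids and wording paraphrasing the four perspectives above.
const PERSONAS_SKETCH: Persona[] = [
  { id: "visual-defects", focus: "overlapping or cut-off nodes, unreadable labels, broken layout, missing edges, error chrome" },
  { id: "new-user", focus: "whether the screen is legible and inviting to a newcomer" },
  { id: "accessibility", focus: "contrast, text size, color-only signals, target size" },
  { id: "aliveness", focus: "whether glow/breathe/flow status cues read clearly" },
];

// One prompt per (persona, captured state); the JSON contract is identical for all.
function buildPrompt(p: Persona, stateLabel: string): string {
  return [
    `You are reviewing a screenshot of the GraphDone UI (state: ${stateLabel}).`,
    `Judge only: ${p.focus}.`,
    'Reply with strict JSON: {"pass": boolean, "score": number, "issues": string[], "summary": string}.',
  ].join("\n");
}
```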

Report-only: a **FLAG** is the model's subjective concern, surfaced for a human
to look at — it never fails the build. The suite *does* assert the model
answered, so a broken client is still caught.
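The report-only rule above can be sketched as a tiny classifier, with hypothetical names and assuming an unanswered or unparseable call surfaces as `null`: a verdict with `pass: false` becomes a FLAG row in the report, while a missing verdict is the only case that throws.

```typescript
type ReportRow = { persona: string; status: "PASS" | "FLAG"; summary: string };
type MaybeVerdict = { pass: boolean; summary: string } | null;

function toReportRows(results: Array<{ persona: string; verdict: MaybeVerdict }>): ReportRow[] {
  return results.map(({ persona, verdict }): ReportRow => {
    // No verdict means the client or endpoint is broken: fail loudly.
    if (verdict === null) throw new Error(`VLM gave no usable answer for ${persona}`);
    // A negative verdict is only a flag for a human; it never fails the run.
    return { persona, status: verdict.pass ? "PASS" : "FLAG", summary: verdict.summary };
  });
}
```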

## Large-scale perf sweep

`tests/perf/scale-sweep.spec.ts` seeds real graphs (via the GraphQL API, the
same path a human/AI uses) of increasing size, loads each at one or more
quality tiers, and records the in-app `window.__graphPerf` readings plus load
time, settle time and query latency.

```bash
# .env.test.local
SCALE_SWEEP_SIZES=50,200,500,1000,2000 # blank => small in CI, large locally
SCALE_SWEEP_QUALITIES=HIGH,ULTRA
```
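A minimal sketch of turning those two variables into the (size, quality) matrix. `CI_SIZES` and `LOCAL_SIZES` are placeholders rather than the spec's real built-in defaults, the `HIGH,ULTRA` fallback simply mirrors the example value, and rejecting unknown qualities up front (instead of skipping them) is an assumed design choice.

```typescript
const QUALITIES = ["LOW", "MEDIUM", "HIGH", "ULTRA"] as const;
type Quality = (typeof QUALITIES)[number];

// Placeholder defaults: the spec's real built-in lists may differ.
const CI_SIZES = [50, 200];
const LOCAL_SIZES = [50, 200, 500, 1000, 2000];

function sweepMatrix(
  sizesRaw: string | undefined,
  qualitiesRaw: string | undefined,
  isCi: boolean,
): Array<{ size: number; quality: Quality }> {
  const sizes: number[] =
    sizesRaw && sizesRaw.trim()
      ? sizesRaw.split(",").map((s) => {
          const n = Number(s.trim());
          if (!Number.isInteger(n) || n <= 0) throw new Error(`Bad size: ${s}`);
          return n;
        })
      : isCi ? CI_SIZES : LOCAL_SIZES; // blank => small in CI, large locally
  const qualities = (qualitiesRaw && qualitiesRaw.trim() ? qualitiesRaw : "HIGH,ULTRA")
    .split(",")
    .map((q) => q.trim().toUpperCase());
  for (const q of qualities) {
    if (!(QUALITIES as readonly string[]).includes(q)) throw new Error(`Unknown quality: ${q}`);
  }
  // Every size is measured at every requested quality.
  const cells: Array<{ size: number; quality: Quality }> = [];
  for (const size of sizes) {
    for (const q of qualities) cells.push({ size, quality: q as Quality });
  }
  return cells;
}
```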

Metrics per (size, quality):

- **Reliable (measured directly from the browser, captured at every size):**
rendered node/edge counts, initial load ms, graph-scoped query p95, and
**interaction FPS** — real rendered frames/sec while a node is dragged
(counted via `requestAnimationFrame`, so it needs no app instrumentation and
reflects how janky the graph feels under interaction at scale).
- **Best-effort bonus (from the app's `window.__graphPerf`, which only
publishes ~every 2s while the sim ticks):** settle ms (to `alpha ≤ 0.02`),
avg/p95 sim tick ms, layout drift (`rmsFromSavedPx`). These can be blank for
graphs that settle instantly — `interactionFps` is the headline signal.
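The helpers below sketch how these numbers can be derived from raw samples; the names are hypothetical and the app's own `__graphPerf` math may differ. `fpsFromFrames` takes the `requestAnimationFrame` timestamps recorded during a drag, `percentile` uses the nearest-rank method for p95 figures, and `settleMs` finds the first sample where the simulation's alpha reaches the 0.02 threshold.

```typescript
// Nearest-rank percentile (p in 0..100) over unsorted samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1];
}

// Frames per second from rAF callback timestamps (ms, increasing).
function fpsFromFrames(timestamps: number[]): number {
  if (timestamps.length < 2) return 0;
  const spanMs = timestamps[timestamps.length - 1] - timestamps[0];
  return spanMs > 0 ? ((timestamps.length - 1) * 1000) / spanMs : 0;
}

// Ms from the first sample until alpha first drops to the threshold,
// or null if it never does within the samples.
function settleMs(samples: Array<{ t: number; alpha: number }>, threshold = 0.02): number | null {
  if (samples.length === 0) return null;
  const hit = samples.find((s) => s.alpha <= threshold);
  return hit ? hit.t - samples[0].t : null;
}
```

In the browser, the timestamps would be pushed from a `requestAnimationFrame` callback (for example inside Playwright's `page.evaluate`) for as long as the drag runs, which is why no app instrumentation is needed.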

A FRESH graph is seeded per (size, quality) so each measurement starts from an
unsettled layout (otherwise the second quality loads the first run's settled,
pinned positions and the sim never ticks). Output:
`test-artifacts/scale-sweep/index.html` — a table plus inline SVG charts of how
each metric scales, with the `@perf` budgets drawn for reference.

Report-only; the only hard assertion is that a seeded graph actually renders.
Each seeded graph is deleted afterward (edges first, then nodes, then graph).
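A minimal sketch of that per-cell lifecycle, with the GraphQL calls hidden behind a hypothetical `SweepApi` interface (the real spec's mutations and helpers are not shown in this doc). Cleanup runs in `finally`, so a failed measurement still removes its graph, and the edges-then-nodes-then-graph order means no `Edge` ever outlives its endpoints.

```typescript
interface SeededGraph {
  graphId: string;
  nodeIds: string[];
  edgeIds: string[];
}

// Hypothetical API surface; the real spec talks to the GraphQL API.
interface SweepApi {
  seed(size: number): Promise<SeededGraph>;
  measure(graphId: string, quality: string): Promise<Record<string, number>>;
  deleteEdges(ids: string[]): Promise<void>;
  deleteNodes(ids: string[]): Promise<void>;
  deleteGraph(id: string): Promise<void>;
}

type SweepCell = { size: number; quality: string };

async function runSweep(
  api: SweepApi,
  cells: SweepCell[],
): Promise<Array<SweepCell & { metrics: Record<string, number> }>> {
  const results: Array<SweepCell & { metrics: Record<string, number> }> = [];
  for (const { size, quality } of cells) {
    // Fresh, unsettled graph per cell so no run inherits pinned positions.
    const g = await api.seed(size);
    try {
      results.push({ size, quality, metrics: await api.measure(g.graphId, quality) });
    } finally {
      await api.deleteEdges(g.edgeIds); // edges first: never leave orphans
      await api.deleteNodes(g.nodeIds);
      await api.deleteGraph(g.graphId);
    }
  }
  return results;
}
```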

## CI

GitHub-hosted runners can't reach your local GPUs, so neither suite gates
merges there. To gate on them, register a **self-hosted runner** on a machine
that can reach the endpoints, give it the `.env.test.local`, and add a workflow
job (manual-dispatch or nightly) that runs `npm run test:perf:scale` /
`npm run test:vlm`. The scale sweep alone (no VLM) is safe to run on any runner
with the dev stack and a small `SCALE_SWEEP_SIZES`.