feat_ubi_llm_study_comparison: Side-by-side UBI-vs-LLM study comparison view #405

@SoundMindsAI

Description

Why

  • Problem: A demo operator who wants to compare the UBI-derived study against the LLM-derived study on the same scenario must open both study-detail pages in two browser tabs and mentally diff the digest narratives, the best-trial parameter values, the best-metric scalar, and the convergence curves. The central value proposition of the synthetic-UBI demo — "see what changes when you ground judgments in real behavior instead of an LLM's rubric reading" — is buried behind manual cross-tab labor.
  • Outcome: A single dedicated route /studies/compare?a={id}&b={id} renders the two studies side-by-side with a per-panel diff column: a sentence-level digest-narrative diff, a best-trial parameter table with same/different flags, a best-metric scalar with delta annotation (confidence-aware), and a two-series convergence-curve overlay. Entry points appear on the LLM study-detail page, the UBI study-detail page, and the UBI judgment-list value-delta card — but only when a valid paired study actually exists.
  • Non-goal: This feature does not seed data, does not generate judgments, does not modify any study/trial/digest, and does not add a rung badge to the cluster-detail page (that work is split out — see §3 and chore_cluster_detail_rung_badge). It is a read-only presentation layer over existing endpoints.
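The three tabular and textual panels described in the Outcome bullet can be sketched in a few lines. This is a minimal illustration only, not the planned implementation: the function names (split_sentences, narrative_diff, param_diff, metric_delta), the "maximize"/"minimize" direction values, and the returned dictionary shapes are all assumptions, and the confidence-aware part of the delta annotation is left out because the issue does not define it.

```python
import difflib
import re
from typing import Any


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def narrative_diff(a: str, b: str) -> list[tuple[str, str]]:
    """Sentence-level diff. Each item is (tag, sentence) with tag in
    {'same', 'removed', 'added'}; 'removed' sentences come from study A."""
    sa, sb = split_sentences(a), split_sentences(b)
    matcher = difflib.SequenceMatcher(a=sa, b=sb, autojunk=False)
    out: list[tuple[str, str]] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out += [("same", s) for s in sa[i1:i2]]
        else:  # 'replace', 'delete', or 'insert'
            out += [("removed", s) for s in sa[i1:i2]]
            out += [("added", s) for s in sb[j1:j2]]
    return out


def param_diff(a: dict[str, Any], b: dict[str, Any]) -> list[dict[str, Any]]:
    """One row per parameter name, flagged same/different.
    A parameter present in only one study counts as different."""
    rows = []
    for key in sorted(a.keys() | b.keys()):
        same = key in a and key in b and a[key] == b[key]
        rows.append({"param": key, "a": a.get(key), "b": b.get(key), "same": same})
    return rows


def metric_delta(a: float, b: float, direction: str) -> dict[str, Any]:
    """Delta of B relative to A; 'better' depends on the (assumed) direction."""
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction!r}")
    raw = b - a
    better = raw > 0 if direction == "maximize" else raw < 0
    return {"delta": raw, "b_is_better": better}
```

For example, param_diff({"k": 10, "boost": 1.5}, {"k": 10, "boost": 2.0}) yields one row for boost flagged as different and one row for k flagged as same. The same positive delta reads as an improvement under "maximize" and as a regression under "minimize", which is the behavior AC-12 asks for.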

Status

  • Stage: PLAN
  • Priority: (see idea file)

Definition of done

The spec defines the following acceptance criteria. Each must pass before merge.

  • AC-1: Valid pair validates
  • AC-1b: Cross-cluster pair validates with warning (not a 422)
  • AC-1c: Differing-target pair validates with TARGET_MISMATCH warning
  • AC-2: Two LLM studies rejected
  • AC-3: Different query sets rejected
  • AC-4: Not-completed rejected
  • AC-5: Missing study
  • AC-6: Pair discovery returns counterpart
  • AC-7: No counterpart
  • AC-8: Route ordering (no path-param shadow)
  • AC-9: Entry button hidden when no pair
  • AC-10: Entry button shows + navigates with canonical ordering
  • AC-11: Digest narrative diff renders sentence-level changes
  • AC-12: Best-metric delta respects direction
  • AC-13: Convergence overlay consumes shipped curve when present
  • AC-14: Convergence overlay falls back to trials when curve absent
  • AC-15: Narrow-viewport stacked layout
  • AC-16: Parameter diff flags identical vs different
  • AC-17: Value-delta affordance visibility + navigation
  • AC-18: Reversed-URL column normalization
  • AC-19: Tutorial subsection present
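The pair-validation criteria (AC-1 through AC-5) and the ordering criteria (AC-10, AC-18) can be read as a small decision procedure. The sketch below is one possible shape under stated assumptions: the Study fields, the "ubi"/"llm" source values, the "completed" status string, every code other than TARGET_MISMATCH, and the "UBI study in column A" ordering rule are hypothetical and are not taken from the spec.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Study:
    # All field names here are hypothetical stand-ins for the real model.
    id: str
    source: str          # assumed values: "ubi" or "llm"
    status: str          # assumed value for a finished study: "completed"
    query_set_id: str
    cluster_id: str
    target: str


@dataclass
class PairCheck:
    ok: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def validate_pair(a: Optional[Study], b: Optional[Study]) -> PairCheck:
    """Hard failures go to errors; soft mismatches go to warnings."""
    if a is None or b is None:                       # AC-5
        return PairCheck(False, errors=["STUDY_NOT_FOUND"])
    errors, warnings = [], []
    if {a.source, b.source} != {"ubi", "llm"}:       # AC-2
        errors.append("NOT_UBI_LLM_PAIR")
    if a.query_set_id != b.query_set_id:             # AC-3
        errors.append("QUERY_SET_MISMATCH")
    if a.status != "completed" or b.status != "completed":  # AC-4
        errors.append("NOT_COMPLETED")
    if a.cluster_id != b.cluster_id:                 # AC-1b: warn, do not reject
        warnings.append("CLUSTER_MISMATCH")
    if a.target != b.target:                         # AC-1c: code named in the spec
        warnings.append("TARGET_MISMATCH")
    return PairCheck(not errors, errors, warnings)


def canonical_order(a: Study, b: Study) -> tuple:
    """Assumed rule: UBI study first, so ?a=X&b=Y and ?a=Y&b=X render alike."""
    return (a, b) if a.source == "ubi" else (b, a)
```

Keeping warnings separate from errors lets a cross-cluster or differing-target pair still render while surfacing the mismatch, rather than failing the request.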

Artifacts

How to execute

The folder contains both feature_spec.md and implementation_plan.md, and both have been cross-model reviewed. Ready to execute:

/impl-execute docs/00_overview/planned_features/02_mvp2/feat_ubi_llm_study_comparison/implementation_plan.md --all

--all runs the full story sequence end-to-end with per-story verification gates, phase-gate cross-model reviews via GPT-5.5, test coverage audit, push, PR creation, CI watch, Gemini Code Assist adjudication, and final cross-model review. The PR is opened but NOT merged — you merge it manually after review.

Notes

This issue is part of the MVP2 backlog issue-coverage sweep (2026-06-02) — every active MVP2 folder should have a tracking issue so external contributors can discover the work without grep-ing the planned-features tree. If you pick this up, drop a comment so others don't duplicate; if you find the linked idea/spec stale, run /idea-preflight first to refresh it.

Metadata

Assignees

No one assigned

    Labels

    mvp2: MVP2 backlog item
    ready-to-execute: Has approved spec + impl plan; ready for /impl-execute
    type/feature: Feature (new product capability)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests