fix(pickled): correct toolbar scenario, add 3 external legibility scenarios #3595
caio-pizzol wants to merge 1 commit into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9741e3165b
expected:
  paths:
    - "superdoc/ui/react"
Require the legacy scenario to assert rejection
For this new scenario, the only scoring condition is the presence of superdoc/ui/react, so an answer that incorrectly says createHeadlessToolbar/activeEditor.commands is the right approach but happens to mention the modern import path will still pass. Since the prompt is specifically meant to detect whether agents reject the legacy recommendation, this can mark the exact failure mode as successful and skew the Pickled results; add a positive assertion that the answer says the legacy approach is not recommended/should be avoided, or otherwise distinguish rejection from endorsement.
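One way to distinguish rejection from endorsement is sketched below. This is a rough illustration only: the function names `rejects_legacy` and `passes`, the cue-phrase list, and the sentence-level heuristic are all assumptions made for this example and are not part of Pickled's scoring API.

```python
import re

# Hypothetical cue phrases (an assumption for this sketch); a real check
# would need tuning against actual agent output.
REJECTION_CUES = (
    "avoid",
    "not recommended",
    "deprecated",
    "do not use",
    "don't use",
    "should not",
)


def rejects_legacy(answer: str, legacy_term: str = "createHeadlessToolbar") -> bool:
    """Return True if some sentence names legacy_term together with a rejection cue."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    for sentence in sentences:
        lowered = sentence.lower()
        if legacy_term.lower() in lowered and any(cue in lowered for cue in REJECTION_CUES):
            return True
    return False


def passes(answer: str, required=("superdoc/ui/react",)) -> bool:
    """Pass only if the required terms appear AND the legacy approach is rejected."""
    has_required = all(term in answer for term in required)
    return has_required and rejects_legacy(answer)


endorsing = (
    "Yes, createHeadlessToolbar is the right approach. "
    "You can also import from superdoc/ui/react."
)
rejecting = (
    "Avoid createHeadlessToolbar for new React UI. "
    "Import SuperDocUIProvider from superdoc/ui/react instead."
)

print(passes(endorsing))  # False: mentions the modern path but endorses the legacy API
print(passes(rejecting))  # True: modern path present and legacy API rejected
```

A substring heuristic like this can still misfire on negated phrasing (for example "is not deprecated"), so it is a starting point rather than a reliable classifier.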
No issues found across 1 file
Closing in favor of #3601, which removes the root pickled.yml. Pickled's config schema changed in the latest CLI (new product/sources/agents/access/questions/checks model; traps removed), so these scenario corrections would land on a schema that no longer loads. The suite will be reintroduced deliberately later, and this pass's learnings fold into that.
Follow-up to #3494. Fixes a scoring bug in the merged toolbar scenario and grows the external suite from 1 to 4 scenarios.

The toolbar scenario shipped with two false-negatives, both confirmed against a real run: `expected.excludes` banned `createHeadlessToolbar` even though the prompt asks "what should I avoid?" (so a correct answer naming it as the thing to avoid failed), and it required `useSuperDocUI`, which is a real export but absent from the docs bundle (so docs-grounded answers failed). Now scored positives-only: `SuperDocUIProvider` + `superdoc/ui/react`.

Three new scenarios, each with terms verified present in `docs.superdoc.dev/llms-full.txt` before locking:

- (`Document API` + `editor.doc`)
- (`Yjs` + `modules.collaboration`)
- `createHeadlessToolbar` right for new React UI? (positive check on `superdoc/ui/react`, no traps)

Two candidates were dropped because the docs bundle contains no crisp deterministic term for them (built-in toolbar `customButtons`, an export API name) - requiring absent terms would false-negative the docs cells.

Validated before opening: config loads, each scenario's docs cell scores YES on real output, and two sampled paid passes (both interfaces, every toolset) score ~66 with tool-use provenance verified on every web/MCP cell and no false-fires.
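
The two false-negatives and the pre-lock term check described above can be sketched roughly as follows. The `score` and `missing_terms` functions, the `includes`/`excludes` parameter names, and the sample strings are assumptions made for illustration; they are not Pickled's implementation, and `docs_sample` is a stand-in string rather than the real `llms-full.txt`.

```python
def score(answer: str, includes=(), excludes=()) -> bool:
    """Pass only if every included term appears and no excluded term appears."""
    return all(t in answer for t in includes) and not any(t in answer for t in excludes)


def missing_terms(docs_text: str, terms) -> list:
    """Return the terms that do not occur anywhere in the docs bundle text."""
    return [t for t in terms if t not in docs_text]


# A correct answer to "what should I avoid?" has to name the legacy API.
correct_answer = (
    "Wrap your app in SuperDocUIProvider from superdoc/ui/react. "
    "Avoid createHeadlessToolbar for new React UI."
)

# False-negative 1: banning the term the prompt asks the agent to name.
with_trap = score(
    correct_answer,
    includes=("superdoc/ui/react",),
    excludes=("createHeadlessToolbar",),
)

# False-negative 2: requiring a term a docs-grounded answer cannot know.
with_absent_term = score(correct_answer, includes=("useSuperDocUI",))

# Positives-only scoring on terms the docs actually contain.
positives_only = score(correct_answer, includes=("SuperDocUIProvider", "superdoc/ui/react"))

# Stand-in for the docs bundle text (illustrative, not real docs content).
docs_sample = "Import SuperDocUIProvider from superdoc/ui/react to build custom UI."
absent = missing_terms(
    docs_sample, ["SuperDocUIProvider", "superdoc/ui/react", "useSuperDocUI"]
)

print(with_trap, with_absent_term, positives_only)  # False False True
print(absent)  # ['useSuperDocUI']
```

Running a check like `missing_terms` against the docs bundle before locking a scenario is what keeps a required term from silently failing every docs-grounded cell.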

One product finding worth a separate task: on web/MCP discovery (no injected docs), agents reliably reach the `superdoc/ui/react` import path but often not the `SuperDocUIProvider` component, and at least one cell recommended the Document API for a React toolbar - conflating the mutation surface with the UI surface. That points at a docs/MCP discoverability gap, not a config issue.

No CI wiring or secrets in this PR; runs are manual for now.