🧪 Experiment Campaign: daily-file-diet
Workflow file: .github/workflows/daily-file-diet.md
Selected dimension: prompt_style
Triggered by: ab-testing-advisor on 2026-05-30
Background
daily-file-diet monitors the Go codebase daily, identifies the largest non-test source file, and — when it exceeds 800 lines — opens a GitHub issue with Serena MCP semantic analysis, proposed file splits, test coverage plans, and acceptance criteria. The prompt is currently dense: it encodes a full multi-step analysis protocol, a parameterized issue template with progressive-disclosure rules, and explicit MCP usage instructions. This density may be driving unnecessary token consumption and latency without proportional quality gains; a prompt_style experiment will determine whether a leaner prompt achieves equivalent output quality.
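The file-selection step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow's actual implementation (the concise prompt variant uses `find` + `wc -l`). The helper names are hypothetical, and the `>=` comparison follows the "≥ 800 lines" wording of the concise prompt rather than "exceeds".

```python
from pathlib import Path

THRESHOLD_LINES = 800  # threshold stated in the workflow description


def count_lines(path: Path) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    with path.open("rb") as handle:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(65536), b""))


def largest_non_test_go_file(root: Path):
    """Return (path, line_count) for the largest non-test .go file, or None if there is none."""
    candidates = [
        (path, count_lines(path))
        for path in root.rglob("*.go")
        if path.is_file() and not path.name.endswith("_test.go")
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])


def needs_diet(line_count: int, threshold: int = THRESHOLD_LINES) -> bool:
    """Assumption: the threshold is inclusive, matching '>= 800 lines' in the concise prompt."""
    return line_count >= threshold
```

The sketch does not exclude vendored or generated code; whether the real workflow does is not stated here.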
Hypothesis
- H0 (null): Changing the prompt style does not meaningfully change the quality or completeness of the generated refactoring issue.
- H1 (alternative): A concise prompt style produces issues of equivalent or higher measurable quality while reducing token usage by ≥15%; run duration is tracked as a secondary signal.
Experiment Configuration
Add the following `experiments:` block to the workflow frontmatter:
experiments:
  prompt_style:
    variants: [detailed, concise]
    description: "Tests whether a leaner prompt preserves refactoring-issue quality vs. the current verbose multi-step protocol."
    hypothesis: "H0: no change in issue completeness score. H1: concise variant reduces token usage by ≥15% with no significant drop in issue quality (split suggestions present, acceptance criteria present, Serena analysis present)."
    metric: issue_completeness_score
    secondary_metrics: [effective_token_count, run_duration_ms]
    guardrail_metrics:
      - name: issue_creation_success_rate
        direction: min
        threshold: 0.90
      - name: empty_output_rate
        direction: max
        threshold: 0.05
    min_samples: 50
    weight: [50, 50]
    start_date: "2026-05-30"
    issue: #aw_filedieta
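As an illustration of how the two guardrails in this block might be evaluated, here is a minimal Python sketch. The `Guardrail` class and `violated_guardrails` function are hypothetical names, and the reading of `direction` is an assumption: `min` is taken to mean the value must not drop below the threshold, and `max` that it must stay strictly under it. How the gh-aw runtime actually interprets these fields is not stated in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    name: str
    direction: str  # assumed: "min" = must not drop below threshold, "max" = must stay under it
    threshold: float


# Values copied from the experiments block above.
GUARDRAILS = [
    Guardrail("issue_creation_success_rate", "min", 0.90),
    Guardrail("empty_output_rate", "max", 0.05),
]


def violated_guardrails(observed: dict[str, float], guardrails=GUARDRAILS) -> list[str]:
    """Return a human-readable message for every guardrail that is violated or has no data."""
    violations = []
    for rail in guardrails:
        value = observed.get(rail.name)
        if value is None:
            violations.append(f"{rail.name}: no data")
        elif rail.direction == "min" and value < rail.threshold:
            violations.append(f"{rail.name}: {value:.2f} is below {rail.threshold:.2f}")
        elif rail.direction == "max" and value >= rail.threshold:
            violations.append(f"{rail.name}: {value:.2f} is not under {rail.threshold:.2f}")
    return violations
```

Treating missing data as a violation is a conservative design choice for the sketch: a variant that reports nothing should not pass silently.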
Variant descriptions:
- detailed (baseline): the current prompt, with its full multi-step analysis protocol, explicit Serena MCP usage instructions, parameterized issue template, and progressive-disclosure formatting rules.
- concise: a compressed prompt that retains the essential intent (find the largest file; if it is ≥800 lines, create an issue with semantic analysis and split suggestions) while removing template scaffolding and relying on the model's own formatting judgment.
Workflow Changes Required
In .github/workflows/daily-file-diet.md, wrap the prompt body in a conditional block keyed on the prompt_style experiment:
View diff
-You are a Go code quality expert...
-[full multi-step protocol with template]
+{{#if experiments.prompt_style == "concise" }}
+You are a Go code quality expert. Each weekday:
+1. Find the largest non-test `.go` file by line count (use `find` + `wc -l`).
+2. If it is ≥ 800 lines, use Serena MCP semantic analysis to identify function relationships, complexity hotspots, and module boundary candidates.
+3. Create a GitHub issue titled `[file-diet] <filename> (N lines)` with: a summary of findings, 2–4 concrete split proposals with rationale, a test coverage plan, and an acceptance checklist. Use `<details>` blocks for verbose sections.
+4. If all files are < 800 lines, output a brief ✅ healthy message and call `noop`.
+{{else}}
+[existing detailed prompt verbatim]
+{{/if}}
After editing, run:
gh aw compile daily-file-diet
Success Metrics
| Metric | Type | Target |
|---|---|---|
| Issue completeness score (split suggestions ✓ + acceptance criteria ✓ + Serena analysis ✓) | Primary | ≥ 0.85 for both variants |
| Effective token count | Secondary | ≥ 15% reduction in concise |
| Run duration (ms) | Secondary | Signal only |
| Issue creation success rate | Guardrail | Must not drop below 90% |
| Empty output rate | Guardrail | Must remain < 5% |
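The primary metric combines three checks, but this issue does not say how each check is detected or how they are aggregated. The Python sketch below shows one possible reading: each issue scores the fraction of the three checks it passes, and a variant's score is the mean over its issues. The regular expressions and function names are hypothetical stand-ins, not the scoring the experiments runtime uses.

```python
import re

# Hypothetical detectors: the source names the three checks but not how they are recognised.
CHECKS = {
    "split_suggestions": re.compile(r"\bsplit(s|ting)?\b", re.IGNORECASE),
    "acceptance_criteria": re.compile(r"\bacceptance\s+(criteria|checklist)\b", re.IGNORECASE),
    "serena_analysis": re.compile(r"\bserena\b", re.IGNORECASE),
}


def completeness(issue_body: str) -> dict[str, bool]:
    """Report which of the three checks an issue body passes."""
    return {name: bool(pattern.search(issue_body)) for name, pattern in CHECKS.items()}


def completeness_score(issue_body: str) -> float:
    """Fraction of checks passed by one issue (assumed aggregation)."""
    results = completeness(issue_body)
    return sum(results.values()) / len(results)


def mean_score(issue_bodies: list[str]) -> float:
    """Average per-issue score for one variant."""
    if not issue_bodies:
        raise ValueError("need at least one issue body")
    return sum(completeness_score(body) for body in issue_bodies) / len(issue_bodies)
```

Keyword matching is crude: an issue that merely mentions "Serena" would pass that check, so a real scorer would likely need a stricter structural test.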
Statistical Design
- Variants: detailed (baseline), concise
- Assignment: Round-robin via gh-aw experiments runtime (cache-based)
- Minimum runs per variant: 50 (to detect a 15-percentage-point difference in completeness score at 80% power, two-proportion z-test)
- Expected experiment duration: ~100 weekday runs ≈ 20 weeks (workflow runs Mon–Fri; ~5 runs/week)
- Analysis approach: Two-proportion z-test for completeness score; two-sample t-test / Mann-Whitney U for token count and duration
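The sample-size and z-test items above can be checked with a short Python sketch that uses only the standard library. It relies on the usual normal-approximation formulas for a two-sided two-proportion z-test. It also assumes completeness is scored per issue as pass or fail, which this issue does not confirm. The function names and the example proportions are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Runs needed per variant to detect p1 vs p2 with a two-sided two-proportion z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    spread_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    spread_alt = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((spread_null + spread_alt) ** 2 / (p1 - p2) ** 2)


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Return (z, two-sided p-value) using the pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both variants are all-pass or all-fail: there is no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / std_err
    return z, 2 * (1 - _STD_NORMAL.cdf(abs(z)))
```

Under an assumed baseline pass rate of 0.85, `sample_size_two_proportions(0.85, 0.70)` gives 121 runs per variant, and `sample_size_two_proportions(0.85, 0.60)` gives 49. By this approximation, 50 runs per variant is closer to what a 25-point gap requires, so the minimum may be worth rechecking once the real baseline rate is known.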
Implementation Steps
- Add the `experiments:` section to the frontmatter
- Wrap the prompt body in `{{#if experiments.prompt_style == "concise" }}` (value-comparison form; never use the internal `__GH_AW_EXPERIMENTS__` env-var syntax)
- Run `gh aw compile daily-file-diet` to regenerate the lock file
- `/tmp/gh-aw/agent/experiments/state.json`
References
- `.github/workflows/daily-file-diet.md`
Generated by 🧪 Daily A/B Testing Advisor · sonnet46 1.7M · ◷