Releases · CodeAlive-AI/ai-driven-development

19 Jun 20:03

rodion-m

v3.6.2

2b1ff80

v3.6.2 Latest

Patch release correcting Codex hook semantics.

Makes Codex bash-guard prompt deferral opt-in instead of default.
Keeps fail-closed deny as the default Codex live behavior.
Updates hooks-management guidance for Codex App/CLI after upstream research.

Related bash-guard release: bash-guard-v0.3.2

19 Jun 19:56

rodion-m

v3.6.1

b951f23

v3.6.1 — Codex prompt rules bridge

Patch release for Codex prompt semantics in bash-guard.

Keeps Codex bash-guard live by default instead of putting the whole hook into shadow mode.
Adds explicit defer-to-execpolicy support for selected bash-guard reason codes.
Installers pair supabase.db_push defer with a managed Codex prefix_rule(... decision="prompt").
Documents the safe pattern for adapting Claude Code ask hooks to Codex.

Verified with Go tests, shell syntax checks, isolated installer tests, direct hook smoke tests, and local codex exec behavior.

19 Jun 19:13

rodion-m

v3.6.0

48f18c7

v3.6.0 — Codex App hooks support

Updates hooks-management and balanced-safety-hooks for current Codex CLI / Codex App hook behavior.

Highlights

Updated hooks-management to match the June 2026 Codex docs: hooks enabled by default, canonical [features].hooks, 10 lifecycle events, shared CLI/App config layers, /hooks review and trust flow.
Added Codex App / Codex CLI install path for bash-guard via ~/.codex/hooks.json.
Documented the important Codex limitation: PreToolUse does not support permissionDecision: "ask"; Codex live mode hard-blocks risky Bash commands via deny.
Preserved Claude Code behavior: risky Bash commands still emit permissionDecision: "ask".
Expanded git safety fixtures for branch creation and stash-sweeping commands.

Verification

go test -count=1 ./... in hooks/balanced-safety-hooks/src
bash -n for both installers
git diff --check
Isolated installer install/uninstall smoke test for --both

See also the binary release: bash-guard-v0.3.0.

19 Jun 20:03

rodion-m

bash-guard-v0.3.2

2b1ff80

bash-guard v0.3.2

Patch release for Codex CLI/App safety semantics.

Codex live mode now defaults to fail-closed deny for bash-guard ask decisions.
Added --codex-native-prompts as explicit opt-in for best-effort native Codex execpolicy prompts.
Documented the Codex PreToolUse ask limitation and open runtime risk with prompt rules under full access.

Verification:

go test -count=1 ./...
bash -n install.sh && bash -n install-prebuilt.sh && git diff --check
installer smoke test with and without --codex-native-prompts
local codex exec fake supabase db push blocked by PreToolUse hook

19 Jun 19:56

rodion-m

bash-guard-v0.3.1

b951f23

bash-guard v0.3.1 — Codex prompt rules bridge

Patch release for Codex App / CLI semantics.

What changed

Keeps Codex bash-guard live by default: risky commands still hard-block through PreToolUse.
Adds BASH_GUARD_CODEX_DEFER_REASON_CODES for a small explicit set of bash-guard reason codes that should be handled by Codex execpolicy prompts.
Installers configure supabase.db_push as the first deferred reason code and add a paired prefix_rule(... decision="prompt") to ~/.codex/rules/default.rules.
This preserves Claude Code-style semantics: everything is allowed except hook-described risky actions; where Codex can show a native prompt, the hook defers to that prompt, otherwise it blocks.

Verified

go test -count=1 ./...
shell syntax checks for both installers
isolated Codex install/uninstall smoke test
local codex exec with a fake Supabase binary: supabase db push reaches execpolicy prompt; rm -rf /etc/... is blocked by PreToolUse hook.

After installing/updating in Codex App, restart the app and review/trust the modified hook in /hooks.

19 Jun 19:13

rodion-m

bash-guard-v0.3.0

48f18c7

bash-guard v0.3.0 — Codex App support

Adds Codex CLI / Codex App support to bash-guard while preserving Claude Code behavior.

What's new

New BASH_GUARD_ADAPTER=codex wire adapter. Internal ask decisions map to Codex permissionDecision: "deny" because Codex PreToolUse does not support hook-created ask prompts yet.
Source and prebuilt installers now support --codex and --both, writing ~/.codex/hooks.json with PreToolUse[matcher=^Bash$].
Codex allow path emits empty stdout, matching Codex hook semantics.
Project config discovery now accepts .codex/bash-guard.toml alongside .claude/bash-guard.toml, still gated by trusted-projects.
Audit log records adapter and emitted decision.
Git guard coverage now asks before git stash/stash push/stash save and new branch creation via checkout -b/-B or switch -c/-C.
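The adapter mapping in the first and third items above could look roughly like this. It is a sketch under assumptions, not the shipped Go code: the `hookSpecificOutput` envelope follows Claude Code's PreToolUse output format and is assumed here for Codex as well, the adapter name `claude` is a guess (only `codex` appears in these notes), and the behaviour of the Claude allow branch is not stated in the release.

```python
import json

ADAPTERS = ("claude", "codex")        # "claude" is an assumed adapter name
DECISIONS = ("allow", "ask", "deny")  # internal bash-guard decisions


def render_hook_output(adapter: str, decision: str, reason: str) -> str:
    """Map an internal decision to the text a hook would print on stdout."""
    if adapter not in ADAPTERS:
        raise ValueError(f"unknown adapter: {adapter}")
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")

    if adapter == "codex":
        if decision == "allow":
            return ""  # Codex allow path: empty stdout
        # Codex PreToolUse has no hook-created ask prompt, so ask becomes deny.
        decision = "deny"

    # Assumption: both adapters share this envelope; the Claude allow branch
    # emitting an explicit "allow" is also an assumption.
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }
    return json.dumps(payload)
```

For example, `render_hook_output("codex", "ask", "git stash")` yields a deny payload, while the same call with `"claude"` keeps the ask decision.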

Verification

go test -count=1 ./...
bash -n install.sh install-prebuilt.sh
Isolated installer install/uninstall test for --both
Manual Codex wire smoke test: risky Bash emits deny; safe Bash emits no stdout.

Quick install

# Claude Code
curl -fsSL https://raw.githubusercontent.com/CodeAlive-AI/ai-driven-development/main/hooks/balanced-safety-hooks/install-prebuilt.sh | sh

# Codex CLI / Codex App
curl -fsSL https://raw.githubusercontent.com/CodeAlive-AI/ai-driven-development/main/hooks/balanced-safety-hooks/install-prebuilt.sh | sh -s -- --codex

After installing for Codex, restart Codex and open /hooks to review/trust the new hook if prompted.

02 Jun 17:35

rodion-m

v3.5.0

0a2427f

v3.5.0 — Rich grading, multi-run variance, blind compare, HTML viewer

Summary

Five additions to the SkillOpt loop, adapted from upstream anthropics/skills' skill-creator eval infrastructure but reframed for management/optimisation (not creation). Addresses a gap identified in the v3.4 retrospective: the validation gate uses the same verifier that proposed edits, which can be self-confirming.

New in optimize_skill.py

| Flag | What it does |
| --- | --- |
| `--runs-per-task N` | Each task executed N times. `rollouts.jsonl` gains `score_mean`/`score_stddev`/`score_min`/`score_max`/`runs[]`. Validation gate uses the mean. Use when the verifier or agent is noisy. |
| `--verifier assertions` | New grading mode. Tasks gain `assertions[]` (declarative checks). Grader returns rich `grading.json` with per-assertion pass/fail + evidence, extracted claims, and `eval_feedback`: a critique of the assertions themselves that flags weak / non-discriminating checks. |
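As a rough illustration of the multi-run fields listed for `--runs-per-task`, the per-task summary might be computed like this. The function name is hypothetical, `runs` is simplified to a list of scores, and the use of population standard deviation (which is 0 for a single run) is an assumption; the release does not say which estimator `optimize_skill.py` uses.

```python
import statistics


def summarize_runs(scores: list[float]) -> dict:
    """Collapse N per-run scores for one task into summary fields."""
    if not scores:
        raise ValueError("at least one run is required")
    return {
        "score_mean": statistics.fmean(scores),      # value the validation gate would use
        "score_stddev": statistics.pstdev(scores),   # population stddev: an assumption
        "score_min": min(scores),
        "score_max": max(scores),
        "runs": list(scores),                        # simplified: raw scores only
    }
```

A large `score_stddev` relative to the gap between two candidate skills suggests the gate decision is within noise and more runs per task are needed.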

`optimization_report.md` gains an Assertion critique section aggregating `eval_feedback.suggestions` across the run, deduped + ranked by frequency. Operator gets back actionable improvements to the eval set.
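The dedupe-and-rank step for the Assertion critique section can be sketched as below. This is a guess at the shape, not the script's code: it assumes each rollout record nests suggestions as plain strings under `grading.eval_feedback.suggestions`, and the case/whitespace normalisation used for deduplication is hypothetical.

```python
from collections import Counter


def rank_suggestions(rollouts: list[dict]) -> list[tuple[str, int]]:
    """Return (suggestion, count) pairs, most frequent first."""
    counts = Counter()
    first_seen = {}  # normalised key -> first original wording
    for rollout in rollouts:
        feedback = rollout.get("grading", {}).get("eval_feedback", {})
        for text in feedback.get("suggestions", []):
            key = " ".join(text.lower().split())  # normalisation is an assumption
            if not key:
                continue
            counts[key] += 1
            first_seen.setdefault(key, text.strip())
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(first_seen[key], n) for key, n in ranked]
```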

New scripts

| Script | Purpose |
| --- | --- |
| `scripts/blind_comparator.py` | Independent A/B judge between two skills on the same tasks. Randomised X/Y labels per task with a seed. Aggregated to `comparison_report.{json,md}`. Catches self-confirming gate behaviour. |
| `scripts/eval_viewer.py` | Single-page static HTML renderer for an output-dir. Per-epoch SVG chart, accepted/rejected edit timelines, slow-update history, per-task rollouts with grading checklists, initial→best diff. `--compare` mode for two runs side-by-side. No JS / CSS dependencies. |
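The seeded X/Y randomisation used for blind comparison might work along these lines. This is a minimal sketch with hypothetical function names; deriving a per-task random stream from `seed` plus the task id, and the `tie` verdict, are assumptions rather than documented behaviour of `blind_comparator.py`.

```python
import random


def assign_labels(task_id: str, seed: int) -> dict[str, str]:
    """Map blind labels 'X'/'Y' to skills 'a'/'b', reproducibly per task."""
    rng = random.Random(f"{seed}:{task_id}")  # per-task stream: an assumption
    skills = ["a", "b"]
    rng.shuffle(skills)
    return {"X": skills[0], "Y": skills[1]}


def unblind(verdict: str, labels: dict[str, str]) -> str:
    """Translate the judge's verdict ('X', 'Y', or 'tie') back to a skill."""
    if verdict == "tie":
        return "tie"
    return f"{labels[verdict]}_wins"
```

Because the judge only ever sees X and Y, it cannot favour whichever skill the optimiser produced; the same seed reproduces the same assignment for auditing.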

New prompt contracts (in `prompts/`)

`grader.md` — assertions verifier grader
`blind_comparator.md` — blind A/B judge

Reference updates

`optimization-artifacts-schemas.md` (+193 lines): `tasks.jsonl` with `assertions[]`/`files[]`, `rollouts.jsonl` with multi-run + grading fields, `decision.json` with stddev, new blind comparator artefacts, eval viewer artefact. All v3.5 changes are additive; schema stability is a v3.x guarantee.
`optimization-grading-checklist.md` (+63 lines): pre-flight stddev check, per-task variance review, assertion critique review, blind comparison verdict review. New red flag: "blind comparator says a_wins ≥ b_wins despite SkillOpt accepting edits — investigate before shipping."

Compatibility

100% backward compatible. Default `--runs-per-task=1` + `--verifier llm-judge` produces byte-equivalent output to v3.4.
Python 3.10+, stdlib only. No new dependencies.

Test plan

`python3 scripts/optimize_skill.py --help` shows `--runs-per-task` and `assertions` choice
Run a small task set with `--verifier assertions --runs-per-task 3`, confirm `grading` field in `rollouts.jsonl` and `score_stddev` in `decision.json`
Confirm `optimization_report.md` contains an "Assertion critique" section
Run `blind_comparator.py --skill-a initial_skill.md --skill-b best_skill.md --tasks tasks.jsonl --dry-run`
Run `eval_viewer.py runs/r1` and open the produced HTML in a browser
Run `eval_viewer.py runs/r1 runs/r2 --compare` and confirm side-by-side layout

🤖 Generated with Claude Code

25 May 21:05

rodion-m

v3.4.0

a43dc0e

v3.4.0 — Artefact schemas, run aggregator, grading checklist

Summary

Three additions to skills-management, adopted and adapted from upstream anthropics/skills' skill-creator (commit `690f15c`, May 2026). The upstream is creator-focused; these adaptations target the management / optimisation context.

What's new

| File | Purpose |
| --- | --- |
| `references/optimization-artifacts-schemas.md` | Formal JSON schemas for every artefact written by `optimize_skill.py` and `log_skill_edit.py`: `splits.json`, `state.json`, `rollouts.jsonl`, `proposals.json`, `decision.json`, `edit_apply_report.json`, `rejected_buffer.json`, `meta_skill.json`, `optimization_report.md` frontmatter, `test_rollouts.jsonl`, `.skill_edit_log.jsonl`, `.skill_snapshots/`. Schema stability is a v3.x guarantee; breaking changes bump major. |
| `scripts/aggregate_runs.py` | Aggregate N optimisation runs into a single summary. Computes mean/stddev/min/max for `test_score`, `tokens_delta`, `test_delta_vs_baseline`, plus per-run lists for accepted/rejected edits. `--compare` for side-by-side; text / json / md output; robust to incomplete runs. |
| `references/optimization-grading-checklist.md` | Audit checklist applied to a finished optimisation run before shipping `best_skill.md`. Pre-flight + per-artefact review (best_skill, edit_apply_report, rejected_buffer, optimization_report+test_rollouts, meta_skill) + red flags + green-light decision paths. |

What we deliberately skipped from upstream

These are creator-only machinery for description prose iteration, not a fit for managing/auditing existing skills:

`agents/grader.md`, `agents/analyzer.md`, `agents/comparator.md` — eval-loop orchestration
`scripts/run_loop.py`, `scripts/improve_description.py`, `scripts/run_eval.py` — description-iteration loop
`eval-viewer/` — creator-side visualisation

SKILL.md changes

Quick Reference adds the aggregator
References table adds the two new docs

Compatibility

Backwards compatible. All v3.3.0 scripts and schemas unchanged.
Python 3.10+, stdlib only. No new dependencies.

Test plan

`python3 scripts/aggregate_runs.py --help` exits 0
Run `optimize_skill.py` once, then `aggregate_runs.py ` and confirm summary
Run `optimize_skill.py` twice with different seeds, then `aggregate_runs.py r1 r2 --compare`
Read `references/optimization-grading-checklist.md` end-to-end against a real run

🤖 Generated with Claude Code

25 May 20:37

rodion-m

v3.3.0

dab3270

v3.3.0 — SkillOpt training loop for skills-management

Summary

skills-management becomes a trainable skill manager, not just a CRUD tool. Based on SkillOpt (Microsoft, May 2026): treat the SKILL.md document as the external trainable state of a frozen agent, with the same discipline that makes weight-space optimisation reproducible — bounded edits, held-out validation gate, rejected-edit buffer, epoch-wise slow/meta update.
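Two of the disciplines named above, the held-out validation gate and the rejected-edit buffer, can be sketched together. The class and field names below are hypothetical and the real `optimize_skill.py` state is richer; the only behaviour taken from these notes is that acceptance requires a strictly greater validation score.

```python
from dataclasses import dataclass, field


@dataclass
class GateState:
    best_skill: str                      # current best SKILL.md text
    best_score: float                    # its score on the held-out selection split
    rejected: list[dict] = field(default_factory=list)  # rejected-edit buffer


def apply_gate(state: GateState, candidate_skill: str,
               candidate_score: float, edit_summary: str) -> bool:
    """Accept the candidate only if it strictly beats the current best."""
    if candidate_score > state.best_score:   # ties are rejected
        state.best_skill = candidate_skill
        state.best_score = candidate_score
        return True
    # Remember what failed so later proposals can avoid repeating it.
    state.rejected.append({"edit": edit_summary, "score": candidate_score})
    return False
```

Rejecting ties keeps the skill from drifting through edits that add tokens without a measurable gain.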

New scripts

| Script | Purpose |
| --- | --- |
| `scripts/optimize_skill.py` | Full SkillOpt loop: train/sel/test splits, rollout via `claude -p`, failure/success mini-batch reflection, hierarchical merge, ranked bounded apply (`constant`/`linear`/`cosine` L_t schedules), strict-greater validation gate, rejected-edit buffer, epoch-boundary slow update into a protected section, optimiser-side meta-skill. Supports `--dry-run` and `--resume`. |
| `scripts/log_skill_edit.py` | Append-only audit log with SHA chain, token delta, optional `--snapshot`. |
| `scripts/diff_skill_versions.py` | Diff between git commits, explicit files, or snapshot history; `unified`/`stats`/`side-by-side` formats. |
| `scripts/trigger_test.py` | Trigger tests with heuristic or `claude-cli` judge, P/R/F1 metrics. `--generate` seeds `cases.yaml` from the description. |
| `scripts/transfer_test.py` | Structural verification of a skill across all 42 supported agents; `--copy --to <agent>`, `--all`. |
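An append-only log with a SHA chain, as described for `log_skill_edit.py`, can be illustrated in a few lines. This sketch keeps the log in memory rather than in `.skill_edit_log.jsonl`, and the field names (`prev_sha`, `sha`, `reason`, `token_delta`) and the choice of SHA-256 are assumptions, not the script's documented schema.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hash every field except 'sha' using a canonical JSON encoding."""
    body = {k: v for k, v in entry.items() if k != "sha"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], reason: str, token_delta: int) -> dict:
    """Append an entry that commits to the previous entry's hash."""
    prev = log[-1]["sha"] if log else ""
    entry = {"prev_sha": prev, "reason": reason, "token_delta": token_delta}
    entry["sha"] = entry_hash(entry)
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Check that no entry was altered, removed, or reordered."""
    prev = ""
    for entry in log:
        if entry.get("prev_sha") != prev or entry.get("sha") != entry_hash(entry):
            return False
        prev = entry["sha"]
    return True
```

Because each entry's hash covers the previous hash, editing or deleting an earlier record invalidates every later one, which is what makes the log useful as an audit trail.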

New prompt contracts (verbatim §C.2 of the paper)

prompts/analyst_error.md, analyst_success.md, merge_failure.md, merge_success.md, merge_final.md, ranking.md, slow_update.md, meta_skill.md.

New reference

references/skill-optimization.md — when to optimise, five core principles, targets (300-2000 tokens, 1-4 accepted edits), one-page algorithm, hyperparameter defaults, six anti-patterns, transfer evidence.

review_skill.py upgrades

Token footprint (300-2000 target per Table 6 of the paper; penalties at 2000/4000)
Procedurality check (instance-specific markers — filenames, literal numbers, task references — should be rare)
Patch-friendliness (anchor density + duplicate-anchor detection for reliable insert_after edits)
Slow-update section integrity (balanced opening / closing markers)

JSON output gains body_tokens, references_tokens, slow_update_tokens, total_tokens, anchor_density, heading_count. CLI flags and exit codes unchanged.

SKILL.md changes

New section: Optimize a Skill (SkillOpt-style)
Quick Reference gains 8 new commands (optimize, log, diff, trigger-test, transfer-test, generate cases)
Documented the protected-section convention
Description extended with new trigger phrases: "optimise skill", "train skill on tasks", "iterate skill", "audit skill edits", "log skill edit", "diff skill versions", "trigger test skill", "transfer skill across agents"

Patterns reference

06-patterns-and-troubleshooting.md gains Pattern 6: Validated iterative refinement with the blind-rewrites anti-pattern.

Compatibility

Backwards compatible. All existing scripts unchanged. review_skill.py keeps every existing rule.
Python 3.10+, stdlib only. No new dependencies.
optimize_skill.py shells out to claude -p for rollouts and optimiser calls — inherits whatever subscription/API the user has configured.

Test plan

Run python3 scripts/review_skill.py <any-skill> and verify the new Token footprint block appears.
Run python3 scripts/optimize_skill.py <skill> --tasks tasks.jsonl --dry-run and verify schedule + splits + prompt previews print without LLM calls.
Run python3 scripts/log_skill_edit.py <skill> --reason "test" --dry-run and verify the planned entry.
Run python3 scripts/trigger_test.py <skill> --generate > cases.yaml then --cases cases.yaml.
Run python3 scripts/transfer_test.py <skill> --all to verify cross-agent placement.

🤖 Generated with Claude Code

11 May 17:51

rodion-m

refactoring-csharp-v0.1.0

cb49b71

refactoring-csharp v0.1.0

Roslyn-based C# rename refactorer packaged as an agent skill.

Assets include release installers, the skill archive, self-contained CLI binaries for macOS/Linux/Windows, and SHA256 checksums.

Quick install

# macOS/Linux
curl -fsSL https://github.com/CodeAlive-AI/ai-driven-development/releases/download/refactoring-csharp-v0.1.0/install.sh | bash

# Windows PowerShell
irm https://github.com/CodeAlive-AI/ai-driven-development/releases/download/refactoring-csharp-v0.1.0/install.ps1 | iex
