feat: detect + send tool/model on submit (v0.6.0) by RayirthDinesh · Pull Request #3 · AISmithLab/aicodinggym-cli

RayirthDinesh · 2026-05-15T05:17:05Z

Summary

Auto-detect the AI coding tool and model on every swe submit, cr submit, and mle submit, and forward them to the backend so the tool/model leaderboards (companion PR on aicodingymsite) can rank submissions.
cli_env.py reads only an env allowlist — never the full environment — so secrets cannot leak into payloads.
Bump version 0.5.1 -> 0.6.0.

Detection sources

Tool: CLAUDECODE, CURSOR_TRACE_ID, TERM_PROGRAM=cursor, ANTIGRAVITY, AIDER_MODEL, CODEX_CLI, WINDSURF, CONTINUE_CLI, CLINE_CLI, GEMINI_CLI.
Model: ANTHROPIC_MODEL, CLAUDE_CODE_MODEL, OPENAI_MODEL, AIDER_MODEL, GEMINI_MODEL, CURSOR_MODEL.
MLE-specific: prefers solution_log.json (set by agent per CLAUDE.md) over env.
CLI flags --tool / --tool-version / --ai-model override auto-detection.
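
The detection order described above could be sketched roughly as follows. This is an illustration, not the contents of cli_env.py: the function name `detect_tool_and_model`, every tool label except `claude-code` (which appears in the test plan output), and the first-match precedence among variables are assumptions.

```python
from typing import Mapping, Optional, Tuple

# Only the variables named in this PR are consulted; nothing else is read.
# Labels on the right are illustrative guesses, apart from "claude-code".
TOOL_VARS = (
    ("CLAUDECODE", "claude-code"),
    ("CURSOR_TRACE_ID", "cursor"),
    ("ANTIGRAVITY", "antigravity"),
    ("AIDER_MODEL", "aider"),
    ("CODEX_CLI", "codex"),
    ("WINDSURF", "windsurf"),
    ("CONTINUE_CLI", "continue"),
    ("CLINE_CLI", "cline"),
    ("GEMINI_CLI", "gemini"),
)
MODEL_VARS = (
    "ANTHROPIC_MODEL", "CLAUDE_CODE_MODEL", "OPENAI_MODEL",
    "AIDER_MODEL", "GEMINI_MODEL", "CURSOR_MODEL",
)


def detect_tool_and_model(
    env: Mapping[str, str],
    tool_flag: Optional[str] = None,
    model_flag: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (tool, model); explicit flags win over environment detection."""
    tool = tool_flag
    if tool is None:
        for var, label in TOOL_VARS:
            if env.get(var):
                tool = label
                break
        else:
            # TERM_PROGRAM is matched on its value, not on mere presence.
            if env.get("TERM_PROGRAM", "").lower() == "cursor":
                tool = "cursor"

    model = model_flag
    if model is None:
        # First non-empty allowlisted variable wins (assumed precedence).
        model = next((env[v] for v in MODEL_VARS if env.get(v)), None)
    return tool, model
```

Calling `detect_tool_and_model(os.environ)` passes the mapping in, but the function only ever looks up the allowlisted keys, so unrelated variables never reach the return value.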

New API methods

submit_notification(...), cr_submit_review(...), mlebench_submit_csv(...) — extended with tool / tool_version / ai_model kwargs.
notify_mle_progress(...) — posts percentile + attribution to /api/users/{id}/progress so MLE rows feed the leaderboard aggregator.
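
As a rough sketch of what the progress call might send, the snippet below builds (but does not send) a POST to the endpoint named above. The helper name `build_progress_request`, the placeholder host, and the JSON field names are assumptions; the actual wire format in api.py may differ (the backend columns are described as tool / aiModel).

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request

BASE_URL = "https://example.invalid"  # placeholder; the real host is not given in this PR


def build_progress_request(
    user_id: str,
    percentile: float,
    tool: Optional[str] = None,
    tool_version: Optional[str] = None,
    ai_model: Optional[str] = None,
) -> Request:
    """Build a POST request for /api/users/{id}/progress without sending it."""
    payload: Dict[str, Any] = {"percentile": percentile}
    optional = {"tool": tool, "tool_version": tool_version, "ai_model": ai_model}
    # Attribution fields are omitted when unknown rather than sent as null.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return Request(
        f"{BASE_URL}/api/users/{user_id}/progress",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Keeping the attribution kwargs optional means existing callers that pass no tool or model still produce a valid payload.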

Test plan

pip install -e . from branch
CLAUDECODE=1 ANTHROPIC_MODEL=claude-opus-4-7 aicodinggym swe submit <id> — output shows Tool: claude-code · model=claude-opus-4-7
aicodinggym swe submit <id> --tool cursor --ai-model gpt-5 — flags override env
aicodinggym cr submit <id> -f review.md — same Tool line in output
aicodinggym mle submit <id> -F predictions.csv — fires notify_mle_progress with the percentile
Verify that the Tools and Models tabs on the aicodingymsite /leaderboard page populate after the backend processes the submissions

Notes

LeetCode flow intentionally untouched.
Companion backend / frontend changes ship in aicodingymsite PR #15.
Merge order: backend PR first (needs prisma db push for new tool / aiModel columns), then this CLI PR.

feat: add cr fetch/submit commands for code review challenges

- Replace shell=True subprocess calls with list-based args to avoid shell quoting issues across platforms
- Add _restrict_key_permissions() using icacls on Windows, chmod on Unix
- Restrict config directory permissions on Windows via icacls
- Pre-check ssh-keygen availability with Windows-specific error message
- Quote SSH key paths with forward slashes for Windows compatibility
- Use /dev/null (not os.devnull) in GIT_SSH_COMMAND for MSYS2 ssh compat
- Use platform-appropriate file viewer hint (type on Windows, cat on Unix)
- Bump version to 0.3.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
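
A minimal sketch of the permission-restriction idea from this commit, assuming a helper shaped like the one below. The public name `restrict_key_permissions` and the exact icacls arguments are assumptions; the commit only states that icacls is used on Windows and chmod on Unix.

```python
import getpass
import os
import stat
import subprocess
import sys
from pathlib import Path


def restrict_key_permissions(path: Path) -> None:
    """Make a private key accessible to the current user only."""
    if sys.platform == "win32":
        user = getpass.getuser()
        # List-based args: no shell is involved, so paths with spaces need no quoting.
        subprocess.run(
            ["icacls", str(path), "/inheritance:r", "/grant:r", f"{user}:F"],
            check=True,
            capture_output=True,
        )
    else:
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
```

On Windows, `/inheritance:r` drops inherited ACEs and `/grant:r` replaces any explicit grant for the user, which is the closest analogue to a 0600 mode.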

On Windows, System32's OpenSSH (ssh.exe) is typically found before Git for Windows' bundled MSYS2 ssh on PATH. The native OpenSSH can trigger GUI credential dialogs or deadlock when stdout is captured, causing swe fetch to hang indefinitely during git clone.

- Add _find_git_ssh() to locate Git for Windows' bundled ssh.exe
- Use explicit ssh path in GIT_SSH_COMMAND to bypass System32 OpenSSH
- Add BatchMode=yes to prevent interactive prompts on all platforms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
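
The approach in this commit might look something like the sketch below. The function names, the assumed Git for Windows layout (`<root>/cmd/git.exe` next to `<root>/usr/bin/ssh.exe`), and the exact option set are assumptions; only the explicit ssh path and `BatchMode=yes` come from the commit message.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_git_ssh() -> Optional[str]:
    """Best-effort lookup of the ssh.exe bundled with Git for Windows."""
    if sys.platform != "win32":
        return None
    git = shutil.which("git")
    if not git:
        return None
    # Assumed layout: <root>/cmd/git.exe alongside <root>/usr/bin/ssh.exe.
    candidate = Path(git).resolve().parent.parent / "usr" / "bin" / "ssh.exe"
    return str(candidate) if candidate.is_file() else None


def build_git_ssh_command(key_path: str, ssh_path: Optional[str] = None) -> str:
    """Compose a GIT_SSH_COMMAND value that never prompts interactively."""
    # Forward slashes plus quotes keep Windows paths intact in the command string.
    ssh = (ssh_path or "ssh").replace("\\", "/")
    key = key_path.replace("\\", "/")
    return f'"{ssh}" -i "{key}" -o BatchMode=yes'
```

The result would typically be placed in the `GIT_SSH_COMMAND` environment variable of the subprocess that runs git clone, so a missing or rejected key fails fast instead of hanging on a prompt.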

Broadens git add exclusions to ignore all dot-prefixed files/directories (.*) and all markdown files (*.md) so scaffold metadata like .devcontainer, .swebench, .gitignore, problem_description.md, and hints_text.md are never included in submissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
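
The exclusion rule can be illustrated with a plain path filter. This is a sketch of the rule only: the CLI itself may express it as git pathspecs, and the helper names and the choice to exclude dot-prefixed components at any depth are assumptions.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_submittable(rel_path: str) -> bool:
    """False for dot-prefixed files/directories and for markdown files."""
    parts = PurePosixPath(rel_path).parts
    if any(part.startswith(".") for part in parts):
        return False
    return not rel_path.endswith(".md")


def filter_submission(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that should be staged for a submission."""
    return [p for p in paths if is_submittable(p)]
```

With this rule, `.devcontainer/devcontainer.json`, `.gitignore`, and `problem_description.md` are all dropped while `src/main.py` is kept.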

Downloads AI agent config files (.claudeignore, .cursorrules, CLAUDE.md, AGENTS.md, etc.) from AICodingGym/gym-environment into problem directories during swe fetch, cr fetch, and mle download. Also installs to the workspace root during configure. Downloaded files are added to .gitignore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
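
The last step, adding downloaded files to .gitignore, is easy to get wrong when fetch runs more than once. A minimal idempotent version might look like this; the helper name `add_to_gitignore` is hypothetical and the download itself is not shown.

```python
from pathlib import Path
from typing import Iterable, List


def add_to_gitignore(directory: Path, names: Iterable[str]) -> List[str]:
    """Append names missing from <directory>/.gitignore; return what was added."""
    gitignore = directory / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    # dict.fromkeys removes duplicates in the input while preserving order.
    missing = [n for n in dict.fromkeys(names) if n not in present]
    if missing:
        # Avoid gluing the first new entry onto an unterminated last line.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with gitignore.open("a", encoding="utf-8") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Running it again with the same names is a no-op, so repeated fetches do not grow the file.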

Bump version to 0.5.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, dotfile exclusion, gym-environment fetch)

- cli_env.py: allowlist-only tool/model detection (no full env dump); reads ANTHROPIC_MODEL, CLAUDE_CODE_MODEL, OPENAI_MODEL, AIDER_MODEL, GEMINI_MODEL, CURSOR_MODEL
- api.py: extend submit_notification / cr_submit_review / mlebench_submit_csv with tool/tool_version/ai_model; add notify_mle_progress to forward percentile + attribution to Prisma UserProgress
- cli.py: --tool / --tool-version / --ai-model flags on swe submit, cr submit, mle submit; MLE reads solution_log.json for accurate model record per CLAUDE.md
- bump version 0.5.1 -> 0.6.0
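
For the MLE path, the precedence of flag, then solution_log.json, then environment could be sketched as below. The schema of solution_log.json is not shown in this PR, so the `"model"` key is a hypothetical placeholder, as is the function name `resolve_mle_model`.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_mle_model(
    problem_dir: Path,
    env_model: Optional[str] = None,
    model_flag: Optional[str] = None,
) -> Optional[str]:
    """Pick the model name: CLI flag, then solution_log.json, then env value."""
    if model_flag:
        return model_flag
    try:
        text = (problem_dir / "solution_log.json").read_text(encoding="utf-8")
        data = json.loads(text)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed log: fall through to the env value.
        data = None
    if isinstance(data, dict):
        logged = data.get("model")  # hypothetical key name
        if isinstance(logged, str) and logged:
            return logged
    return env_model
```

A malformed or absent log never raises here, so a submit is not blocked by a broken log file.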

qyli00 and others added 14 commits March 15, 2026 23:22

Merge pull request #1 from AISmithLab/code-review

d16ec5e

feat: add cr fetch/submit commands for code review challenges

update api endpoints and default to head branch for code review tasks

4d9947e

fix: gzip compress CSV uploads to avoid 413 errors on large submissions

c2ba5d6

Bump version to 0.5.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

logger code implemented for the cli

e2de91a

MLE Logger Beta complete

38f167f

improve trajectory analysis

e4b25b4

Add text normalization design spec

6c63a53

MLE logger changes completed

7acec66

merge: bring in latest main (cross-platform support, gzip CSV uploads…

a5b25b3

…, dotfile exclusion, gym-environment fetch)
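
The gzip fix listed among the commits above (c2ba5d6) amounts to compressing the CSV body before upload. A minimal sketch, assuming the backend accepts a gzip-encoded body; the helper name `gzip_csv` and the header choice are assumptions, since the PR does not show how compression is signalled.

```python
import gzip
from typing import Dict, Tuple


def gzip_csv(csv_bytes: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Compress a CSV payload and return it with matching request headers."""
    body = gzip.compress(csv_bytes)
    # Assumed signalling; the real CLI may use a different field or header.
    headers = {"Content-Type": "text/csv", "Content-Encoding": "gzip"}
    return body, headers
```

Prediction files are highly repetitive text, so they usually shrink considerably, which is what keeps large submissions under the server's request size limit.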

RayirthDinesh changed the title ~~MLE Bench Logger~~ feat: detect + send tool/model on submit (v0.6.0) May 25, 2026

RayirthDinesh wants to merge 14 commits into main from MLE-Bench_Logger


3 participants
