Deterministic runtime for agent evaluation
Updated Mar 25, 2026 · Python
WordleBench — Deterministic AI Wordle benchmark. Compare 34+ LLMs (GPT-5, Claude 4.5, Gemini, Grok, Llama) head-to-head on accuracy, speed, and cost across 50 standardized words.
The deterministic heap groomer for C/C++ memory debugging.
Type-safe clock abstractions for Go with zero dependencies
Chaos & adversarial testing framework for Arbitrum rollup stacks — deterministic simulation + live Docker fault injection. Same seed = same chaos = reproducible bugs.
Deterministic Rust testing utility for simulation and stochastic workflows
Turn-based political sim where policy decisions ripple through 14 competing factions. Manage legitimacy, navigate crises, and survive your rival across 10 possible endings. Built with React + TypeScript via Claude Code, with deterministic testing and accessibility features.
Deterministic GraphQL security auditor built on proven mathematics. 16 crates, 299 tests, 5.7MB. Shannon, Wald, Fisher, Lamport, Markov, Bayes, Hughes, Brandes, Tarjan — one binary.
Local-first C# project for deterministic prompt versioning, A/B evaluation, and evidence-based promotion using structured scoring.
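Several of these projects rely on one idea: a test becomes reproducible when every source of nondeterminism, such as wall-clock time and randomness, is injected rather than read from the environment. The injectable clock abstraction and the "same seed = same chaos" fault injection described above follow this pattern. Below is a minimal Python sketch of that idea. `Clock`, `FakeClock`, `FaultInjector`, and `run_with_retries` are hypothetical names chosen for illustration. They are not the API of any repository listed here.

```python
import random
from typing import List, Protocol


class Clock(Protocol):
    """Hypothetical clock interface: code under test asks this for time instead of the OS."""

    def now(self) -> float: ...

    def sleep(self, seconds: float) -> None: ...


class FakeClock:
    """Virtual clock: time only advances when sleep() is called, with no real waiting."""

    def __init__(self, start: float = 0.0) -> None:
        self.current = start

    def now(self) -> float:
        return self.current

    def sleep(self, seconds: float) -> None:
        self.current += seconds


class FaultInjector:
    """Hypothetical seeded fault injector: the same seed yields the same fault schedule."""

    def __init__(self, seed: int, failure_rate: float = 0.3) -> None:
        # A private Random instance keeps the schedule independent of global random state.
        self.rng = random.Random(seed)
        self.failure_rate = failure_rate

    def should_fail(self) -> bool:
        return self.rng.random() < self.failure_rate


def run_with_retries(clock: Clock, faults: FaultInjector, attempts: int = 5) -> List[str]:
    """Simulated operation with exponential backoff; returns an event log for comparison."""
    log: List[str] = []
    for attempt in range(attempts):
        if faults.should_fail():
            log.append(f"t={clock.now():.1f} attempt {attempt}: fault")
            clock.sleep(2 ** attempt)  # backoff in virtual time only
        else:
            log.append(f"t={clock.now():.1f} attempt {attempt}: ok")
            break
    return log


if __name__ == "__main__":
    first = run_with_retries(FakeClock(), FaultInjector(seed=42))
    second = run_with_retries(FakeClock(), FaultInjector(seed=42))
    assert first == second  # same seed, same faults, same log
    print("\n".join(first))
```

Because neither the clock nor the random generator is global, a failing run can be replayed exactly by reusing its seed, and backoff delays cost no real time.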