ryan-allen/tests-from-code-then-code-from-tests

Express.js Clean-Room Reconstruction

A seven-step experiment that clones Express.js, deletes its test suite, rebuilds tests from source code alone, reimplements the library from those tests alone, builds an app on the result, and finally rebuilds the library a second time from pure parametric memory — each step blind to the artifacts of the one it replaces.

The Experiment

Step 1 — Baseline (1-original-repo/)

Pristine clone of expressjs/express. Ran npm test to establish ground truth.

  • 1249 tests, all passing
  • 98.75% statement coverage, 96.16% branch, 100% functions, 99.61% lines

Step 2 — Derive Tests from Source (2-clean-room-rebuild-tests/)

Cloned Express again, deleted test/, and wrote a new test suite from scratch. The only inputs were lib/ source code, README.md, and examples/. The original tests in 1-original-repo/test/ were never opened.

Built in 6 loops: infrastructure, then application, then four parallel subagent passes covering request (18 files), response (21 files), router/middleware (8 files), and acceptance tests (18 files modeled on examples/).

  • 667 tests, all passing
  • 95.12% stmts, 86.70% branch, 97.24% funcs, 96.36% lines

Step 3 — Reimplement lib/ from Tests (3-reimplement-from-derived-specs/)

Cloned Express a third time, deleted both lib/ and test/, copied the derived tests from step 2, and reimplemented all 6 lib/ files using only the tests as specification. Never looked at 1-original-repo/lib/ or 2-clean-room-rebuild-tests/lib/.

Implemented in dependency order: utils.js → view.js → request.js → response.js → application.js → express.js.

  • 661/667 passing on first run (99.1%)
  • Fixed 4 bugs (view lookup crash, format duplicate branch, jsonp array callback, 204 Content-Length leak)
  • 667/667 tests passing, 94.63% stmts, 86.75% branch, 96.58% funcs, 96.13% lines
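The 204 Content-Length leak in the list above is the kind of bug that only a header-level assertion catches. Below is a minimal sketch of the general fix, assuming a hypothetical `finalizeHeaders` helper that works on a plain lowercase header map. The reimplementation's actual code is not shown in this README, so this is illustrative only.

```javascript
'use strict';

// Hypothetical helper (not taken from the reimplementation): given a status
// code and a lowercase header map, return a copy without the entity headers
// that should not accompany a bodiless 204 or 304 response.
function finalizeHeaders(statusCode, headers) {
  const out = { ...headers };
  if (statusCode === 204 || statusCode === 304) {
    delete out['content-type'];
    delete out['content-length'];
    delete out['transfer-encoding'];
  }
  return out;
}

const headers = {
  'content-type': 'text/html',
  'content-length': '5',
  'x-powered-by': 'Express',
};

console.log(finalizeHeaders(204, headers)); // only x-powered-by remains
console.log(finalizeHeaders(200, headers)); // all three headers kept
```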

Structural comparison against the original source showed the reimplementation converged on nearly identical architecture — same factory pattern, same lazy router, same middleware init, same settings system — with minor divergences in implementation details (regex vs stdlib, computed property vs middleware for query parsing, real Symbol vs string-based fake symbol).
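"Lazy router" here means the router is not constructed until something first touches it. The sketch below shows that pattern using a hypothetical `FakeRouter` stand-in and a heavily simplified `createApplication`. It illustrates the idea only and is not code from either implementation.

```javascript
'use strict';

let constructed = 0;

// Stand-in for a real router; it only records handlers and counts how many
// times it has been constructed.
class FakeRouter {
  constructor() {
    constructed += 1;
    this.stack = [];
  }
  use(fn) {
    this.stack.push(fn);
    return this;
  }
}

// Simplified factory: the router is created on first access to app.router.
function createApplication() {
  const app = {};
  let router = null;
  Object.defineProperty(app, 'router', {
    configurable: true,
    enumerable: true,
    get() {
      if (router === null) router = new FakeRouter();
      return router;
    },
  });
  app.use = function use(fn) {
    this.router.use(fn);
    return this;
  };
  return app;
}

const app = createApplication();
console.log(constructed); // 0: nothing built yet
app.use(function first() {});
app.use(function second() {});
console.log(constructed); // 1: built once, then reused
```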

Step 4 — Improve Coverage (4-improve-coverage/)

Added 29 targeted tests for uncovered lines/branches identified via istanbul reports.

| Metric | Before | After  |
|--------|--------|--------|
| Stmts  | 94.63% | 97.68% |
| Branch | 86.75% | 93.15% |
| Funcs  | 96.58% | 98.29% |
| Lines  | 96.13% | 98.97% |

Techniques: socket manipulation, header/property deletion, raw prototype access, env override, direct View constructor calls, mount-based trust proxy inheritance.
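Of these techniques, env override is the easiest to show in isolation: a test sets `NODE_ENV`, exercises the code, and restores the previous value so later tests are unaffected. The sketch below assumes a hypothetical `withEnv` helper and a hypothetical `defaultEnv` function standing in for library code that reads `NODE_ENV`. Neither name comes from the repository.

```javascript
'use strict';

// Hypothetical helper: run fn with process.env[name] temporarily set
// (or removed when value is undefined), then restore the prior state.
function withEnv(name, value, fn) {
  const previous = process.env[name];
  if (value === undefined) delete process.env[name];
  else process.env[name] = value;
  try {
    return fn();
  } finally {
    if (previous === undefined) delete process.env[name];
    else process.env[name] = previous;
  }
}

// Stand-in for library code that falls back to 'development'.
function defaultEnv() {
  return process.env.NODE_ENV || 'development';
}

console.log(withEnv('NODE_ENV', 'production', defaultEnv)); // production
console.log(withEnv('NODE_ENV', undefined, defaultEnv));    // development
```

Restoring in `finally` matters because a failing assertion inside `fn` would otherwise leave the override in place for every test that runs afterwards.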

Step 5 — Demo App (5-demo-app/)

Built "Clipstash" — a code snippet manager with browser UI and JSON API — on top of the reimplemented Express to verify it works as a real library, not just under unit tests.

Exercises ~30 Express features: express(), app.set/listen/engine/locals, express.json/urlencoded/static, express.Router(), router.param(), req.query/body/params/get/is/ip/xhr/cookies/path/method, res.json/send/render/redirect/status/sendStatus/sendFile/cookie/clearCookie/set/type/format/links/vary/append/locals, EJS templates with partials, content negotiation on 4 endpoints, and 4-arg error handling.
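The "4-arg error handling" item refers to the Express convention that a middleware function declaring four parameters `(err, req, res, next)` is treated as an error handler. The toy dispatcher below illustrates that arity rule. `run` is a hypothetical name, and the logic is far simpler than a real router.

```javascript
'use strict';

// Toy dispatcher: errors are routed only to 4-parameter handlers, and
// normal requests only to handlers with fewer than 4 parameters.
function run(stack, req, res) {
  let index = 0;
  function next(err) {
    const fn = stack[index++];
    if (!fn) return;
    const isErrorHandler = fn.length === 4;
    if (err && isErrorHandler) return fn(err, req, res, next);
    if (!err && !isErrorHandler) {
      try {
        return fn(req, res, next);
      } catch (thrown) {
        return next(thrown); // a synchronous throw switches to error mode
      }
    }
    return next(err); // skip handlers that do not match the current mode
  }
  next();
}

const res = { log: [] };
run([
  (req, res, next) => { res.log.push('first'); next(); },
  (req, res, next) => { throw new Error('boom'); },
  (req, res, next) => { res.log.push('skipped'); next(); },
  (err, req, res, next) => { res.log.push('caught ' + err.message); },
], {}, res);

console.log(res.log); // [ 'first', 'caught boom' ]
```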

cd 5-demo-app && node app.js
# http://localhost:3000

No npm install required — it requires Express directly from ../4-improve-coverage/.

Step 6 — Parametric Memory Rebuild, with Iteration (6-parametric-memory/)

A different question: what if there are no derived tests, no source, no web — just the model's training-data memory? Cloned the project from step 4 (which keeps the test suite), deleted lib/, and reimplemented all 6 files from pure recall. Allowed to iterate using npm test failures, but never to read source, README, examples, or web.

  • 696/696 tests passing after iteration
  • ~2683 lines vs ~2762 in the original — within 3%

Step 7 — Parametric Memory Rebuild, One-Shot (7-blind-from-memory/)

Same setup as step 6, but stricter: write all 6 lib/ files in a single pass with no testing between them, then run npm test exactly once and report the score. No fixing.

  • 663/696 tests passing on first attempt (95.3%)
  • ~1658 lines — about 60% of the original
  • 33 failures concentrated in: res.download/res.attachment filename handling (~10), res.status validation (6), missing utils.methods export (3), cascading acceptance test 500s (5), and small gaps in res.send, app.path(), app.router deprecation, query parser default, and req.host X-Forwarded-Host parsing
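Validation paths like the `res.status` failures above are easy to omit because the happy path works without them. The sketch below contrasts a lenient setter with a validating one. It assumes integer and range checks for illustration. The exact rules and error messages the test suite expects are not stated in this README.

```javascript
'use strict';

// Lenient version: accepts anything, which is what a from-memory rewrite
// can end up with when no test failure points at the gap.
function lenientStatus(res, code) {
  res.statusCode = code;
  return res;
}

// Validating version (illustrative rules and messages, assumed here).
function strictStatus(res, code) {
  if (!Number.isInteger(code)) {
    throw new TypeError('status code must be an integer, got ' + JSON.stringify(code));
  }
  if (code < 100 || code > 999) {
    throw new RangeError('status code out of range: ' + code);
  }
  res.statusCode = code;
  return res;
}

for (const input of [201, '201', 99]) {
  try {
    console.log(input, '->', strictStatus({}, input).statusCode);
  } catch (err) {
    console.log(input, '->', err.name);
  }
}
// 201 -> 201
// 201 -> TypeError   (the string '201')
// 99 -> RangeError
```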

The architecture comes back faithful — lazy router init, settings prototype chain, etag/wetag helpers, createApplication factory — but edge-case validation and deprecation paths get silently dropped without test feedback to surface them.
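The "settings prototype chain" can be illustrated without Express. When a sub-app is mounted, its settings object is re-parented onto the parent's, so lookups fall through to the parent unless the child has set its own value. The sketch below uses hypothetical `createApp` and `mount` helpers, assumed purely for illustration.

```javascript
'use strict';

// Hypothetical minimal app: just a settings bag with set/get.
function createApp() {
  return {
    settings: {},
    set(name, value) {
      this.settings[name] = value;
      return this;
    },
    get(name) {
      return this.settings[name];
    },
  };
}

// On mount, the child's settings inherit from the parent's.
function mount(parent, child) {
  Object.setPrototypeOf(child.settings, parent.settings);
}

const parent = createApp().set('trust proxy', true).set('view engine', 'ejs');
const child = createApp().set('view engine', 'pug');
mount(parent, child);

console.log(child.get('trust proxy')); // true  (inherited from parent)
console.log(child.get('view engine')); // pug   (child's own value wins)
```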

Results Summary

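| Step | What                           | Tests                   | Stmt Coverage |
|------|--------------------------------|-------------------------|---------------|
| 1    | Original Express               | 1249                    | 98.75%        |
| 2    | Derived test suite             | 667                     | 95.12%        |
| 3    | Reimplemented lib/             | 667                     | 94.63%        |
| 4    | + coverage tests               | 696                     | 97.68%        |
| 5    | Demo app                       | works in browser + curl |               |
| 6    | Parametric rebuild (iterating) | 696/696                 |               |
| 7    | Parametric rebuild (one-shot)  | 663/696                 |               |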
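The percentages quoted in steps 3, 6, and 7 follow directly from the counts reported in this README. A few lines reproduce them (`pct` is just a local helper name):

```javascript
'use strict';

// Format part/whole as a percentage with one decimal place.
const pct = (part, whole) => ((part / whole) * 100).toFixed(1) + '%';

console.log(667 + 29);          // 696: step 2 suite plus step 4 additions
console.log(pct(661, 667));     // 99.1%: step 3 first-run pass rate
console.log(pct(663, 696));     // 95.3%: step 7 one-shot pass rate
console.log(pct(2683, 2762));   // 97.1%: step 6 line count vs original
console.log(pct(1658, 2762));   // 60.0%: step 7 line count vs original
```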

Key Findings

Tests-as-spec works. A test suite derived from source code was sufficient to reimplement Express from scratch with 99.1% first-run pass rate. The reimplementation converged on the same architecture without ever seeing the original code.

What tests capture: Public API contracts, routing behavior, middleware composition, content negotiation, error propagation, header semantics, cookie handling, template rendering, trust proxy logic.

What tests miss: Deprecation warnings, error message wording, internal implementation choices (regex vs net.isIP()), race condition handling (onFinished in sendfile), dead code (View.prototype.resolve()).
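The `regex vs net.isIP()` divergence shows why black-box tests cannot pin down implementation choices: two implementations can agree on every input a suite happens to try. The sketch below assumes a hypothetical, deliberately simple IPv4 regex. It is not the pattern used in the reimplementation.

```javascript
'use strict';
const net = require('node:net');

// Hypothetical simplified pattern: four dot-separated groups of 1-3 digits.
const IPV4_RE = /^\d{1,3}(\.\d{1,3}){3}$/;

const viaRegex = (host) => IPV4_RE.test(host);
const viaStdlib = (host) => net.isIP(host) === 4;

// Inputs a derived test suite might plausibly cover: both versions agree.
for (const host of ['127.0.0.1', '10.0.0.1', 'example.com']) {
  console.log(host, viaRegex(host), viaStdlib(host));
}

// An input outside that set exposes the difference.
console.log('999.1.1.1', viaRegex('999.1.1.1'), viaStdlib('999.1.1.1'));
// 999.1.1.1 true false
```

A suite that never sends an out-of-range octet passes against either version, so the choice stays invisible to a reimplementation that works from the tests alone.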

Tests can drive better design. The reimplementation's query as a computed getter (vs middleware that parses once) and settings prototype chain walk (vs direct lookup) are arguably improvements — emergent from what the tests require rather than how the original happened to implement it.
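The two query-parsing designs can be sketched side by side. Both helpers below are hypothetical and use `URLSearchParams` for brevity, not the parser either implementation actually uses.

```javascript
'use strict';

// Parse the query string portion of a URL into a plain object.
function parseQuery(url) {
  const i = url.indexOf('?');
  const search = i === -1 ? '' : url.slice(i + 1);
  return Object.fromEntries(new URLSearchParams(search));
}

// Design A: middleware parses once and stores a snapshot on the request.
function queryMiddleware(req, res, next) {
  req.query = parseQuery(req.url);
  next();
}

// Design B: a getter on a shared prototype computes from the current URL.
const requestProto = {};
Object.defineProperty(requestProto, 'query', {
  configurable: true,
  enumerable: true,
  get() {
    return parseQuery(this.url);
  },
});

const a = { url: '/search?q=one' };
queryMiddleware(a, {}, () => {});

const b = Object.create(requestProto);
b.url = '/search?q=one';

// Rewrite the URL after the middleware has already run.
a.url = b.url = '/search?q=two';

console.log(a.query.q); // one (snapshot taken before the rewrite)
console.log(b.query.q); // two (recomputed on access)
```

The getter always reflects the current URL and needs no middleware registration, at the cost of re-parsing on each access unless the result is cached.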

Parametric memory gets you 95% of the way; iteration closes the rest. Steps 6 and 7 ask whether the model can rebuild Express from training data alone. One-shot from memory passes 663/696 (95.3%) — the architecture is faithful but ~5% of behavior (validation strictness, deprecation paths, a few edge cases) gets silently dropped. With test-failure iteration the same blind setup converges on 696/696. The signal isn't memory or tests in isolation; it's a feedback loop that grounds memory.

Directory Layout

full-ralph/
├── 1-original-repo/          # Pristine Express clone (baseline)
├── 2-clean-room-rebuild-tests/  # Express + derived test suite
├── 3-reimplement-from-derived-specs/  # Tests + reimplemented lib/
├── 4-improve-coverage/        # Above + 29 targeted coverage tests
├── 5-demo-app/                # Clipstash app on the reimplementation
│   ├── app.js
│   ├── routes/
│   │   ├── snippets.js        # HTML CRUD + router.param()
│   │   └── api.js             # JSON API + content negotiation
│   ├── views/                 # EJS templates
│   └── public/                # Static assets
├── 6-parametric-memory/       # lib/ rebuilt from model memory (with iteration)
├── 7-blind-from-memory/       # lib/ rebuilt from model memory (one-shot, no iteration)
├── blind-rebuild.md           # Prompt for step 6
├── blind-from-memory.md       # Prompt for step 7
├── CLAUDE.md                  # Detailed step-by-step log
└── README.md                  # This file

Running

# Run each command from the repository root. The parentheses run the command
# in a subshell, so the working directory is unchanged afterwards.

# Step 1: Original baseline
(cd 1-original-repo && npm test)

# Step 2: Derived tests against original lib
(cd 2-clean-room-rebuild-tests && npm test)

# Step 3: Derived tests against reimplemented lib
(cd 3-reimplement-from-derived-specs && npm test)

# Step 4: Extended tests against reimplemented lib
(cd 4-improve-coverage && npm test)

# Step 4 with coverage
(cd 4-improve-coverage && npm run test-cov)

# Step 5: Demo app
(cd 5-demo-app && node app.js)
# Then: curl http://localhost:3000/api/snippets
#   or: open http://localhost:3000 in a browser

# Step 6: Parametric memory rebuild (iterated to 696/696)
(cd 6-parametric-memory && npm test)

# Step 7: Parametric memory rebuild (one-shot, 663/696)
(cd 7-blind-from-memory && npm test)

Tooling

  • Test framework: Mocha + supertest + assert
  • Coverage: Istanbul (nyc)
  • Template engine: EJS
  • Built with: Claude Code (claude-opus-4-6) using ralph-loop for multi-step orchestration and parallel subagents for test generation

About

Can Opus 4.6 derive a test suite from a library's source, then rebuild the library from that test suite with equivalent coverage?
