A seven-step experiment that clones Express.js, deletes its test suite, rebuilds tests from source code alone, reimplements the library from those tests alone, builds an app on the result, and finally rebuilds the library a second time from pure parametric memory — each step blind to the artifacts of the one it replaces.
Pristine clone of expressjs/express. Ran npm test to establish ground truth.
- 1249 tests, all passing
- 98.75% statement coverage, 96.16% branch, 100% functions, 99.61% lines
Cloned Express again, deleted test/, and wrote a new test suite from scratch. The only inputs were lib/ source code, README.md, and examples/. The original tests in 1-original-repo/test/ were never opened.
Built in 6 loops: infrastructure, then application, then four parallel subagent passes covering request (18 files), response (21 files), router/middleware (8 files), and acceptance tests (18 files modeled on examples/).
- 667 tests, all passing
- 95.12% stmts, 86.70% branch, 97.24% funcs, 96.36% lines
Cloned Express a third time, deleted both lib/ and test/, copied the derived tests from step 2, and reimplemented all 6 lib/ files using only the tests as specification. Never looked at 1-original-repo/lib/ or 2-clean-room-rebuild-tests/lib/.
Implemented in dependency order: utils.js → view.js → request.js → response.js → application.js → express.js.
- 661/667 passing on first run (99.1%)
- Fixed 4 bugs that accounted for the 6 failures (view lookup crash, format duplicate branch, jsonp array callback, 204 Content-Length leak)
- 667/667 tests passing, 94.63% stmts, 86.75% branch, 96.58% funcs, 96.13% lines
Structural comparison against the original source showed the reimplementation converged on nearly identical architecture — same factory pattern, same lazy router, same middleware init, same settings system — with minor divergences in implementation details (regex vs stdlib, computed property vs middleware for query parsing, real Symbol vs string-based fake symbol).
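The query-parsing divergence can be illustrated with a minimal sketch. Neither function below is taken from Express or from the reimplementation; `queryMiddleware` and `defineQueryGetter` are hypothetical names, and the parsing uses Node's built-in `querystring` module purely for illustration.

```javascript
// Sketch only: neither function is Express's actual code.
const querystring = require('node:querystring');

// Parse the query portion of a URL into a plain object.
function parseQuery(url) {
  const i = url.indexOf('?');
  return i === -1 ? {} : { ...querystring.parse(url.slice(i + 1)) };
}

// Approach A: middleware parses once and stores the result on the request.
function queryMiddleware(req, res, next) {
  req.query = parseQuery(req.url);
  next();
}

// Approach B: a computed getter re-derives the query on every access.
function defineQueryGetter(req) {
  Object.defineProperty(req, 'query', {
    configurable: true,
    enumerable: true,
    get() {
      return parseQuery(this.url);
    },
  });
  return req;
}

const a = { url: '/search?q=express&page=2' };
queryMiddleware(a, {}, () => {});
const b = defineQueryGetter({ url: '/search?q=express&page=2' });

// Both expose the same values to a handler...
console.log(a.query.q, b.query.q); // express express

// ...but only the getter tracks a later URL rewrite.
a.url = '/search?q=koa';
b.url = '/search?q=koa';
console.log(a.query.q, b.query.q); // express koa
```

A test that only reads `req.query` inside a handler cannot tell the two apart, which is consistent with the divergence surviving a fully passing suite.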
Added 29 targeted tests for uncovered lines/branches identified via istanbul reports.
| Metric | Before | After |
|---|---|---|
| Stmts | 94.63% | 97.68% |
| Branch | 86.75% | 93.15% |
| Funcs | 96.58% | 98.29% |
| Lines | 96.13% | 98.97% |
Techniques: socket manipulation, header/property deletion, raw prototype access, env override, direct View constructor calls, mount-based trust proxy inheritance.
Built "Clipstash" — a code snippet manager with browser UI and JSON API — on top of the reimplemented Express to verify it works as a real library, not just under unit tests.
Exercises ~30 Express features: express(), app.set/listen/engine/locals, express.json/urlencoded/static, express.Router(), router.param(), req.query/body/params/get/is/ip/xhr/cookies/path/method, res.json/send/render/redirect/status/sendStatus/sendFile/cookie/clearCookie/set/type/format/links/vary/append/locals, EJS templates with partials, content negotiation on 4 endpoints, and 4-arg error handling.
cd 5-demo-app && node app.js
# http://localhost:3000
No npm install required — it requires Express directly from ../4-improve-coverage/.
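For readers unfamiliar with 4-arg error handling, the idea is that a handler declared with four parameters is treated as an error handler and is skipped unless an error is in flight. The following is a minimal, synchronous sketch of that dispatch rule. The `run` function is hypothetical and far simpler than the Express router; it is not code from Clipstash or from the reimplementation.

```javascript
// Minimal synchronous middleware runner; a sketch, not the Express router.
function run(stack, req, res) {
  let i = 0;
  function next(err) {
    const fn = stack[i++];
    if (!fn) return; // end of the stack
    const isErrorHandler = fn.length === 4; // arity decides the handler kind
    if (err && isErrorHandler) return fn(err, req, res, next);
    if (!err && !isErrorHandler) return fn(req, res, next);
    return next(err); // skip handlers that do not match the current mode
  }
  next();
}

const log = [];
run([
  (req, res, next) => { log.push('first'); next(new Error('boom')); },
  (req, res, next) => { log.push('skipped'); next(); },
  (err, req, res, next) => { log.push('caught ' + err.message); },
], {}, {});

console.log(log); // [ 'first', 'caught boom' ]
```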
A different question: what if there are no derived tests, no source, no web — just the model's training-data memory? Cloned the project from step 4 (which keeps the test suite), deleted lib/, and reimplemented all 6 files from pure recall. Allowed to iterate using npm test failures, but never to read source, README, examples, or web.
- 696/696 tests passing after iteration
- ~2683 lines vs ~2762 in the original — within 3%
Same setup as step 6, but stricter: write all 6 lib/ files in a single pass with no testing between them, then run npm test exactly once and report the score. No fixing.
- 663/696 tests passing on first attempt (95.3%)
- ~1658 lines — about 60% of the original
- 33 failures concentrated in: res.download/res.attachment filename handling (~10), res.status validation (6), missing utils.methods export (3), cascading acceptance test 500s (5), and small gaps in res.send, app.path(), app.router deprecation, query parser default, and req.host X-Forwarded-Host parsing
The architecture comes back faithful — lazy router init, settings prototype chain, etag/wetag helpers, createApplication factory — but edge-case validation and deprecation paths get silently dropped without test feedback to surface them.
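A sketch of how validation can drop out unnoticed, using status-code checking as the example. Both helpers are hypothetical, and the specific rules in `setStatusStrict` (integer check, 100 to 999 range) are an assumption for illustration rather than a statement of what Express or either rebuild enforces.

```javascript
// Hypothetical helpers; the exact rules Express enforces are an assumption here.
function setStatusLenient(res, code) {
  res.statusCode = code; // accepts anything
  return res;
}

function setStatusStrict(res, code) {
  if (!Number.isInteger(code)) {
    throw new TypeError(`Invalid status code: ${JSON.stringify(code)}`);
  }
  if (code < 100 || code > 999) {
    throw new RangeError(`Invalid status code: ${code}`);
  }
  res.statusCode = code;
  return res;
}

// Report whether calling fn throws.
function throws(fn) {
  try { fn(); return false; } catch { return true; }
}

// Happy path: the two are indistinguishable.
console.log(setStatusLenient({}, 404).statusCode, setStatusStrict({}, 404).statusCode); // 404 404

// Edge case: only the strict version rejects a string status.
console.log(throws(() => setStatusLenient({}, '200'))); // false
console.log(throws(() => setStatusStrict({}, '200')));  // true
```

Without a test run between writing and reporting, nothing signals that the lenient form is missing a check.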
| Step | What | Tests | Stmt Coverage |
|---|---|---|---|
| 1 | Original Express | 1249 | 98.75% |
| 2 | Derived test suite | 667 | 95.12% |
| 3 | Reimplemented lib/ | 667 | 94.63% |
| 4 | + coverage tests | 696 | 97.68% |
| 5 | Demo app | — | works in browser + curl |
| 6 | Parametric rebuild (iterating) | 696/696 | — |
| 7 | Parametric rebuild (one-shot) | 663/696 | — |
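The percentages quoted for the individual steps follow directly from the raw counts reported in this document. A short check, where `pct` is a helper introduced only for this example:

```javascript
// Format part/whole as a percentage with one decimal place.
const pct = (part, whole) => ((part / whole) * 100).toFixed(1) + '%';

console.log(pct(661, 667));   // prints 99.1% (step 3 first-run pass rate)
console.log(pct(663, 696));   // prints 95.3% (step 7 one-shot pass rate)
console.log(pct(2683, 2762)); // prints 97.1% (step 6 line count relative to the original)
console.log(pct(1658, 2762)); // prints 60.0% (step 7 line count relative to the original)
console.log(667 + 29);        // prints 696 (step 4 suite size)
```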
Tests-as-spec works. A test suite derived from source code was sufficient to reimplement Express from scratch with a 99.1% first-run pass rate. The reimplementation converged on the same architecture without ever seeing the original code.
What tests capture: Public API contracts, routing behavior, middleware composition, content negotiation, error propagation, header semantics, cookie handling, template rendering, trust proxy logic.
What tests miss: Deprecation warnings, error message wording, internal implementation choices (regex vs net.isIP()), race condition handling (onFinished in sendfile), dead code (View.prototype.resolve()).
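The regex vs net.isIP() point can be made concrete with a small sketch. The regular expression below is a deliberately simplified, hypothetical IPv4 check, not the one used by either implementation. `net.isIP()` is the Node.js standard-library function, which returns 4, 6, or 0.

```javascript
const net = require('node:net');

// Simplified IPv4 check; hypothetical, and deliberately looser than net.isIP().
const IPV4_RE = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
const isIpRegex = (host) => IPV4_RE.test(host);
const isIpStdlib = (host) => net.isIP(host) !== 0;

// Typical test inputs: both implementations agree.
for (const host of ['127.0.0.1', 'example.com']) {
  console.log(host, isIpRegex(host), isIpStdlib(host));
}
// 127.0.0.1 true true
// example.com false false

// Inputs a test suite is unlikely to include expose the difference.
console.log(isIpRegex('999.1.1.1'), isIpStdlib('999.1.1.1')); // true false
console.log(isIpRegex('::1'), isIpStdlib('::1'));             // false true
```

As long as the suite only exercises inputs like the first two, either implementation satisfies it.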
Tests can drive better design. The reimplementation's query as a computed getter (vs middleware that parses once) and settings prototype chain walk (vs direct lookup) are arguably improvements — emergent from what the tests require rather than how the original happened to implement it.
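One way to picture a settings prototype chain walk is sketched below. `createSettings` and `getSetting` are hypothetical names, and the sketch assumes a mounted app's settings object inherits from its parent's. It is not the reimplementation's actual code.

```javascript
// Hypothetical helpers; only the lookup strategy is the point.
// A child settings object inherits from its parent's via the prototype chain.
function createSettings(parentSettings) {
  return Object.create(parentSettings || null);
}

// Explicit walk: check own properties, then each prototype in turn.
function getSetting(settings, name) {
  for (let cur = settings; cur !== null; cur = Object.getPrototypeOf(cur)) {
    if (Object.prototype.hasOwnProperty.call(cur, name)) {
      return cur[name];
    }
  }
  return undefined;
}

const parentSettings = createSettings();
parentSettings['trust proxy'] = true;

const childSettings = createSettings(parentSettings);
childSettings['view engine'] = 'ejs';

console.log(getSetting(childSettings, 'trust proxy')); // true (inherited from the parent)
console.log(getSetting(childSettings, 'view engine')); // ejs (own setting)
console.log(getSetting(childSettings, 'missing'));     // undefined
```

The explicit walk makes it possible to tell an inherited value from an own value at each level, which a plain property read does not.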
Parametric memory gets you 95% of the way; iteration closes the rest. Steps 6 and 7 ask whether the model can rebuild Express from training data alone. One-shot from memory passes 663/696 (95.3%) — the architecture is faithful but ~5% of behavior (validation strictness, deprecation paths, a few edge cases) gets silently dropped. With test-failure iteration the same blind setup converges on 696/696. The signal isn't memory or tests in isolation; it's a feedback loop that grounds memory.
full-ralph/
├── 1-original-repo/ # Pristine Express clone (baseline)
├── 2-clean-room-rebuild-tests/ # Express + derived test suite
├── 3-reimplement-from-derived-specs/ # Tests + reimplemented lib/
├── 4-improve-coverage/ # Above + 29 targeted coverage tests
├── 5-demo-app/ # Clipstash app on the reimplementation
│ ├── app.js
│ ├── routes/
│ │ ├── snippets.js # HTML CRUD + router.param()
│ │ └── api.js # JSON API + content negotiation
│ ├── views/ # EJS templates
│ └── public/ # Static assets
├── 6-parametric-memory/ # lib/ rebuilt from model memory (with iteration)
├── 7-blind-from-memory/ # lib/ rebuilt from model memory (one-shot, no iteration)
├── blind-rebuild.md # Prompt for step 6
├── blind-from-memory.md # Prompt for step 7
├── CLAUDE.md # Detailed step-by-step log
└── README.md # This file
# Step 1: Original baseline
cd 1-original-repo && npm test
# Step 2: Derived tests against original lib
cd 2-clean-room-rebuild-tests && npm test
# Step 3: Derived tests against reimplemented lib
cd 3-reimplement-from-derived-specs && npm test
# Step 4: Extended tests against reimplemented lib
cd 4-improve-coverage && npm test
# Step 4 with coverage
cd 4-improve-coverage && npm run test-cov
# Step 5: Demo app
cd 5-demo-app && node app.js
# Then: curl http://localhost:3000/api/snippets
# or: open http://localhost:3000 in a browser
# Step 6: Parametric memory rebuild (iterated to 696/696)
cd 6-parametric-memory && npm test
# Step 7: Parametric memory rebuild (one-shot, 663/696)
cd 7-blind-from-memory && npm test
- Test framework: Mocha + supertest + assert
- Coverage: Istanbul (nyc)
- Template engine: EJS
- Built with: Claude Code (claude-opus-4-6) using ralph-loop for multi-step orchestration and parallel subagents for test generation