feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923) by noahgift · Pull Request #2216 · paiml/aprender

noahgift · 2026-06-24T10:48:01Z

EV#6 — Pillar-4 / Ollama drop-in (HTTP layer)

Gap

The realizar serve router that apr serve mounts (realizar::api::create_router → create_router_with_config) exposed only the OpenAI /v1/* API. POSTing Ollama's native /api/chat or /api/generate hit the axum not_found fallback (404), so an Ollama HTTP client could not use apr serve as a drop-in replacement at the HTTP layer.

Fix

New crates/aprender-serve/src/api/ollama_handlers.rs adds ollama_chat_handler + ollama_generate_handler.
Wired POST /api/chat + POST /api/generate into create_router_with_config under the existing OpenAI-compat toggle (mirrors how /v1/chat/completions is wired).
Both handlers translate the Ollama request → ChatCompletionRequest and reuse the existing generation path (openai_chat_completions_handler, the same backend chain as /v1/chat/completions), then re-shape into Ollama's wire schema:
- /api/chat: {model, created_at, message:{role:"assistant", content}, done:true, prompt_eval_count, eval_count}
- /api/generate: flat {model, created_at, response, done:true, prompt_eval_count, eval_count} (no nested message)
A wired route is observably distinct from the axum 404 fallback (which has no done field) even with no model loaded — the handler always emits a terminal done:true Ollama-shaped body, surfacing any backend error as the assistant content.

Falsifier (RED → GREEN, mutation-verified)

crates/aprender-serve/tests/ollama_http_compat.rs (uses AppState::demo(), no model download):

api_chat_is_routed_and_returns_ollama_shape
api_generate_is_routed_and_returns_ollama_shape

RED on the unwired router (404, no message/done); GREEN once wired (200 + Ollama shape). Mutation-verified: removing the two .route() calls flips both falsifiers RED (404), restoring flips them GREEN.

Contract

Discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in contracts/apr-serve-openai-compat-v1.yaml (v1.12.0 → v1.13.0) with FALSIFY-OLLAMA-API-CHAT-ROUTED-923 + FALSIFY-OLLAMA-API-GENERATE-ROUTED-923 (single-line test refs). pv validate + pv lint contracts/ PASS (0 errors).

Verification

cargo test -p aprender-serve --lib → 15450 passed
cargo test -p aprender-serve --test ollama_http_compat → 2 passed
clippy + fmt clean

🤖 Generated with Claude Code

…drop-in Ollama HTTP replacement (PMAT-923) GAP (EV#6, Pillar-4 / Ollama drop-in): the realizar serve router that `apr serve` mounts (realizar::api::create_router -> create_router_with_config) exposed only the OpenAI `/v1/*` API. POSTing Ollama's native `/api/chat` or `/api/generate` hit the axum `not_found` fallback (404), so an Ollama HTTP client could not use `apr serve` as a drop-in replacement. Fix: add `crates/aprender-serve/src/api/ollama_handlers.rs` and wire `POST /api/chat` + `POST /api/generate` into create_router_with_config under the existing OpenAI-compat toggle. Both handlers translate the Ollama request into a ChatCompletionRequest and delegate generation to the SAME backend chain as `/v1/chat/completions` (openai_chat_completions_handler), then re-shape the result into Ollama's wire schema: - /api/chat: {model, created_at, message:{role,content}, done, ...} - /api/generate: {model, created_at, response, done, ...} (flat, no message) A wired route is observably distinct from the axum 404 fallback (which has no `done` field) even with no model loaded — the Ollama handler always emits a terminal (`done:true`) Ollama-shaped body, surfacing any backend error as the assistant content. Falsifier (tests/ollama_http_compat.rs, AppState::demo, no model download): - api_chat_is_routed_and_returns_ollama_shape - api_generate_is_routed_and_returns_ollama_shape RED on the unwired router (404, no message/done); GREEN once wired (200 + Ollama shape). Mutation-verified: removing the two .route() calls flips both falsifiers RED (404), restored flips them GREEN. Contract: discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in contracts/apr-serve-openai-compat-v1.yaml (v1.12.0 -> v1.13.0), with FALSIFY-OLLAMA-API-CHAT-ROUTED-923 + FALSIFY-OLLAMA-API-GENERATE-ROUTED-923 single-line test refs. pv validate + pv lint contracts/ PASS (0 errors). cargo test -p aprender-serve --lib: 15450 passed. clippy + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r serve ACTUALLY mounts (PMAT-923, reworked) The first PMAT-923 attempt wired Ollama's /api/chat + /api/generate into realizar's create_router_with_config. But `apr serve <model>` does NOT mount that router — it builds its OWN bespoke axum routers in crates/apr-cli/src/commands/serve/ (APR-CPU, CUDA-fallback, WGPU, and the single-file + sharded SafeTensors routers). So a live Ollama HTTP client still got the axum 404 fallback. The review caught this. This rework wires the Ollama endpoints at EACH of the 5 model-serving routers apr serve mounts, alongside their existing /v1/chat/completions route: - build_apr_cpu_router (handlers.rs) — the .apr CPU router - WGPU inline Router (handlers.rs / start_realizar_server) - build_gpu_router (handlers_include_01.rs) — CUDA fallback - single-file + sharded SafeTensors routers (safetensors.rs) A new apr-cli-side adapter (serve/ollama.rs) translates each Ollama request into the SAME OpenAI-chat JSON the router's existing chat handler already consumes, runs it through that router's own generation backend, and re-shapes the OpenAI response into Ollama's wire schema (/api/chat -> nested message:{role,content}+done:true; /api/generate -> flat response+done:true). GET /api/tags is also added so Ollama clients can enumerate the served model. E2E falsifier (tests/ollama_api_serve_compat.rs) drives the REAL build_apr_cpu_router (the exact router apr serve mounts for a .apr model, via a serve_test_support seam) and asserts /api/chat + /api/generate are NOT 404 and carry the Ollama wire shape. Mutation-verified: renaming the two route paths in build_apr_cpu_router flips both assertions RED (404) while the unknown_api_route_still_404s guard rail stays green. Scope is HONEST: the Ollama endpoints return a single coalesced non-streaming body today (the chat path is driven stream:false regardless of the client's stream flag). NDJSON stream:true is a documented FOLLOW-UP — this is "Ollama /api/chat + /api/generate routed on the apr serve routers + correct non-streaming wire shape", not full drop-in streaming parity. The realizar-side handlers (aprender-serve/src/api/ollama_handlers.rs) and their create_router wiring remain for any caller that DOES mount realizar's router. Contract apr-serve-openai-compat-v1.yaml bumped 1.13.0 -> 1.14.0: obligation renamed OLLAMA-API-ROUTED-ON-APR-SERVE, states the routes are on the routers apr serve mounts (verified by the apr-cli e2e), and scopes streaming honestly. cargo test -p aprender-serve --lib: 15450 passed. cargo test -p apr-cli serve:: + ollama unit + e2e: all green. pv validate + pv lint contracts/: PASS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

noahgift enabled auto-merge June 24, 2026 10:48

noahgift disabled auto-merge June 24, 2026 10:52

noahgift enabled auto-merge June 24, 2026 11:26

noahgift added this pull request to the merge queue Jun 24, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 24, 2026

noahgift added this pull request to the merge queue Jun 24, 2026

Merged via the queue into main with commit 53c9b54 Jun 24, 2026
10 checks passed

noahgift deleted the beat/ollama-api-chat-compat branch June 24, 2026 14:31

noahgift mentioned this pull request Jun 24, 2026

feat(serve): NDJSON streaming for Ollama /api/chat + /api/generate — true drop-in (PMAT-928) #2222

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923)#2216

feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923)#2216
noahgift merged 2 commits into
mainfrom
beat/ollama-api-chat-compat

noahgift commented Jun 24, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

noahgift commented Jun 24, 2026

EV#6 — Pillar-4 / Ollama drop-in (HTTP layer)

Gap

Fix

Falsifier (RED → GREEN, mutation-verified)

Contract

Verification

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant