Skip to content

feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923)#2216

Merged
noahgift merged 2 commits into
mainfrom
beat/ollama-api-chat-compat
Jun 24, 2026
Merged

feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923)#2216
noahgift merged 2 commits into
mainfrom
beat/ollama-api-chat-compat

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

EV#6 — Pillar-4 / Ollama drop-in (HTTP layer)

Gap

The realizar serve router that apr serve mounts (realizar::api::create_routercreate_router_with_config) exposed only the OpenAI /v1/* API. POSTing Ollama's native /api/chat or /api/generate hit the axum not_found fallback (404), so an Ollama HTTP client could not use apr serve as a drop-in replacement at the HTTP layer.

Fix

  • New crates/aprender-serve/src/api/ollama_handlers.rs adds ollama_chat_handler + ollama_generate_handler.
  • Wired POST /api/chat + POST /api/generate into create_router_with_config under the existing OpenAI-compat toggle (mirrors how /v1/chat/completions is wired).
  • Both handlers translate the Ollama request → ChatCompletionRequest and reuse the existing generation path (openai_chat_completions_handler, the same backend chain as /v1/chat/completions), then re-shape into Ollama's wire schema:
    • /api/chat: {model, created_at, message:{role:"assistant", content}, done:true, prompt_eval_count, eval_count}
    • /api/generate: flat {model, created_at, response, done:true, prompt_eval_count, eval_count} (no nested message)
  • A wired route is observably distinct from the axum 404 fallback (which has no done field) even with no model loaded — the handler always emits a terminal done:true Ollama-shaped body, surfacing any backend error as the assistant content.

Falsifier (RED → GREEN, mutation-verified)

crates/aprender-serve/tests/ollama_http_compat.rs (uses AppState::demo(), no model download):

  • api_chat_is_routed_and_returns_ollama_shape
  • api_generate_is_routed_and_returns_ollama_shape

RED on the unwired router (404, no message/done); GREEN once wired (200 + Ollama shape). Mutation-verified: removing the two .route() calls flips both falsifiers RED (404), restoring flips them GREEN.

Contract

Discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in contracts/apr-serve-openai-compat-v1.yaml (v1.12.0 → v1.13.0) with FALSIFY-OLLAMA-API-CHAT-ROUTED-923 + FALSIFY-OLLAMA-API-GENERATE-ROUTED-923 (single-line test refs). pv validate + pv lint contracts/ PASS (0 errors).

Verification

  • cargo test -p aprender-serve --lib → 15450 passed
  • cargo test -p aprender-serve --test ollama_http_compat → 2 passed
  • clippy + fmt clean

🤖 Generated with Claude Code

…drop-in Ollama HTTP replacement (PMAT-923)

GAP (EV#6, Pillar-4 / Ollama drop-in): the realizar serve router that `apr
serve` mounts (realizar::api::create_router -> create_router_with_config)
exposed only the OpenAI `/v1/*` API. POSTing Ollama's native `/api/chat` or
`/api/generate` hit the axum `not_found` fallback (404), so an Ollama HTTP
client could not use `apr serve` as a drop-in replacement.

Fix: add `crates/aprender-serve/src/api/ollama_handlers.rs` and wire
`POST /api/chat` + `POST /api/generate` into create_router_with_config under
the existing OpenAI-compat toggle. Both handlers translate the Ollama request
into a ChatCompletionRequest and delegate generation to the SAME backend chain
as `/v1/chat/completions` (openai_chat_completions_handler), then re-shape the
result into Ollama's wire schema:
  - /api/chat:     {model, created_at, message:{role,content}, done, ...}
  - /api/generate: {model, created_at, response, done, ...}  (flat, no message)
A wired route is observably distinct from the axum 404 fallback (which has no
`done` field) even with no model loaded — the Ollama handler always emits a
terminal (`done:true`) Ollama-shaped body, surfacing any backend error as the
assistant content.

Falsifier (tests/ollama_http_compat.rs, AppState::demo, no model download):
  - api_chat_is_routed_and_returns_ollama_shape
  - api_generate_is_routed_and_returns_ollama_shape
RED on the unwired router (404, no message/done); GREEN once wired (200 +
Ollama shape). Mutation-verified: removing the two .route() calls flips both
falsifiers RED (404), restored flips them GREEN.

Contract: discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in
contracts/apr-serve-openai-compat-v1.yaml (v1.12.0 -> v1.13.0), with
FALSIFY-OLLAMA-API-CHAT-ROUTED-923 + FALSIFY-OLLAMA-API-GENERATE-ROUTED-923
single-line test refs. pv validate + pv lint contracts/ PASS (0 errors).

cargo test -p aprender-serve --lib: 15450 passed. clippy + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge June 24, 2026 10:48
@noahgift noahgift disabled auto-merge June 24, 2026 10:52
…r serve ACTUALLY mounts (PMAT-923, reworked)

The first PMAT-923 attempt wired Ollama's /api/chat + /api/generate into
realizar's create_router_with_config. But `apr serve <model>` does NOT mount
that router — it builds its OWN bespoke axum routers in
crates/apr-cli/src/commands/serve/ (APR-CPU, CUDA-fallback, WGPU, and the
single-file + sharded SafeTensors routers). So a live Ollama HTTP client still
got the axum 404 fallback. The review caught this.

This rework wires the Ollama endpoints at EACH of the 5 model-serving routers
apr serve mounts, alongside their existing /v1/chat/completions route:

- build_apr_cpu_router (handlers.rs) — the .apr CPU router
- WGPU inline Router (handlers.rs / start_realizar_server)
- build_gpu_router (handlers_include_01.rs) — CUDA fallback
- single-file + sharded SafeTensors routers (safetensors.rs)

A new apr-cli-side adapter (serve/ollama.rs) translates each Ollama request
into the SAME OpenAI-chat JSON the router's existing chat handler already
consumes, runs it through that router's own generation backend, and re-shapes
the OpenAI response into Ollama's wire schema (/api/chat -> nested
message:{role,content}+done:true; /api/generate -> flat response+done:true).
GET /api/tags is also added so Ollama clients can enumerate the served model.

E2E falsifier (tests/ollama_api_serve_compat.rs) drives the REAL
build_apr_cpu_router (the exact router apr serve mounts for a .apr model, via a
serve_test_support seam) and asserts /api/chat + /api/generate are NOT 404 and
carry the Ollama wire shape. Mutation-verified: renaming the two route paths in
build_apr_cpu_router flips both assertions RED (404) while the
unknown_api_route_still_404s guard rail stays green.

Scope is HONEST: the Ollama endpoints return a single coalesced non-streaming
body today (the chat path is driven stream:false regardless of the client's
stream flag). NDJSON stream:true is a documented FOLLOW-UP — this is "Ollama
/api/chat + /api/generate routed on the apr serve routers + correct
non-streaming wire shape", not full drop-in streaming parity. The realizar-side
handlers (aprender-serve/src/api/ollama_handlers.rs) and their create_router
wiring remain for any caller that DOES mount realizar's router.

Contract apr-serve-openai-compat-v1.yaml bumped 1.13.0 -> 1.14.0: obligation
renamed OLLAMA-API-ROUTED-ON-APR-SERVE, states the routes are on the routers
apr serve mounts (verified by the apr-cli e2e), and scopes streaming honestly.

cargo test -p aprender-serve --lib: 15450 passed.
cargo test -p apr-cli serve:: + ollama unit + e2e: all green.
pv validate + pv lint contracts/: PASS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge June 24, 2026 11:26
@noahgift noahgift added this pull request to the merge queue Jun 24, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 24, 2026
@noahgift noahgift added this pull request to the merge queue Jun 24, 2026
Merged via the queue into main with commit 53c9b54 Jun 24, 2026
10 checks passed
@noahgift noahgift deleted the beat/ollama-api-chat-compat branch June 24, 2026 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant