feat(serve): wire Ollama /api/chat + /api/generate so apr serve is a drop-in Ollama HTTP replacement (PMAT-923)#2216
Merged
Merged
Conversation
…drop-in Ollama HTTP replacement (PMAT-923)
GAP (EV#6, Pillar-4 / Ollama drop-in): the realizar serve router that `apr
serve` mounts (realizar::api::create_router -> create_router_with_config)
exposed only the OpenAI `/v1/*` API. POSTing Ollama's native `/api/chat` or
`/api/generate` hit the axum `not_found` fallback (404), so an Ollama HTTP
client could not use `apr serve` as a drop-in replacement.
Fix: add `crates/aprender-serve/src/api/ollama_handlers.rs` and wire
`POST /api/chat` + `POST /api/generate` into create_router_with_config under
the existing OpenAI-compat toggle. Both handlers translate the Ollama request
into a ChatCompletionRequest and delegate generation to the SAME backend chain
as `/v1/chat/completions` (openai_chat_completions_handler), then re-shape the
result into Ollama's wire schema:
- /api/chat: {model, created_at, message:{role,content}, done, ...}
- /api/generate: {model, created_at, response, done, ...} (flat, no message)
A wired route is observably distinct from the axum 404 fallback (which has no
`done` field) even with no model loaded — the Ollama handler always emits a
terminal (`done:true`) Ollama-shaped body, surfacing any backend error as the
assistant content.
Falsifier (tests/ollama_http_compat.rs, AppState::demo, no model download):
- api_chat_is_routed_and_returns_ollama_shape
- api_generate_is_routed_and_returns_ollama_shape
RED on the unwired router (404, no message/done); GREEN once wired (200 +
Ollama shape). Mutation-verified: removing the two .route() calls flips both
falsifiers RED (404), restored flips them GREEN.
Contract: discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in
contracts/apr-serve-openai-compat-v1.yaml (v1.12.0 -> v1.13.0), with
FALSIFY-OLLAMA-API-CHAT-ROUTED-923 + FALSIFY-OLLAMA-API-GENERATE-ROUTED-923
single-line test refs. pv validate + pv lint contracts/ PASS (0 errors).
cargo test -p aprender-serve --lib: 15450 passed. clippy + fmt clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r serve ACTUALLY mounts (PMAT-923, reworked)
The first PMAT-923 attempt wired Ollama's /api/chat + /api/generate into
realizar's create_router_with_config. But `apr serve <model>` does NOT mount
that router — it builds its OWN bespoke axum routers in
crates/apr-cli/src/commands/serve/ (APR-CPU, CUDA-fallback, WGPU, and the
single-file + sharded SafeTensors routers). So a live Ollama HTTP client still
got the axum 404 fallback. The review caught this.
This rework wires the Ollama endpoints at EACH of the 5 model-serving routers
apr serve mounts, alongside their existing /v1/chat/completions route:
- build_apr_cpu_router (handlers.rs) — the .apr CPU router
- WGPU inline Router (handlers.rs / start_realizar_server)
- build_gpu_router (handlers_include_01.rs) — CUDA fallback
- single-file + sharded SafeTensors routers (safetensors.rs)
A new apr-cli-side adapter (serve/ollama.rs) translates each Ollama request
into the SAME OpenAI-chat JSON the router's existing chat handler already
consumes, runs it through that router's own generation backend, and re-shapes
the OpenAI response into Ollama's wire schema (/api/chat -> nested
message:{role,content}+done:true; /api/generate -> flat response+done:true).
GET /api/tags is also added so Ollama clients can enumerate the served model.
E2E falsifier (tests/ollama_api_serve_compat.rs) drives the REAL
build_apr_cpu_router (the exact router apr serve mounts for a .apr model, via a
serve_test_support seam) and asserts /api/chat + /api/generate are NOT 404 and
carry the Ollama wire shape. Mutation-verified: renaming the two route paths in
build_apr_cpu_router flips both assertions RED (404) while the
unknown_api_route_still_404s guard rail stays green.
Scope is HONEST: the Ollama endpoints return a single coalesced non-streaming
body today (the chat path is driven stream:false regardless of the client's
stream flag). NDJSON stream:true is a documented FOLLOW-UP — this is "Ollama
/api/chat + /api/generate routed on the apr serve routers + correct
non-streaming wire shape", not full drop-in streaming parity. The realizar-side
handlers (aprender-serve/src/api/ollama_handlers.rs) and their create_router
wiring remain for any caller that DOES mount realizar's router.
Contract apr-serve-openai-compat-v1.yaml bumped 1.13.0 -> 1.14.0: obligation
renamed OLLAMA-API-ROUTED-ON-APR-SERVE, states the routes are on the routers
apr serve mounts (verified by the apr-cli e2e), and scopes streaming honestly.
cargo test -p aprender-serve --lib: 15450 passed.
cargo test -p apr-cli serve:: + ollama unit + e2e: all green.
pv validate + pv lint contracts/: PASS.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
EV#6 — Pillar-4 / Ollama drop-in (HTTP layer)
Gap
The realizar serve router that
apr servemounts (realizar::api::create_router→create_router_with_config) exposed only the OpenAI/v1/*API. POSTing Ollama's native/api/chator/api/generatehit the axumnot_foundfallback (404), so an Ollama HTTP client could not useapr serveas a drop-in replacement at the HTTP layer.Fix
crates/aprender-serve/src/api/ollama_handlers.rsaddsollama_chat_handler+ollama_generate_handler.POST /api/chat+POST /api/generateintocreate_router_with_configunder the existing OpenAI-compat toggle (mirrors how/v1/chat/completionsis wired).ChatCompletionRequestand reuse the existing generation path (openai_chat_completions_handler, the same backend chain as/v1/chat/completions), then re-shape into Ollama's wire schema:/api/chat:{model, created_at, message:{role:"assistant", content}, done:true, prompt_eval_count, eval_count}/api/generate: flat{model, created_at, response, done:true, prompt_eval_count, eval_count}(no nested message)donefield) even with no model loaded — the handler always emits a terminaldone:trueOllama-shaped body, surfacing any backend error as the assistant content.Falsifier (RED → GREEN, mutation-verified)
crates/aprender-serve/tests/ollama_http_compat.rs(usesAppState::demo(), no model download):api_chat_is_routed_and_returns_ollama_shapeapi_generate_is_routed_and_returns_ollama_shapeRED on the unwired router (404, no
message/done); GREEN once wired (200 + Ollama shape). Mutation-verified: removing the two.route()calls flips both falsifiers RED (404), restoring flips them GREEN.Contract
Discharges OBLIG-OLLAMA-API-CHAT-GENERATE-ROUTED in
contracts/apr-serve-openai-compat-v1.yaml(v1.12.0 → v1.13.0) withFALSIFY-OLLAMA-API-CHAT-ROUTED-923+FALSIFY-OLLAMA-API-GENERATE-ROUTED-923(single-line test refs).pv validate+pv lint contracts/PASS (0 errors).Verification
cargo test -p aprender-serve --lib→ 15450 passedcargo test -p aprender-serve --test ollama_http_compat→ 2 passed🤖 Generated with Claude Code