feat: add multimedia endpoint support (image, TTS, transcription, video) #101
AlemTuzlak wants to merge 5 commits into main
jpr5 left a comment
Code Review — Multimedia Endpoint Support
Well-structured PR. All 4 handlers follow consistent patterns, endpoint backfill is correct across all existing handlers, tests are strong (575 lines with specific assertions). One medium finding, two low.
Medium
Fixtures without endpoint match multimedia requests, then 500 at type guard (router.ts:44-48)
Endpoint filtering is one-directional: fixtures WITH endpoint are restricted, but fixtures WITHOUT endpoint match ANY request type. A user with a generic chat fixture:
mock.addFixture({ match: { userMessage: "guitar" }, response: { content: "Chat about guitars" } });
This fixture also matches image requests for "guitar": handleImages accepts the match, then isImageResponse(response) fails and the request returns a 500. The test only verifies the reverse direction (an image fixture does not match chat requests).
Fix: when a request has _endpointType and the matched fixture has no endpoint, verify the response type is compatible with the endpoint before returning the match. Or make filtering bidirectional.
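A minimal sketch of the bidirectional variant follows. It is not the PR's router.ts: the Fixture and MockRequest shapes, the endpoint names other than "chat" and "embedding", and the response fields checked by the guards are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real types in the PR may differ.
type EndpointType = "chat" | "embedding" | "image" | "speech" | "transcription" | "video";

interface Fixture {
  match: { userMessage?: string; endpoint?: EndpointType };
  response: Record<string, unknown>;
}

interface MockRequest {
  userMessage: string;
  _endpointType?: EndpointType; // declared field, so typos fail to compile
}

// Assumed stand-ins for guards such as isImageResponse; field names are illustrative.
const compatible: Partial<Record<EndpointType, (r: Record<string, unknown>) => boolean>> = {
  image: (r) => "images" in r,
  speech: (r) => "audio" in r,
  transcription: (r) => "transcription" in r,
  video: (r) => "video" in r,
};

function matchFixture(fixtures: Fixture[], req: MockRequest): Fixture | undefined {
  return fixtures.find((f) => {
    const wanted = f.match.userMessage;
    if (wanted !== undefined && !req.userMessage.includes(wanted)) return false;

    // Direction 1: a fixture that declares an endpoint only matches that endpoint.
    if (f.match.endpoint !== undefined) return f.match.endpoint === req._endpointType;

    // Direction 2: a generic fixture is skipped when the request's endpoint
    // needs a response shape that the fixture does not provide.
    const guard = req._endpointType ? compatible[req._endpointType] : undefined;
    return guard ? guard(f.response) : true;
  });
}
```

With this shape, the generic "guitar" chat fixture is skipped for an image request instead of being matched and then rejected by the handler's type guard.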
Low
extractFormField regex on binary multipart data (transcription.ts:15-22): readBody converts the binary body to a UTF-8 string. If the file part appears before the text fields, mangled bytes could theoretically match the regex. This is extremely unlikely with real audio, but it is fragile. A boundary-delimited parser would be more robust.
_endpointType is not a declared field (types.ts): it is stored via the index signature, so there is no type safety. Adding _endpointType?: string to ChatCompletionRequest would catch typos.
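As a rough illustration of the boundary-delimited approach, the sketch below splits the raw body on the multipart delimiter and reads only each part's header block as text, so file bytes are never run through a regex. extractTextField is a hypothetical helper, not the PR's extractFormField, and it assumes the body is available as a Node Buffer and the boundary has already been read from the Content-Type header.

```typescript
// Hypothetical helper: returns the value of a text field from a multipart body.
function extractTextField(body: Buffer, boundary: string, name: string): string | undefined {
  const delimiter = Buffer.from(`--${boundary}`);
  const headerEnd = Buffer.from("\r\n\r\n");

  let start = body.indexOf(delimiter);
  while (start !== -1) {
    const partStart = start + delimiter.length;
    const next = body.indexOf(delimiter, partStart);
    if (next === -1) break; // reached the closing delimiter; no further parts

    const part = body.subarray(partStart, next);
    const split = part.indexOf(headerEnd);
    if (split !== -1) {
      // Only the header block is decoded; the payload stays as bytes.
      const headers = part.subarray(0, split).toString("latin1");
      const m = /content-disposition:[^\r\n]*\bname="([^"]*)"/i.exec(headers);
      if (m?.[1] === name) {
        // Drop the CRLF that precedes the next delimiter.
        return part.subarray(split + headerEnd.length, part.length - 2).toString("utf8");
      }
    }
    start = next;
  }
  return undefined;
}
```

This still relies on the multipart rule that the boundary string does not occur inside any part's payload.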
Clean
- Image gen (OpenAI + Gemini Imagen), TTS, transcription, video create/poll all correct
- matchFixture endpoint filtering works for the designed direction
- Convenience methods (onImage, onSpeech, etc.) wire correctly
- Video state map with X-Test-Id isolation is correct
- Backfill of _endpointType on all existing handlers is consistent
🤖 Reviewed with Claude Code
…iltering New response types (ImageResponse, AudioResponse, TranscriptionResponse, VideoResponse) with type guards. matchFixture now filters by endpoint bidirectionally: fixtures with endpoint only match that type, and multimedia requests skip generic fixtures with incompatible response types.
Image generation (OpenAI + Gemini Imagen), text-to-speech with format support, audio transcription with multipart parsing, video generation with async status polling via in-memory state map.
Register all multimedia routes in server.ts. Add onImage/onSpeech/onTranscription/onVideo convenience methods on LLMock. Backfill _endpointType on all existing handlers (chat + embedding).
20 integration tests (image gen, TTS, transcription, video create/poll, X-Test-Id isolation, cross-matching prevention, convenience methods, endpoint backfill) + 12 unit tests for type guards and matchFixture endpoint filtering.
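The commits above describe video status polling backed by an in-memory state map with X-Test-Id isolation. One way such a store could look is sketched here. VideoStateStore, its method names, and the rule that a job completes after its first poll are assumptions for illustration, not the PR's implementation.

```typescript
interface VideoJob {
  id: string;
  status: "processing" | "completed";
}

// Hypothetical store: jobs are grouped per test ID (for example the X-Test-Id
// header value), so concurrent tests never see each other's videos.
class VideoStateStore {
  private readonly byTest = new Map<string, Map<string, VideoJob>>();

  create(testId: string, id: string): VideoJob {
    let jobs = this.byTest.get(testId);
    if (!jobs) {
      jobs = new Map();
      this.byTest.set(testId, jobs);
    }
    const job: VideoJob = { id, status: "processing" };
    jobs.set(id, job);
    return { ...job };
  }

  poll(testId: string, id: string): VideoJob | undefined {
    const job = this.byTest.get(testId)?.get(id);
    if (!job) return undefined; // unknown ID, or created under another test ID
    const snapshot = { ...job };
    job.status = "completed"; // assumed: the job finishes after the first poll
    return snapshot;
  }
}
```

Nesting one map per test ID, rather than concatenating the test ID and video ID into a single key, avoids collisions when either value contains the separator character.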
3d68797 to bacfac4
New doc pages for image generation, TTS, transcription, and video. Updated fixtures page, index feature list, sidebar nav, comparison table (all competitors lack multimedia), and competitive drift detection keywords.
Summary
- Add image generation (/v1/images/generations, /v1beta/models/{model}:predict), text-to-speech (/v1/audio/speech), audio transcription (/v1/audio/transcriptions), and video generation (/v1/videos, /v1/videos/{id})
- Add match.endpoint field to FixtureMatch for isolating fixtures by endpoint type, preventing cross-matching (e.g., image fixtures won't match chat requests)
- Add convenience methods (onImage, onSpeech, onTranscription, onVideo) on LLMock and backfill _endpointType on all existing handlers

New Endpoints
| Endpoint | Match field |
| --- | --- |
| /v1/images/generations | prompt → userMessage |
| /v1beta/models/{model}:predict | instances[0].prompt → userMessage |
| /v1/audio/speech | input → userMessage |
| /v1/audio/transcriptions | match.endpoint only |
| /v1/videos | prompt → userMessage |
| /v1/videos/{id} | |

Test plan
- endpoint: "chat" and endpoint: "embedding" fixtures match existing handlers