
feat(wan22): add WAN 2.2 text-to-video adapter and dataset for MLPerf inference #293

Open
wu6u3tw wants to merge 19 commits into mlcommons:main from wu6u3tw:feat/wan22

Conversation

@wu6u3tw
Collaborator

@wu6u3tw wu6u3tw commented Apr 22, 2026

Summary

  • Adds a new wan22 module with Wan22Adapter, Wan22Accumulator, Wan22Dataset, and Pydantic wire types for the trtllm-serve POST /v1/videos/generations endpoint.
  • Wan22Adapter uses response_format=video_path: the server saves the encoded video to shared storage (Lustre) and returns only the file path, avoiding 3–5 MB of base64 video bytes per request over
    HTTP and ZMQ transport.
  • Wan22Dataset loads MLPerf WAN2.2 prompt text files (one prompt per line); dataset ID wan22_mlperf is registered with DataLoaderFactory for --dataset CLI use.
  • Registers APIType.WAN22 and wires Wan22Adapter/Wan22Accumulator into HTTPClientConfig.
  • Adds an example offline benchmark YAML for Lyris (GB200/GB300) targeting a local trtllm-serve instance with MLPerf-standard params (720×1280, 5 s, 20 steps, guidance 4.0/3.0, seed 42, 248 prompts).
  • Fixes with_updates() to reset adapter and accumulator when api_type changes.
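As a rough sketch of the request encoding the bullets above describe (field names and defaults are assumptions drawn from this summary, not the actual Wan22Adapter implementation):

```python
import json


def encode_query(prompt: str) -> bytes:
    # Hypothetical request body for POST /v1/videos/generations, mirroring the
    # MLPerf-standard params listed above (720x1280, 20 steps, seed 42).
    body = {
        "prompt": prompt,
        "height": 720,
        "width": 1280,
        "num_inference_steps": 20,
        "seed": 42,
        # "video_path": the server saves the encoded video to shared storage
        # (Lustre) and returns only the file path, instead of shipping 3-5 MB
        # of base64 video bytes per request over HTTP/ZMQ.
        "response_format": "video_path",
    }
    return json.dumps(body).encode()


print(json.loads(encode_query("a cat surfing a wave"))["response_format"])
```
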

Test plan

  • pytest -m unit tests/unit/wan22/ — adapter, dataset, factory, types, init, registration unit tests
  • pytest -m integration tests/integration/wan22/ — adapter round-trip with mock server
  • pre-commit run --all-files passes clean
  • Offline benchmark runs end-to-end against a live trtllm-serve instance on Lyris (manual)

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@wu6u3tw wu6u3tw requested a review from a team April 22, 2026 22:25
@wu6u3tw wu6u3tw self-assigned this Apr 22, 2026
@wu6u3tw wu6u3tw marked this pull request as draft April 22, 2026 22:25
@github-actions

github-actions Bot commented Apr 22, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for the WAN2.2 MLPerf text-to-video benchmark, including a new adapter, dataset loader, and associated Pydantic models. It also adds comprehensive documentation and an example configuration for running benchmarks on Lyris. The review feedback identifies several critical omissions and inconsistencies: the VideoPathRequest and Wan22Dataset are missing the latent_path field required for MLPerf reproducibility, and there is a mismatch between the adapter implementation and unit tests regarding the response_format and handling of VideoPayloadResponse. Additionally, the feedback suggests using None as a default for negative_prompt to allow server-side defaults and injecting the canonical MLPerf negative prompt into the dataset.

Comment thread src/inference_endpoint/videogen/types.py Outdated
Comment thread src/inference_endpoint/videogen/adapter.py Outdated
Comment thread src/inference_endpoint/videogen/adapter.py
Comment thread src/inference_endpoint/videogen/dataset.py Outdated
@wu6u3tw wu6u3tw force-pushed the feat/wan22 branch 2 times, most recently from b0878b5 to 3230fec on April 23, 2026 22:22
@wu6u3tw wu6u3tw marked this pull request as ready for review April 30, 2026 02:24
wu6u3tw added 5 commits April 29, 2026 19:32
…er, Wan22Dataset→VideoGenDataset

- Rename src/inference_endpoint/wan22/ → videogen/
- Rename tests/unit/wan22/ → tests/unit/videogen/
- Rename tests/integration/wan22/ → tests/integration/videogen/
- APIType.WAN22 → APIType.VIDEOGEN ("wan22" → "videogen")
- Wan22Adapter → VideoGenAdapter
- Wan22Dataset → VideoGenDataset
- Wan22Accumulator → VideoGenAccumulator
- Update all imports, maps, __all__, tests, docs, and example yaml
- Keep dataset_id="wan22_mlperf" and model_params.name="wan22" (MLPerf identifiers)
…bytes)

encode_query: response_format defaults to "video_bytes" but can be overridden
via query.data["response_format"] = "video_path" for Lustre-path mode.

decode_response: dispatches on response shape — "video_bytes" key → VideoPayloadResponse,
otherwise → VideoPathResponse.
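The dispatch this commit describes can be sketched as follows (a simplified stand-in; the real decode_response validates into the VideoPayloadResponse / VideoPathResponse Pydantic models rather than returning dicts):

```python
import json
from typing import Any


def decode_response(response_bytes: bytes) -> dict[str, Any]:
    raw = json.loads(response_bytes)
    # "video_bytes" key present -> inline-payload shape; otherwise path shape.
    if "video_bytes" in raw:
        return {"shape": "VideoPayloadResponse", "video_bytes": raw["video_bytes"]}
    return {"shape": "VideoPathResponse", "video_path": raw["video_path"]}


print(decode_response(b'{"video_path": "/lustre/videos/v1.mp4"}')["shape"])
```
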
Collaborator Author


@arekay-nv Quick Q: We don't want to have this jsonl file in the PR, right?

Collaborator


Here should be fine given that it is relatively small.

Collaborator

@arekay-nv arekay-nv left a comment


Made a quick pass - will make another later.

Collaborator


Revert this file to pass pre-commit.


settings:
  runtime:
    min_duration_ms: 60000 # 1 minute warm-up
Collaborator


This isn't the warmup duration as this is counted in the performance runs.

Collaborator


This shouldn't be needed. Alternatively what would be nice to have here is instructions on which model (HF link) to use and how to launch a server instance as well as the benchmark.

Collaborator Author


Modified.

Collaborator


Here should be fine given that it is relatively small.

Collaborator


Revert this file as well - likely related to the original datasets folder.

Collaborator


Revert.

Comment thread tests/integration/videogen/conftest.py Outdated
import pytest
from aiohttp import web

DUMMY_VIDEO_PATH = "/lustre/videos/mock_video_001.mp4"
Collaborator


Use temp dir fixture.


@classmethod
def decode_sse_message(cls, json_bytes: bytes) -> str:
    raise NotImplementedError("WAN 2.2 does not use SSE streaming")
Collaborator


Use model name or adapter name here - might be run with a different model.

Comment on lines +114 to +120
WAN 2.2 uses non-streaming HTTP. This class exists only to satisfy
the HTTPClientConfig.accumulator type contract.
"""

def __init__(self, query_id: str, stream_all_chunks: bool) -> None:
    self.query_id = query_id
    # stream_all_chunks is intentionally ignored: WAN 2.2 is non-streaming.
Collaborator


Same as above - don't use Wan2.2 here.

Comment thread tests/integration/videogen/conftest.py Outdated
DUMMY_VIDEO_BYTES = b"\x00\x00\x00\x20ftypmp42" + b"\x00" * 24


class MockTrtllmServe:
Collaborator


Is it possible to reuse the existing echo server with a new video-gen route. That will keep things simple and reduce replication.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I subclassed the echo server into this MockTrtllmServe class. Let me know if it is okay.


Copilot AI left a comment


Pull request overview

Adds WAN 2.2 text-to-video (trtllm-serve) support to the inference-endpoint client by introducing a new videogen module (adapter + wire types + dataset) and wiring it into the existing endpoint client + dataset loader factory.

Changes:

  • Introduce APIType.VIDEOGEN with VideoGenAdapter/VideoGenAccumulator and Pydantic request/response wire models for POST /v1/videos/generations.
  • Add VideoGenDataset registered as predefined dataset wan22_mlperf, plus unit/integration tests and example offline benchmark config/scripts.
  • Update dataset factory + templates/docs to recognize the new workload.

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 14 comments.

File Description
uv.lock Bumps transformers and adds videogen extras to lock metadata.
pyproject.toml Adds a videogen optional dependency group (currently empty).
src/inference_endpoint/videogen/types.py Adds Pydantic wire models for video generation request/response.
src/inference_endpoint/videogen/adapter.py Adds HTTP adapter + no-op accumulator for non-streaming video endpoint.
src/inference_endpoint/videogen/dataset.py Adds predefined dataset wan22_mlperf (prompt text file loader).
src/inference_endpoint/videogen/__init__.py Exposes videogen public API symbols.
src/inference_endpoint/core/types.py Registers APIType.VIDEOGEN and default route /v1/videos/generations.
src/inference_endpoint/endpoint_client/config.py Wires adapter/accumulator into ADAPTER_MAP/ACCUMULATOR_MAP.
src/inference_endpoint/dataset_manager/factory.py Modifies predefined dataset loader invocation to pass path=.
src/inference_endpoint/dataset_manager/__init__.py Imports VideoGenDataset so it registers into Dataset.PREDEFINED.
src/inference_endpoint/dataset_manager/dataset.py Adds mypy ignore on datasets imports.
src/inference_endpoint/dataset_manager/predefined/shopify_product_catalogue/__init__.py Adds mypy ignore on datasets import.
src/inference_endpoint/evaluation/livecodebench/generate.py Adds mypy ignore on datasets import.
src/inference_endpoint/config/templates/online_template_full.yaml Updates documented api_type options to include videogen.
src/inference_endpoint/config/templates/offline_template_full.yaml Updates documented api_type options to include videogen.
src/inference_endpoint/config/templates/concurrency_template_full.yaml Updates documented api_type options to include videogen.
tests/unit/videogen/test_types.py Unit tests for videogen Pydantic wire models.
tests/unit/videogen/test_adapter.py Unit tests for adapter encode/decode behavior and accumulator contract.
tests/unit/videogen/test_dataset.py Unit tests for dataset loading + sample shaping.
tests/unit/videogen/test_factory.py Unit tests asserting factory can create the videogen predefined dataset.
tests/unit/videogen/test_registration.py Unit tests for enum + adapter/accumulator registration.
tests/unit/videogen/test_init.py Unit tests for videogen module public exports.
tests/unit/videogen/__init__.py Test package init (licensing header).
tests/integration/videogen/conftest.py Adds aiohttp mock trtllm-serve fixtures for adapter integration tests.
tests/integration/videogen/test_adapter.py Integration tests for encode→POST→decode round-trip and error cases.
tests/integration/videogen/__init__.py Test package init (licensing header).
examples/09_Wan22_VideoGen_Example/offline_wan22.yaml Example offline benchmark config targeting videogen endpoint.
examples/09_Wan22_VideoGen_Example/setup_and_test.sh Example script to set up venv and run videogen tests.
examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl Bundled 248-prompt dataset artifact for the example benchmark.
endpoints_changed.md Design summary / documentation for WAN2.2 videogen integration.
AGENTS.md Adds videogen module to repo architecture documentation.
.gitignore Ignores .worktrees/.


Comment thread src/inference_endpoint/config/templates/online_template_full.yaml Outdated
Comment thread src/inference_endpoint/config/templates/concurrency_template_full.yaml Outdated
Comment on lines +79 to +80
# exclude_none so optional fields fall back to server-side defaults
# (MLPerf: omit negative_prompt and latent_path unless explicitly set).

Copilot AI May 1, 2026


The comment here says “MLPerf: omit negative_prompt and latent_path unless explicitly set”, but this PR’s VideoGenDataset injects the MLPerf canonical negative_prompt by default (i.e., it is explicitly set for the common path). Consider rewording this comment to reflect the actual behavior: fields are omitted only when their value is None in query.data.

Suggested change
# exclude_none so optional fields fall back to server-side defaults
# (MLPerf: omit negative_prompt and latent_path unless explicitly set).
# exclude_none so optional fields with value None fall back to
# server-side defaults. In particular, negative_prompt and
# latent_path are omitted only when their value in query.data is None.

Comment on lines 83 to 87
return ds_cls.get_dataloader(
    transforms=preset_transforms,
    num_repeats=num_repeats,
    path=dataset_path,
    **kwargs,

Copilot AI May 1, 2026


DataLoaderFactory now always passes path=dataset_path into ds_cls.get_dataloader(...) for all predefined datasets. Most predefined datasets use the base Dataset.get_dataloader(), which forwards **kwargs into cls.generate(...); many generate() implementations (e.g. AIME25/CNNDailyMail/GPQA/OpenOrca) do not accept a path kwarg, so this will raise TypeError: generate() got an unexpected keyword argument 'path' whenever those datasets are loaded. Consider only passing path for dataset classes that explicitly accept it (e.g. VideoGenDataset), or mapping config.path to the existing datasets_dir parameter instead of introducing a new kwarg.

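A minimal reproduction of the failure mode this comment describes (hypothetical generate() signature; the real factory and dataset classes differ):

```python
def generate_without_path(num_samples: int = 2):
    # Stands in for predefined datasets (e.g. AIME25) whose generate()
    # does not accept a `path` kwarg.
    return [f"sample-{i}" for i in range(num_samples)]


def get_dataloader(generate, **kwargs):
    # Like the base Dataset.get_dataloader(), this forwards **kwargs
    # straight into generate().
    return generate(**kwargs)


try:
    # Always passing path= blows up on any generate() lacking that kwarg.
    get_dataloader(generate_without_path, path="data.jsonl")
except TypeError as exc:
    print(f"TypeError: {exc}")
```
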
Comment thread tests/integration/videogen/test_adapter.py
Comment on lines +27 to +33
datasets:
  - name: wan22_prompts
    path: examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl
    format: jsonl
    type: "performance"
    samples: 248


Copilot AI May 1, 2026


This YAML says WAN2.2 MLPerf params are baked into VideoGenAdapter defaults and “only prompt is required from the dataset”, but the configured dataset is a JSONL file that (as provided) includes extra fields like negative_prompt/mode. Those fields will be forwarded into query.data and can override adapter defaults (e.g., empty negative_prompt). Consider either switching the example to wan22_mlperf (text prompts) so the adapter truly only receives prompt, or trimming the JSONL to only the fields you intend to send.

Comment on lines +59 to +60
# Fixed latent path is injected into every request via VideoGenDataset.
# Set latent_path in dataset config or pass it when constructing VideoGenDataset.

Copilot AI May 1, 2026


These comments state that a fixed latent path is injected “via VideoGenDataset”, but this example configuration does not use the predefined wan22_mlperf/VideoGenDataset loader (it loads a JSONL file directly). As a result, latent_path will not be injected unless it is explicitly present in the JSONL or added via a transform. Update the comments (or the dataset config) so the example reflects the actual injection mechanism.

Suggested change
# Fixed latent path is injected into every request via VideoGenDataset.
# Set latent_path in dataset config or pass it when constructing VideoGenDataset.
# This example loads prompts directly from JSONL and does not use
# `VideoGenDataset`, so `latent_path` is not injected automatically here.
# If required, include `latent_path` in the JSONL or add it via a transform.

Comment thread tests/integration/videogen/conftest.py Outdated
Comment on lines +86 to +93
return web.json_response(
    {
        "video_id": video_id,
        "video_bytes": base64.b64encode(DUMMY_VIDEO_BYTES).decode(),
    }
)



Copilot AI May 1, 2026


MockTrtllmServe claims to support both response_format='video_path' and 'video_bytes', but _handle_sync() always returns a payload containing video_bytes and never returns video_path (nor does it inspect the request’s response_format). This makes the integration suite unable to validate the adapter’s video_path decode branch and can mask regressions. Update the handler to branch on body.get('response_format') and return video_path when requested.

Suggested change
return web.json_response(
    {
        "video_id": video_id,
        "video_bytes": base64.b64encode(DUMMY_VIDEO_BYTES).decode(),
    }
)
response_format = body.get("response_format")
response_body = {"video_id": video_id}
if response_format == "video_path":
    response_body["video_path"] = DUMMY_VIDEO_PATH
else:
    response_body["video_bytes"] = base64.b64encode(DUMMY_VIDEO_BYTES).decode()
return web.json_response(response_body)

Comment thread endpoints_changed.md Outdated
Comment thread endpoints_changed.md Outdated
videogen/
├── __init__.py Public exports
├── types.py Pydantic wire models: VideoPathRequest,
│ VideoPathResponse, VideoPayloadResponse, HealthResponse

Copilot AI May 1, 2026


The “New Files” section lists a HealthResponse type in videogen/types.py, but no such model is implemented/exported in this PR. Either add the missing type or remove it from the document to keep the doc accurate.

Suggested change
│ VideoPathResponse, VideoPayloadResponse, HealthResponse
│ VideoPathResponse, VideoPayloadResponse

wu6u3tw added 5 commits May 1, 2026 16:09
After rebasing onto origin/main, these files needed updates from:
- ruff lint/format (test imports, line wrapping)
- prettier (markdown table formatting, YAML alignment)
- regenerate-templates (api_type docstring now lists "videogen")
- uv lock refresh (pyproject.toml now has "videogen" optional group)

Also tightens an over-broad pytest.raises(Exception) in the videogen
integration tests to (ValidationError, json.JSONDecodeError) for the
500-response case and json.JSONDecodeError for the malformed-JSON case
(B017).
…t_path

Address MR review feedback (Task 3 of wan22-trtllm-plan.md):

types.py:
- VideoPathRequest.negative_prompt: str = "" -> str | None = None
- Add VideoPathRequest.latent_path: str | None = None field
  (per-request fixed latent tensor path for MLPerf reproducibility)

adapter.py:
- encode_query: read negative_prompt and latent_path with .get() (no
  default), and serialise with model_dump_json(exclude_none=True) so
  optional fields fall back to server-side defaults when absent.

dataset.py:
- Add _MLPERF_NEGATIVE_PROMPT module constant (canonical MLPerf string).
- VideoGenDataset injects this negative prompt into every sample by
  default; pass negative_prompt=None to omit. Accepts latent_path as
  a per-dataset config so all samples share the same fixed latent.
- load() conditionally includes negative_prompt and latent_path in
  each sample dict only when set, so adapter exclude_none does the
  right thing end-to-end.

Tests:
- Update test_types defaults (negative_prompt None, latent_path None)
- Update test_dataset for the canonical negative-prompt default,
  add coverage for negative_prompt=None and latent_path propagation.
- Add adapter tests for exclude_none behaviour and latent_path
  forwarding.

Note for reviewer: the other two review comments (response_format
hardcoded to video_path; decode_response only handling VideoPathResponse)
are stale; both were addressed in 7a1b4d3 "fix: make response_format
optional in VideoGenAdapter (default video_bytes)". Current adapter
already defaults response_format to video_bytes and dispatches in
decode_response on whether "video_bytes" is present in the response.
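The exclude_none behaviour this commit describes can be sketched with a cut-down model (an assumed subset of the real VideoPathRequest fields, using Pydantic v2):

```python
import json
from typing import Optional

from pydantic import BaseModel


class MiniRequest(BaseModel):
    # Cut-down stand-in for VideoPathRequest (assumed field subset).
    prompt: str
    negative_prompt: Optional[str] = None
    latent_path: Optional[str] = None


# Optional fields left as None are dropped from the wire payload, so the
# server applies its own defaults.
wire = MiniRequest(prompt="a dog").model_dump_json(exclude_none=True)
print(json.loads(wire))

# Explicitly set optional fields survive serialisation.
wire = MiniRequest(prompt="a dog", latent_path="/lustre/latents/fixed.pt")
print(json.loads(wire.model_dump_json(exclude_none=True)))
```
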
wu6u3tw added 5 commits May 1, 2026 16:09
Aligns three previously-inconsistent statements about the request default:

- Adapter `encode_query` previously fell back to "video_bytes" when
  query.data did not specify a response_format, but the Pydantic field
  default on VideoPathRequest was "video_path" — the latter was dead
  because the adapter always supplied a value.
- Dataset docstring claimed "always requests video_bytes"; types
  docstring described a perf/accuracy split.

Pick the perf/accuracy split (Option A): default = video_path (perf),
opt-in to video_bytes via query.data["response_format"] (accuracy).

- adapter.py: flip `data.get("response_format", ...)` default to
  "video_path"; rewrite class + encode_query docstrings to match.
- dataset.py: drop the "always requests video_bytes" line.
- test_adapter.py (unit + integration): split the old
  test_encode_query_always_requests_video_bytes test into
  default-is-video_path + accuracy-mode-override tests.

Also rewrite endpoints_changed.md:

- Replace the "always video_path" framing with the dual-mode reality.
- Document VideoPayloadResponse and the decode_response shape dispatch.
- Fix the payload-size claim (300 MB -> 3-5 MB; 300 MB was raw uncompressed).
- Drop stale "Pending" tasks 2 (latent_path -- already wired) and 3
  (negative_prompt None -- already done).
- Update module name `wan22` -> `videogen` and `api_type` example.

Verified on aarch64 GB200: 58 unit+integration videogen tests pass.
…wan22 refs

Move datasets/wan22_prompts.jsonl into the example folder so the example
is self-contained and drop the absolute Lustre path baked into the setup
script.

setup_and_test.sh:
- Remove PROMPTS_TXT (hardcoded /lustre/share/... path) and the entire
  prompts.txt -> JSONL conversion block. The JSONL is now bundled with
  the example, so regeneration from a Lustre source is no longer needed.
- Retarget PROMPTS_JSONL to ${SCRIPT_DIR}/wan22_prompts.jsonl.
- Drop the now-orphaned PYTHON variable (only used by the conversion
  heredoc).
- Fix stale post-rename references that were left over from
  ddac990 (wan22 -> videogen): pip extras [wan22,test] -> [videogen,test]
  and test paths tests/unit/wan22 / tests/integration/wan22 ->
  tests/unit/videogen / tests/integration/videogen. Without these the
  script failed on a fresh setup (no [wan22] extra) and collected zero
  tests.

offline_wan22.yaml: dataset path -> examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl.

endpoints_changed.md: update bundled-dataset path reference.
…ader

VideoGenDataset duplicated functionality the generic JsonlLoader already
provides, was bugged on JSONL input (read each line as a raw text prompt
instead of parsing JSON), and wasn't actually invoked by the example
config: offline_wan22.yaml uses `name: wan22_prompts`, which doesn't
match its dataset_id (`wan22_mlperf`), so DataLoaderFactory already
routed to JsonlLoader. The class was dead code in the only path the
example exercises.

Bake the MLPerf canonical negative_prompt into every row of the bundled
JSONL so runtime injection is unnecessary, then delete the workload-
specific dataset class.

- examples/09_Wan22_VideoGen_Example/wan22_prompts.jsonl: replace
  empty negative_prompt with canonical MLPerf string in all 248 rows.
- src/inference_endpoint/videogen/dataset.py: deleted.
- src/inference_endpoint/dataset_manager/__init__.py: drop the
  side-effect VideoGenDataset import and __all__ entry.
- tests/unit/videogen/test_dataset.py, test_factory.py: deleted.
- src/inference_endpoint/videogen/types.py: update negative_prompt
  field docstring to point at the bundled JSONL instead of
  VideoGenDataset.
- examples/09_Wan22_VideoGen_Example/offline_wan22.yaml: drop
  `format: jsonl` (the factory tries DatasetFormat("jsonl") and
  crashes because enum values are `.jsonl`; auto-detection from the
  path extension works), and update the comment block.
- endpoints_changed.md: replace the dataset.py / VideoGenDataset
  section with a brief note about the bundled JSONL + JsonlLoader.

Verified on aarch64 GB200:
- pre-commit run --all-files: all hooks pass
- 42 videogen tests pass (down from 58: 14 dataset tests + 3 factory
  tests removed; adapter/types/init/registration tests retained)
- end-to-end smoke: DataLoaderFactory creates a Dataset with 248
  samples, each carrying prompt + canonical negative_prompt;
  VideoGenAdapter.encode_query produces a valid request with
  response_format=video_path.
The `metrics:` top-level block was rejected by `BenchmarkConfig`
(extra="forbid", no `metrics` field in the schema), so loading the
YAML via `BenchmarkConfig.from_yaml_file()` failed validation. The
block had no effect on metrics collection — that's controlled by
`settings.runtime` and the metrics aggregator service.

Caught by an end-to-end functional smoke test that loads the YAML,
runs encode → POST → decode against an inline mock trtllm-serve in
both perf (video_path) and accuracy (video_bytes) modes, and bulk-
encodes all 248 samples.
… simplify adapter, broaden tests

Five small fixes from a pre-review pass:

- factory.py: drop `path=dataset_path` kwarg on the predefined-dataset
  branch. It was added in 9ded0e6 to feed VideoGenDataset (since deleted
  in 6ce6bfa); none of the remaining predefined datasets' generate()
  signatures accept `path`, so any user passing both `name=<predefined>`
  and `path=<file>` would TypeError. Restores the pre-9ded0e6 behavior.
  Verified the videogen example still loads end-to-end (it routes
  through Dataset.load_from_file, not the predefined branch).

- adapter.py encode_query: replace the 12-line data.get() boilerplate
  with `VideoPathRequest.model_validate({k: data[k] for k in known if
  k in data})`. Pydantic applies defaults, eliminating the drift risk
  between adapter.py and types.py. Extra keys in query.data
  (sample_id, sample_index, mode, ...) are now ignored cleanly.

- integration mock: MockTrtllmServe._handle_sync now honors
  body["response_format"] and routes to a VideoPathResponse when the
  request asks for video_path. Previously the mock always returned
  video_bytes regardless of the request, so the integration tests
  never exercised the perf-mode decode branch end-to-end.

- integration tests: add test_perf_mode_round_trip_returns_video_path
  asserting result.metadata == {video_path: ...} via real HTTP. Rename
  the renamed counterpart to test_accuracy_mode_round_trip_returns_video_bytes.
  Replace the misleadingly named test_missing_video_bytes_field_raises_validation_error
  with two targeted tests, one per decode dispatch branch.

- endpoints_changed.md: drop two stale references — `HealthResponse`
  (never existed in types.py) and `APIType.WAN22` (the actual enum is
  APIType.VIDEOGEN). Trim the doc from 155 → 84 lines (~46% reduction)
  by collapsing the architecture diagram and removing repetition;
  factual content is unchanged.

Verified on aarch64 GB200:
- pre-commit run --all-files: all hooks pass
- 43 videogen tests pass (one new perf-mode round-trip test added)
- end-to-end smoke (load YAML → factory → JsonlLoader → adapter →
  mock server) passes both perf and accuracy modes; encode_query
  ignores extra keys cleanly
wu6u3tw and others added 4 commits May 5, 2026 15:28
- AGENTS.md: drop stale HealthResponse, removed VideoGenDataset row;
  rephrase Key Components blurb to be model-agnostic.
- Delete endpoints_changed.md per author note (kept internally).
- schema.py / probe.py: api_type help now lists videogen alongside
  openai and sglang; regenerated full-template YAMLs accordingly.
- offline_wan22.yaml: rewrite the comment block to match the bundled
  JSONL contents; drop misleading min_duration_ms warm-up annotation.
- adapter.py: clarify that exclude_none falls back only when the
  query.data value is None, not unconditionally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replace WAN 2.2 references in module docstrings, error messages, and
  comments with model-agnostic wording (the adapter can serve other
  video-generation models behind the same trtllm-serve route).
- Trim wan22_prompts.jsonl to prompt + canonical MLPerf negative_prompt;
  drop unused mode/sample_id/sample_index columns the loader synthesises.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tray type-ignores

- conftest.py: replace hardcoded /lustre/.../mock_video_001.mp4 with a
  tmp_path_factory-driven fixture; thread it through the integration test.
- pyproject.toml: drop the empty videogen optional-dependencies extra;
  uv.lock regenerated to match.
- Revert three unrelated # type: ignore[attr-defined] additions on the
  datasets imports (dataset.py, shopify, livecodebench) — out of scope.
- setup_and_test.sh: replace the test-runner-only script with a brief
  end-to-end runbook (HF download, server launch hint, benchmark).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…use it

- Extract route registration in EchoServer into _register_routes(app) so
  subclasses can swap the OpenAI-shaped routes for a different wire
  contract while reusing the background-thread aiohttp lifecycle.
- Convert MockTrtllmServe and MockTrtllmServeError into EchoServer
  subclasses; drops ~80 lines of duplicated start/stop/port plumbing
  from tests/integration/videogen/conftest.py.

Addresses arekay-nv's review comment on conftest.py asking whether the
existing echo server can be reused with a video-gen route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    asyncio.set_event_loop(self._loop)
    self._loop.run_until_complete(self._start_server())

def _register_routes(self, app: "web.Application") -> None:
Collaborator Author


Added this hook so the videogen tests can reuse the echo server.

@wu6u3tw wu6u3tw requested a review from arekay-nv May 5, 2026 23:43
Collaborator

@arekay-nv arekay-nv left a comment


Review Council — Multi-AI Code Review

Reviewed by: Claude (Codex run failed: workspace-managed policies error during cloud-requirements load).

Depth: thorough (PR is +1244/-24 across 25 files).

Found 12 issues — 5 medium, 7 low. No critical/high. See inline comments for details and the summary table in the follow-up comment.

- "video_path" response: metadata["video_path"] holds the Lustre file path.
"""
raw = json.loads(response_bytes)
if "video_bytes" in raw:
Collaborator

[Claude] medium (bug): Dispatch on key presence is brittle. if "video_bytes" in raw: takes the video_bytes branch even when the value is null, but VideoPayloadResponse.video_bytes is typed str (not str | None), so model_validate then raises ValidationError and the worker fails the query. This combines with #4 below: a server reply that sets video_bytes to null instead of dropping the key crashes a query that would otherwise be valid via the video_path shape. Suggest dispatching on value type, e.g. if isinstance(raw.get("video_bytes"), str):, or modelling the response as a pydantic.discriminator union so the framework picks the right shape.
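The value-type dispatch suggested above can be sketched like this; plain dicts stand in for the PR's pydantic models, and `decode` is a hypothetical name, not the adapter's actual method:

```python
import json

def decode(response_bytes: bytes) -> dict:
    """Pick the response shape by value type, not key presence.

    A payload like {"video_bytes": null, "video_path": "..."} falls
    through to the video_path branch instead of failing validation.
    """
    raw = json.loads(response_bytes)
    if isinstance(raw.get("video_bytes"), str):
        return {"shape": "payload", "video_bytes": raw["video_bytes"]}
    return {"shape": "path", "video_path": raw["video_path"]}

# A null video_bytes no longer forces the payload branch:
result = decode(b'{"video_bytes": null, "video_path": "/lustre/out.mp4"}')
```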

f"'prompt' not found in query.data keys: {list(data.keys())}"
)
known = VideoPathRequest.model_fields.keys()
req = VideoPathRequest.model_validate({k: data[k] for k in known if k in data})
Collaborator

[Claude] low (data-integrity): Silent drop of unknown keys. The {k: data[k] for k in known if k in data} filter silently discards any field absent from VideoPathRequest.model_fields — useful for sample_id/sample_index, but a typo like negitive_prompt or inference_steps is swallowed and the server falls back to its own default, quietly invalidating MLPerf submissions. The docstring at L59 already calls out that extra keys are ignored, but doesn't distinguish intended extras from typos. Consider either: (a) maintaining an explicit allowlist of pass-through keys in dataset_transforms (via ColumnFilter) so unknown keys never reach encode_query, or (b) at minimum logging at DEBUG when keys are dropped.
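Option (b), logging dropped keys at DEBUG, could look roughly like this; the field names and the pass-through set are illustrative stand-ins for VideoPathRequest.model_fields and the sample-id extras, not the PR's actual definitions:

```python
import logging

logger = logging.getLogger("wan22.adapter")

# Hypothetical stand-ins for VideoPathRequest.model_fields.keys() and the
# extras the adapter intentionally ignores.
KNOWN_FIELDS = {"prompt", "negative_prompt", "num_inference_steps", "seed"}
PASS_THROUGH = {"sample_id", "sample_index"}

def filter_request_fields(data: dict) -> dict:
    kept = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    dropped = set(data) - KNOWN_FIELDS - PASS_THROUGH
    if dropped:
        # A typo like "negitive_prompt" now leaves a trace instead of
        # silently falling back to the server's default.
        logger.debug("dropping unknown request keys: %s", sorted(dropped))
    return kept
```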


@classmethod
def dataset_transforms(cls, model_params: "ModelParams") -> "list[Transform]":
return []
Collaborator

[Claude] medium (design): Missing ColumnFilter violates the abstract-base contract. HttpRequestAdapter.dataset_transforms (adapter_protocol.py:50-58) explicitly requires every adapter to include a ColumnFilter so unused dataset columns don't cross the wire as part of query.data. Returning [] here means every column in wan22_prompts.jsonl (and any future user-supplied dataset) is forwarded verbatim into Query.data, then encoded into the ZMQ payload, even though encode_query only consumes a known subset. Add a ColumnFilter(required_columns=["prompt"], optional_columns=list(VideoPathRequest.model_fields.keys())) (mirroring how the OpenAI/SGLang adapters wire their filters) — this also closes the typo-silently-dropped path from #2.

return None

def get_final_output(self) -> QueryResult:
return QueryResult(id=self.query_id)
Collaborator

[Claude] medium (bug): Streaming path silently produces empty results. VideoGenAccumulator.get_final_output() returns QueryResult(id=self.query_id) with no response_output and no metadata. If anything routes a videogen query through the streaming branch (worker.py:415 _handle_streaming_body, triggered when query.data["stream"] is truthy), the chain is: parse_sse_chunk swallows the NotImplementedError from decode_sse_message (adapter_protocol.py:130-132 has a bare except Exception: pass), add_chunk returns None, and the final QueryResult is empty — the benchmark then reports the query as successful with zero output. Either raise loudly here (e.g. raise RuntimeError("VideoGenAdapter does not support streaming")) or guard encode_query to reject stream=True so the failure surfaces upstream with a clear message.
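The fail-loud option can be sketched as follows; the class is a stand-in for the PR's accumulator, not its real implementation:

```python
class VideoGenAccumulatorSketch:
    """Stand-in showing the raise-loudly option from the comment above."""

    def __init__(self, query_id: str) -> None:
        self.query_id = query_id

    def get_final_output(self):
        # Fail loudly instead of returning an empty QueryResult that the
        # benchmark would count as a success with zero output.
        raise RuntimeError(
            "VideoGenAdapter does not support streaming; "
            "set stream=False on the query"
        )

acc = VideoGenAccumulatorSketch("q1")
try:
    acc.get_final_output()
except RuntimeError as exc:
    message = str(exc)
```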

APIType,
cyclopts.Parameter(alias="--api-type", help="API type: openai or sglang"),
cyclopts.Parameter(
alias="--api-type", help="API type: openai, sglang, or videogen"
Collaborator

[Claude] medium (bug): Probe is unusable for videogen. Adding videogen to the --api-type help text exposes the probe path to video generation, but probe_timeout = 60.0 (probe.py:153, outside this hunk so flagging here at the new help text) is hardcoded to 60s for all --requests (default 10). WAN2.2-T2V-A14B inference takes minutes per request on production hardware, so --api-type videogen will always exit with Probe failed: only 0/10 requests successful, which is misleading. Either scale the timeout per api_type (e.g. videogen → several minutes plus a default of --requests 1), or have the probe command refuse videogen upfront with a clear message pointing users at a separate health check.
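Scaling the timeout per api_type could be as simple as a lookup table; the numbers for videogen are illustrative (minutes-long inference, single request), and only the 60 s default mirrors the hardcoded probe_timeout flagged above:

```python
# Hypothetical per-api-type probe settings: (timeout_s, default_requests).
PROBE_DEFAULTS = {
    "openai": (60.0, 10),
    "sglang": (60.0, 10),
    "videogen": (600.0, 1),  # WAN2.2 takes minutes per request; probe once
}

def probe_settings(api_type: str) -> tuple[float, int]:
    try:
        return PROBE_DEFAULTS[api_type]
    except KeyError:
        # Refuse unknown api_types upfront with a clear message.
        raise ValueError(f"probe does not support api_type={api_type!r}")

timeout_s, n_requests = probe_settings("videogen")
```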

resp = VideoPayloadResponse.model_validate(raw)
return QueryResult(
id=query_id,
response_output=TextModelOutput(output=resp.video_id),
Collaborator

[Claude] low (design): Abstraction leak: video_id packed into TextModelOutput. Both decode branches (L86 here, L92 below) do response_output=TextModelOutput(output=resp.video_id). TextModelOutput represents textual model output; video_id is a server-side handle, conceptually metadata. Downstream consumers calling result.get_response_output_string() (e.g. probe.py:195 prints it as the response text) will surface the id as if it were generated content. A cleaner shape: leave response_output=None (this is a non-textual response) and put video_id into metadata alongside video_path/video_bytes. Functional today, but a future-proofing concern as more video metrics get added.
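The metadata-only shape suggested above, sketched with plain dicts standing in for QueryResult and TextModelOutput:

```python
import json

def decode_response(response_bytes: bytes, query_id: str) -> dict:
    """Leave response_output empty and carry server-side handles as metadata."""
    raw = json.loads(response_bytes)
    return {
        "id": query_id,
        "response_output": None,  # no textual model output for video
        "metadata": {
            "video_id": raw["video_id"],
            "video_path": raw["video_path"],
        },
    }

result = decode_response(
    b'{"video_id": "v1", "video_path": "/lustre/v1.mp4"}', "q1"
)
```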


async def _handle_videogen(self, request: web.Request) -> web.Response:
body = await request.json()
video_id = f"mock_video_{hash(body.get('prompt', '')) & 0xFFFF:04x}"
Collaborator

[Claude] low (testing): Non-deterministic mock id via salted hash(). hash(body.get('prompt', '')) is salted per-interpreter via PYTHONHASHSEED, so the resulting mock_video_xxxx id is different on every test run. Today nothing asserts on a specific id so it doesn't fail, but it makes the mock harder to reason about, prevents using a snapshot fixture for video paths/ids, and makes failures across CI runs harder to compare. Prefer a stable derivation, e.g. hashlib.sha1(prompt.encode()).hexdigest()[:4].
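The stable derivation suggested above, as a drop-in helper:

```python
import hashlib

def mock_video_id(prompt: str) -> str:
    # sha1 of the prompt is stable across interpreter runs, unlike the
    # PYTHONHASHSEED-salted built-in hash().
    return "mock_video_" + hashlib.sha1(prompt.encode()).hexdigest()[:4]

vid = mock_video_id("a cat surfing")
```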

with pytest.raises(KeyError):
VideoGenAdapter.encode_query(query)

def test_decode_response_returns_video_bytes_in_metadata(self):
Collaborator

[Claude] low (testing): Missing unit coverage for the perf-mode (video_path) decode branch. test_decode_response_returns_video_bytes_in_metadata (this test) and test_decode_response_video_id_in_output (L111) only exercise the VideoPayloadResponse path, but the default perf-mode response is VideoPathResponse and is the shape that runs in production benchmarks. Today it's only covered by the integration round-trip, which won't catch regressions in decode_response if the integration suite is skipped. Add a parallel unit test that builds a VideoPathResponse(video_id=..., video_path=...), encodes to JSON bytes, calls VideoGenAdapter.decode_response, and asserts metadata == {"video_path": ...} and response_output.output == video_id.


@classmethod
def decode_sse_message(cls, json_bytes: bytes) -> str:
raise NotImplementedError("VideoGenAdapter does not use SSE streaming")
Collaborator

[Claude] low (error-handling): NotImplementedError is silently swallowed. HttpRequestAdapter.parse_sse_chunk (adapter_protocol.py:126-132) wraps every call to decode_sse_message in a bare try / except Exception: pass, with the comment "Normal for non-content SSE messages (role, finish_reason, etc)". That catch-all consumes NotImplementedError too, so if anything ever feeds a videogen request through the SSE path, this raise becomes invisible and the worker quietly produces an empty result (see #4 above). Either raise a more specific exception class that the base method re-raises, or add an is_streaming guard at the worker level that refuses to enter the streaming branch when cls.decode_sse_message is HttpRequestAdapter.decode_sse_message (i.e. unimplemented).
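The "specific exception the base re-raises" option could be sketched like this; the exception name and both classes are hypothetical stand-ins for the PR's adapter protocol:

```python
from typing import Optional

class StreamingNotSupportedError(NotImplementedError):
    """Raised by adapters that have no SSE wire format (hypothetical name)."""

class AdapterBaseSketch:
    @classmethod
    def decode_sse_message(cls, json_bytes: bytes) -> str:
        raise StreamingNotSupportedError(
            "VideoGenAdapter does not use SSE streaming"
        )

    @classmethod
    def parse_sse_chunk(cls, json_bytes: bytes) -> Optional[str]:
        try:
            return cls.decode_sse_message(json_bytes)
        except StreamingNotSupportedError:
            raise  # surface misrouted streaming queries loudly
        except Exception:
            return None  # normal for non-content SSE messages

try:
    AdapterBaseSketch.parse_sse_chunk(b"{}")
    raised = False
except StreamingNotSupportedError:
    raised = True
```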


model_params:
name: "wan22"
max_new_tokens: 0 # Video generation does not produce tokens
Collaborator

[Claude] low (design): max_new_tokens: 0 is an api_type=videogen-only convention. This is harmless today because VideoGenAdapter.dataset_transforms ignores model_params, but ModelParams.max_new_tokens has no ge>0 validator and is forwarded to OpenAI/SGLang adapters as max_completion_tokens=0. If a user copies this YAML as a starting point and changes api_type to openai/sglang (a natural debugging step) they'll get a 400 from the server with no clear pointer back to this field. Either set this to a more neutral default (e.g. 1) and add a comment noting videogen ignores max_new_tokens entirely, or move the value into a videogen-specific config section so the api_type coupling is explicit.
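One way to apply the neutral-default option in the example YAML; the comment text is illustrative and only the field names come from the snippet above:

```yaml
model_params:
  name: "wan22"
  # videogen ignores max_new_tokens entirely; 1 is a neutral value that
  # stays valid if this file is copied for api_type: openai or sglang.
  max_new_tokens: 1
```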

@arekay-nv
Collaborator

Review Council — Multi-AI Code Review

Reviewed by: Claude | Depth: thorough

Codex review failed at the cloud-requirements / workspace-managed-policies load step; falling back to Claude-only.

Found 12 issues across 6 files.

🟡 Should Fix (medium)

Real issues that trigger under specific conditions or design flaws that will compound.

| # | File | Line | Category | Summary |
|---|------|------|----------|---------|
| 1 | src/inference_endpoint/videogen/adapter.py | 82 | bug | Dispatch on key presence is brittle |
| 2 | src/inference_endpoint/videogen/adapter.py | 49 | design | Missing ColumnFilter violates the abstract-base contract |
| 3 | src/inference_endpoint/videogen/adapter.py | 116 | bug | Streaming path silently produces empty results |
| 4 | src/inference_endpoint/commands/probe.py | 54 | bug | Probe is unusable for videogen |
| 5 | src/inference_endpoint/videogen/types.py | 74 | api-contract | Response models silently accept shape drift |

🔵 Consider (low)

Valid improvements that could be follow-ups.

| # | File | Line | Category | Summary |
|---|------|------|----------|---------|
| 6 | src/inference_endpoint/videogen/adapter.py | 67 | data-integrity | Silent drop of unknown keys |
| 7 | src/inference_endpoint/videogen/adapter.py | 36 | design | Docstring suggests behaviour that doesn't exist |
| 8 | src/inference_endpoint/videogen/adapter.py | 86 | design | Abstraction leak: video_id packed into TextModelOutput |
| 9 | tests/integration/videogen/conftest.py | 65 | testing | Non-deterministic mock id via salted hash() |
| 10 | tests/unit/videogen/test_adapter.py | 100 | testing | Missing unit coverage for the perf-mode (video_path) decode branch |
| 11 | src/inference_endpoint/videogen/adapter.py | 98 | error-handling | NotImplementedError is silently swallowed |
| 12 | examples/09_Wan22_VideoGen_Example/offline_wan22.yaml | 26 | design | max_new_tokens: 0 is an api_type=videogen-only convention |

Note on duplicates: Two candidate issues were dropped during dedupe — one on wan22_prompts.jsonl (negative_prompt duplication; @arekay-nv already accepted the file as-is) and one on setup_and_test.sh (run-order; already addressed in @wu6u3tw's Modified reply on @arekay-nv's earlier comment).
