feat(catalog): DeepSeek V4 + GLM-5 families; bump mlx-vlm + vllm by cryptopoly · Pull Request #74 · cryptopoly/ChaosEngineAI

cryptopoly · 2026-06-11T08:34:27Z

Release upstream polish — deps + frontier model families

Final pre-release pass over upstream repos + the Discover catalog.

Dependency bumps (loose floors, no code change)

mlx-vlm 0.6.0 → 0.6.3
vllm 0.22.0 → 0.22.1 ([vllm] + [triattention] extras)

All other tracked deps are current (turboquant-mlx-full 0.6.2, mlx 0.31.2, mlx-lm 0.31.3, diffusers 0.38.0, nunchaku 0.16.1, kvpress 0.5.3).
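A "loose floor" here means a minimum version with no upper cap. The following is a minimal sketch of how such floors could be checked against an installed environment. The `FLOORS` table, `meets_floor`, and `unmet_floors` are hypothetical names for illustration, not code from this PR, and the parser deliberately ignores pre-release and local suffixes, so it is looser than a full PEP 440 comparison.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Floors taken from this PR's description; the table itself is illustrative.
FLOORS = {"mlx-vlm": "0.6.3", "vllm": "0.22.1"}


def parse_version(text: str) -> tuple[int, ...]:
    """Return the leading dotted release numbers, dropping suffixes like '+cu128'."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so that 1.0 and 1.0.0 compare equal."""
    return release + (0,) * (width - len(release))


def meets_floor(installed: str, floor: str) -> bool:
    """True when the installed release is at or above the floor (no upper cap)."""
    have, want = parse_version(installed), parse_version(floor)
    width = max(len(have), len(want))
    return _pad(have, width) >= _pad(want, width)


def unmet_floors(floors: dict[str, str] = FLOORS) -> dict[str, str]:
    """Map each installed package that sits below its floor to the version found."""
    unmet = {}
    for name, floor in floors.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            continue  # optional extra not installed; nothing to check
        if not meets_floor(found, floor):
            unmet[name] = found
    return unmet
```

Calling `unmet_floors()` in an environment where neither package is installed returns an empty dict, since missing optional extras are skipped rather than reported.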

Discover catalog — two new frontier families

Verified HF repos + real on-disk sizes; both text-only (configs carry no vision_config).

DeepSeek V4 (deepseek_v4 MoE, 256 experts / 6 active, 1M ctx, baked-in MTP head):

mlx-community/DeepSeek-V4-Flash-4bit (154 GB) — local-viable entry · 8-bit · official BF16 · DeepSeek-V4-Pro (1.6T, awareness)

GLM-5 / GLM-5.1 (glm_moe_dsa MoE, 256 experts / 8 active, ~200K ctx):

unsloth/GLM-5.1-GGUF (Q4_K_M ~515 GB) · mlx-community/GLM-5.1-MXFP4-Q8 · zai-org/GLM-5.1 + GLM-5 BF16

These are frontier-scale (top-end workstation / cluster), listed for discovery awareness with honest sizes.
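The text-only classification rests on one observable fact: the model configs carry no `vision_config` key. A minimal sketch of that rule follows, assuming the config has already been loaded into a plain dict. `derive_capabilities` and the capability strings are hypothetical names, not the catalog's actual API; only the `model_type` values come from this PR.

```python
def derive_capabilities(hf_config: dict) -> list[str]:
    """Tag a model as vision-capable only when its config declares a vision tower."""
    capabilities = ["text"]
    if hf_config.get("vision_config") is not None:
        capabilities.append("vision")
    return capabilities


# Illustrative config fragments; real configs contain many more keys.
deepseek_v4 = {"model_type": "deepseek_v4"}
glm_5 = {"model_type": "glm_moe_dsa"}
multimodal = {"model_type": "example_vlm", "vision_config": {"hidden_size": 1024}}

print(derive_capabilities(deepseek_v4))
print(derive_capabilities(multimodal))
```

Deriving the tag from the config rather than hand-writing it keeps a text-only model from exposing an image-upload affordance it cannot serve.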

Tests + gate

tests/test_catalog_text_families.py — parse, required-field, text-only (no vision tag), and discover-payload checks.
E2E phase 0 "new model families" check asserts that both families surface in the live /api/workspace catalog with their full variant sets. Validated: phase 0 PASS, 11 checks.
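The checks listed above (required fields, no vision tag on text-only entries, presence in the discover payload) could look roughly like the sketch below. This is not the contents of tests/test_catalog_text_families.py: the field names, the `families` / `id` payload shape, and both helper functions are assumptions made for illustration.

```python
# Hypothetical required fields; the real catalog schema may differ.
REQUIRED_FIELDS = ("repo", "format", "size_gb", "capabilities")


def entry_problems(entry: dict, text_only: bool) -> list[str]:
    """Collect human-readable problems for one catalog variant entry."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if text_only and "vision" in entry.get("capabilities", []):
        problems.append("text-only entry carries a vision tag")
    return problems


def missing_families(payload: dict, expected: set[str]) -> set[str]:
    """Return the expected family ids that are absent from a discover payload."""
    present = {family["id"] for family in payload.get("families", [])}
    return expected - present


# Example entry using the repo name and size quoted in this PR.
flash_4bit = {
    "repo": "mlx-community/DeepSeek-V4-Flash-4bit",
    "format": "mlx",
    "size_gb": 154,
    "capabilities": ["text"],
}
assert entry_problems(flash_4bit, text_only=True) == []
```

Returning a list of problems instead of raising on the first one lets a single test run report every defect in an entry.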

Tracked follow-ups (intentionally not in this PR)

MTPLX: installer is unpinned → already auto-updates to v1.0.1 (was 0.3.5). Still HTTP-server-only (FU-048/079 root persists), but v1.0.0 added real /v1/completions token streaming → re-test FU-079 empty-output.
dflash-mlx v0.1.9 tagged — FU-057 multi-hour API-rewrite migration stays deferred.
llama-cpp-turboquant branch drifted (2cbfdc62→73eb521d) — FU-065 commit-pin deferred (needs a verified test-compile; pinning an untested commit risks a broken turbo build).

🤖 Generated with Claude Code

…m + vllm

Release upstream polish.

Deps (loose floor bumps, no code change):
- mlx-vlm 0.6.0 -> 0.6.3
- vllm 0.22.0 -> 0.22.1 ([vllm] + [triattention] extras)

Discover catalog -- two frontier sparse-MoE families (text-only, verified HF repos + real on-disk sizes):
- DeepSeek V4: Flash (284B / ~13B active, 1M ctx, baked-in MTP head) + Pro (1.6T). mlx-community 4-bit Flash (154 GB) is the local-viable entry; official BF16 + 8-bit + Pro listed for awareness.
- GLM-5 / GLM-5.1: GlmMoeDsa MoE (256 experts / 8 active, ~200K ctx). unsloth GGUF (Q4_K_M ~515 GB) + mlx-community MXFP4 + zai-org BF16.

Both text-only (configs carry no vision_config) so capabilities omit vision -- no broken composer affordance.

Tests + gate:
- tests/test_catalog_text_families.py: parse + required-field + text-only + discover-payload checks.
- E2E phase 0 "new model families" check asserts both surface in the live /api/workspace catalog with their full variant set. Validated: phase 0 PASS, 11 checks.

Tracked follow-ups (not in this change): MTPLX installer already auto-updates to v1.0.1 (re-test FU-079 empty-output vs its new /v1 streaming); dflash-mlx v0.1.9 migration stays deferred (FU-057); llama-cpp-turboquant branch drifted (FU-065 commit-pin needs a verified test-build).

- FU-065: turbo branch drifted 2cbfdc62 -> 73eb521d (reproducibility risk confirmed; pin still deferred pending a verified test-compile).
- FU-079: MTPLX hit v1.0.0/v1.0.1 (installer auto-updates from 0.3.5); v1.0.0 added real /v1 token streaming -> re-test the empty-output against v1.0.1.
- FU-067: dflash-mlx v0.1.9 now tagged; FU-057 migration stays deferred.

Gemma 4 (gemma-4 family):
- E2B: 2B multimodal, 128K ctx -- official QAT Q4_0 GGUF (~1.5 GB) + BF16
- 31B: 31B multimodal, 256K ctx -- MLX 8-bit, unsloth Q4_K_M GGUF, official QAT GGUF, BF16
- Both carry vision capability (Gemma4ForConditionalGeneration + vision_config confirmed)

MiniMax M2.7 (minimax-m2 family):
- 256 routed experts / 8 active, 200K ctx, ~240B total params / ~480 GB BF16
- mlx-community MXFP4 (~120 GB), unsloth GGUF Q4_K_M (~130 GB), official BF16

Qwen3.7 skipped -- no official Qwen/Qwen3.7-* repo exists on HF as of 2026-06-12.

Tests: 7 catalog gate checks updated to cover all 4 frontier families (shape, vision vs text-only, context windows, discover payload presence).
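As a sanity check on the quoted sizes, weight storage can be estimated as parameter count times bits per weight. The sketch below is a rough lower bound under stated assumptions: it uses decimal gigabytes and ignores quantization scales, tensors kept at higher precision, and file metadata, so real on-disk sizes are usually somewhat larger. `weight_size_gb` is a hypothetical helper, not part of the catalog code.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate raw weight storage in decimal GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


# MiniMax M2.7: ~240B parameters at 16 bits matches the ~480 GB BF16 figure.
print(weight_size_gb(240, 16))  # 480.0

# DeepSeek V4 Flash: 284B parameters at a nominal 4 bits gives 142 GB,
# below the 154 GB measured on disk, as expected for a lower bound.
print(weight_size_gb(284, 4))   # 142.0
```

The gap between the estimate and a measured size is one reason the catalog records real on-disk sizes rather than computed ones.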

cryptopoly added 3 commits June 11, 2026 09:33

cryptopoly wants to merge 3 commits into staging from release/upstream-model-families
