feat(catalog): DeepSeek V4 + GLM-5 families; bump mlx-vlm + vllm #74

Open
cryptopoly wants to merge 3 commits into staging from release/upstream-model-families

@cryptopoly

Owner

Release upstream polish — deps + frontier model families

Final pre-release pass over upstream repos + the Discover catalog.

Dependency bumps (loose floors, no code change)

  • mlx-vlm 0.6.0 → 0.6.3
  • vllm 0.22.0 → 0.22.1 ([vllm] + [triattention] extras)

All other tracked deps are current (turboquant-mlx-full 0.6.2, mlx 0.31.2, mlx-lm 0.31.3, diffusers 0.38.0, nunchaku 0.16.1, kvpress 0.5.3).
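
A loose floor here means a `>=` lower bound rather than an exact pin. As a minimal sketch of what such a floor check amounts to, assuming plain dotted numeric versions (the names `FLOORS`, `parse_version`, and `meets_floor` are hypothetical and are not code from this repo):

```python
# Hypothetical helper: illustrates a ">=" floor check, not the project's real tooling.
FLOORS = {"mlx-vlm": "0.6.3", "vllm": "0.22.1"}  # floors stated in this PR


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.22.1' into (0, 22, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def meets_floor(package: str, installed: str) -> bool:
    """True when the installed version is at or above the package's floor."""
    return parse_version(installed) >= parse_version(FLOORS[package])


if __name__ == "__main__":
    print(meets_floor("mlx-vlm", "0.6.0"))  # older than the new floor
    print(meets_floor("vllm", "0.22.1"))    # exactly at the floor
```

Comparing integer tuples avoids the string-comparison trap where "0.22.10" sorts below "0.22.2". Real tooling would more likely use `packaging.version.Version`, which also handles pre-release and post-release tags.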

Discover catalog — two new frontier families

Verified HF repos + real on-disk sizes; both text-only (configs carry no vision_config).
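
The text-only call rests on the model config. A minimal sketch of that rule, assuming the config has already been loaded into a dict (the function name and the capability strings are hypothetical, not the catalog's real schema):

```python
def capabilities_from_config(config: dict) -> list[str]:
    """Derive capability tags from a model config dict.

    Assumption: the presence of a "vision_config" key is the only signal
    used to decide whether a model accepts images.
    """
    capabilities = ["text"]
    if "vision_config" in config:
        capabilities.append("vision")
    return capabilities


if __name__ == "__main__":
    # A config without vision_config yields a text-only capability list.
    print(capabilities_from_config({"model_type": "deepseek_v4"}))
```

Keeping the vision tag off text-only entries means the UI never offers an image attachment that the model cannot use.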

DeepSeek V4 (deepseek_v4 MoE, 256 experts / 6 active, 1M ctx, baked-in MTP head):

  • mlx-community/DeepSeek-V4-Flash-4bit (154 GB) — local-viable entry · 8-bit · official BF16 · DeepSeek-V4-Pro (1.6T, awareness)

GLM-5 / GLM-5.1 (glm_moe_dsa MoE, 256 experts / 8 active, ~200K ctx):

  • unsloth/GLM-5.1-GGUF (Q4_K_M ~515 GB) · mlx-community/GLM-5.1-MXFP4-Q8 · zai-org/GLM-5.1 + GLM-5 BF16

These are frontier-scale (top-end workstation / cluster), listed for discovery awareness with honest sizes.
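
As a rough sanity check on the listed sizes, weight footprint is approximately parameter count times bits per weight. The sketch below applies that to the 284B Flash parameter count given in the commit message. The function name is hypothetical, and the estimate deliberately ignores quantization scales, layers kept at higher precision, and file metadata, which is an assumption about why on-disk sizes run above it.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight size in decimal GB: params * bits / 8 bytes per param."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    flash_4bit = estimate_weight_gb(284, 4)  # DeepSeek V4 Flash at 4 bits
    print(f"{flash_4bit:.0f} GB estimated vs 154 GB listed")
```

The estimate comes out at about 142 GB, so the listed 154 GB is in the expected range for a 4-bit checkpoint with some overhead.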

Tests + gate

  • tests/test_catalog_text_families.py — parse, required-field, text-only (no vision tag), and discover-payload checks.
  • E2E phase 0 "new model families" check asserts that both families surface in the live /api/workspace catalog with their full variant sets. Validated: phase 0 PASS, 11 checks.
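
The required-field and text-only checks could look roughly like the following. This is a sketch under assumptions: the entry schema (`repo`, `size_gb`, `capabilities`) and the helper `validate_text_entry` are hypothetical and are not taken from tests/test_catalog_text_families.py.

```python
REQUIRED_FIELDS = ("repo", "size_gb", "capabilities")  # assumed schema, not the real one


def validate_text_entry(entry: dict) -> list[str]:
    """Return a list of problems for one text-only catalog entry (empty means OK)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "vision" in entry.get("capabilities", []):
        problems.append("text-only entry must not carry the vision tag")
    return problems


# Example entry using the repo and size from this PR; the field names are assumptions.
entry = {
    "repo": "mlx-community/DeepSeek-V4-Flash-4bit",
    "size_gb": 154,
    "capabilities": ["text"],
}
assert validate_text_entry(entry) == []
```

Returning a list of problems rather than raising on the first one lets a single test run report every malformed entry at once.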

Tracked follow-ups (intentionally not in this PR)

  • MTPLX: installer is unpinned → already auto-updates to v1.0.1 (was 0.3.5). Still HTTP-server-only (FU-048/079 root persists), but v1.0.0 added real /v1/completions token streaming → re-test FU-079 empty-output.
  • dflash-mlx v0.1.9 tagged — FU-057 multi-hour API-rewrite migration stays deferred.
  • llama-cpp-turboquant branch drifted (2cbfdc62 → 73eb521d). FU-065 commit-pin deferred: it needs a verified test-compile, and pinning an untested commit risks a broken turbo build.

🤖 Generated with Claude Code

…m + vllm

Release upstream polish.

Deps (loose floor bumps, no code change):
- mlx-vlm 0.6.0 -> 0.6.3
- vllm 0.22.0 -> 0.22.1 ([vllm] + [triattention] extras)

Discover catalog -- two frontier sparse-MoE families (text-only, verified
HF repos + real on-disk sizes):
- DeepSeek V4: Flash (284B / ~13B active, 1M ctx, baked-in MTP head) + Pro
  (1.6T). mlx-community 4-bit Flash (154 GB) is the local-viable entry;
  official BF16 + 8-bit + Pro listed for awareness.
- GLM-5 / GLM-5.1: GlmMoeDsa MoE (256 experts / 8 active, ~200K ctx).
  unsloth GGUF (Q4_K_M ~515 GB) + mlx-community MXFP4 + zai-org BF16.
Both text-only (configs carry no vision_config) so capabilities omit vision
-- no broken composer affordance.

Tests + gate:
- tests/test_catalog_text_families.py: parse + required-field + text-only +
  discover-payload checks.
- E2E phase 0 "new model families" check asserts both surface in the live
  /api/workspace catalog with their full variant set. Validated: phase 0
  PASS, 11 checks.

Tracked follow-ups (not in this change): MTPLX installer already auto-updates
to v1.0.1 (re-test FU-079 empty-output vs its new /v1 streaming); dflash-mlx
v0.1.9 migration stays deferred (FU-057); llama-cpp-turboquant branch drifted
(FU-065 commit-pin needs a verified test-build).

- FU-065: turbo branch drifted 2cbfdc62 -> 73eb521d (reproducibility risk
  confirmed; pin still deferred pending a verified test-compile).
- FU-079: MTPLX hit v1.0.0/v1.0.1 (installer auto-updates from 0.3.5); v1.0.0
  added real /v1 token streaming -> re-test the empty-output against v1.0.1.
- FU-067: dflash-mlx v0.1.9 now tagged; FU-057 migration stays deferred.

Gemma 4 (gemma-4 family):
- E2B: 2B multimodal, 128K ctx — official QAT Q4_0 GGUF (~1.5 GB) + BF16
- 31B: 31B multimodal, 256K ctx — MLX 8-bit, unsloth Q4_K_M GGUF, official QAT GGUF, BF16
- Both carry vision capability (Gemma4ForConditionalGeneration + vision_config confirmed)

MiniMax M2.7 (minimax-m2 family):
- 256 routed experts / 8 active, 200K ctx, ~240B total params / ~480 GB BF16
- mlx-community MXFP4 (~120 GB), unsloth GGUF Q4_K_M (~130 GB), official BF16

Qwen3.7 skipped — no official Qwen/Qwen3.7-* repo exists on HF as of 2026-06-12.

Tests: 7 catalog gate checks updated to cover all 4 frontier families
(shape, vision vs text-only, context windows, discover payload presence).