feat(catalog): DeepSeek V4 + GLM-5 families; bump mlx-vlm + vllm#74
Open
cryptopoly wants to merge 3 commits into
…m + vllm

Release upstream polish.

Deps (loose floor bumps, no code change):
- mlx-vlm 0.6.0 -> 0.6.3
- vllm 0.22.0 -> 0.22.1 ([vllm] + [triattention] extras)

Discover catalog -- two frontier sparse-MoE families (text-only, verified HF repos + real on-disk sizes):
- DeepSeek V4: Flash (284B / ~13B active, 1M ctx, baked-in MTP head) + Pro (1.6T). mlx-community 4-bit Flash (154 GB) is the local-viable entry; official BF16 + 8-bit + Pro listed for awareness.
- GLM-5 / GLM-5.1: GlmMoeDsa MoE (256 experts / 8 active, ~200K ctx). unsloth GGUF (Q4_K_M ~515 GB) + mlx-community MXFP4 + zai-org BF16.

Both text-only (configs carry no vision_config) so capabilities omit vision -- no broken composer affordance.

Tests + gate:
- tests/test_catalog_text_families.py: parse + required-field + text-only + discover-payload checks.
- E2E phase 0 "new model families" check asserts both surface in the live /api/workspace catalog with their full variant set.

Validated: phase 0 PASS, 11 checks.

Tracked follow-ups (not in this change): MTPLX installer already auto-updates to v1.0.1 (re-test FU-079 empty-output vs its new /v1 streaming); dflash-mlx v0.1.9 migration stays deferred (FU-057); llama-cpp-turboquant branch drifted (FU-065 commit-pin needs a verified test-build).
- FU-065: turbo branch drifted 2cbfdc62 -> 73eb521d (reproducibility risk confirmed; pin still deferred pending a verified test-compile).
- FU-079: MTPLX hit v1.0.0/v1.0.1 (installer auto-updates from 0.3.5); v1.0.0 added real /v1 token streaming -> re-test the empty-output against v1.0.1.
- FU-067: dflash-mlx v0.1.9 now tagged; FU-057 migration stays deferred.
Gemma 4 (gemma-4 family):
- E2B: 2B multimodal, 128K ctx; official QAT Q4_0 GGUF (~1.5 GB) + BF16
- 31B: 31B multimodal, 256K ctx; MLX 8-bit, unsloth Q4_K_M GGUF, official QAT GGUF, BF16
- Both carry vision capability (Gemma4ForConditionalGeneration + vision_config confirmed)

MiniMax M2.7 (minimax-m2 family):
- 256 routed experts / 8 active, 200K ctx, ~240B total params / ~480 GB BF16
- mlx-community MXFP4 (~120 GB), unsloth GGUF Q4_K_M (~130 GB), official BF16

Qwen3.7 skipped: no official Qwen/Qwen3.7-* repo exists on HF as of 2026-06-12.

Tests: 7 catalog gate checks updated to cover all 4 frontier families (shape, vision vs text-only, context windows, discover payload presence).
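The BF16 figure quoted for MiniMax M2.7 follows from simple arithmetic: parameter count times bits per weight, divided by 8 to get bytes. The sketch below is only an illustration of that estimate; the function name is hypothetical, it counts weights only, and it uses decimal GB. Quantized formats such as MXFP4 or Q4_K_M carry per-block scales and mixed precision, so their real on-disk sizes will not match a flat bits-per-weight figure exactly.

```python
def estimate_weight_size_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only size in decimal GB (hypothetical helper, not from the PR)."""
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Figures from the commit message: ~240B total params, BF16 = 16 bits per weight.
bf16_gb = estimate_weight_size_gb(240, 16)
print(f"MiniMax M2.7 BF16 estimate: {bf16_gb:.0f} GB")
```

For the catalog itself the commit messages state that sizes were taken from the real repos rather than estimated, so an estimate like this is best treated as a sanity check.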
Release upstream polish — deps + frontier model families
Final pre-release pass over upstream repos + the Discover catalog.
Dependency bumps (loose floors, no code change)
- mlx-vlm 0.6.0 → 0.6.3
- vllm 0.22.0 → 0.22.1 ([vllm] + [triattention] extras)

All other tracked deps are current (turboquant-mlx-full 0.6.2, mlx 0.31.2, mlx-lm 0.31.3, diffusers 0.38.0, nunchaku 0.16.1, kvpress 0.5.3).
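A "loose floor" means only the minimum acceptable version moves; anything at or above it still satisfies the requirement. As a minimal sketch of that comparison, assuming plain dotted numeric versions (the helper names are hypothetical, and real tooling follows PEP 440, for example via the `packaging` library):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version such as '0.6.3' into integers."""
    return tuple(int(part) for part in version.split("."))


def meets_floor(installed: str, floor: str) -> bool:
    """True when the installed version is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)


# Floors taken from the list above.
print(meets_floor("0.6.0", "0.6.3"))    # old mlx-vlm no longer satisfies the new floor
print(meets_floor("0.22.1", "0.22.1"))  # vllm exactly at the floor is accepted
```

This sketch does not handle pre-release or local version suffixes.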
Discover catalog — two new frontier families
Verified HF repos + real on-disk sizes; both text-only (configs carry no vision_config).

- DeepSeek V4 (deepseek_v4 MoE, 256 experts / 6 active, 1M ctx, baked-in MTP head): mlx-community/DeepSeek-V4-Flash-4bit (154 GB), the local-viable entry · 8-bit · official BF16 · DeepSeek-V4-Pro (1.6T, awareness)
- GLM-5 / GLM-5.1 (glm_moe_dsa MoE, 256 experts / 8 active, ~200K ctx): unsloth/GLM-5.1-GGUF (Q4_K_M ~515 GB) · mlx-community/GLM-5.1-MXFP4-Q8 · zai-org/GLM-5.1 + GLM-5 BF16

These are frontier-scale (top-end workstation / cluster), listed for discovery awareness with honest sizes.
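The text-only rule described here (no vision_config in the model config, so no vision capability) can be pictured as a small derivation step. This is a sketch under assumptions, not the project's actual code: the function name, the baseline "chat" tag, and the dictionary shape are hypothetical; only the vision_config key and the model type strings come from the PR text.

```python
def derive_capabilities(hf_config: dict) -> list[str]:
    """Return capability tags for a model config (hypothetical helper)."""
    capabilities = ["chat"]  # assumed baseline tag, not taken from the PR
    # Advertise vision only when the config actually carries a vision_config block.
    if "vision_config" in hf_config:
        capabilities.append("vision")
    return capabilities


print(derive_capabilities({"model_type": "deepseek_v4"}))
print(derive_capabilities({"model_type": "glm_moe_dsa"}))
# A multimodal config with a vision_config block (contents elided here).
print(derive_capabilities({"model_type": "example_multimodal", "vision_config": {}}))
```

Keying the tag on the presence of vision_config keeps the UI from offering an image composer for a model that cannot accept images.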
Tests + gate
- tests/test_catalog_text_families.py: parse, required-field, text-only (no vision tag), and discover-payload checks.
- E2E phase 0 "new model families" check asserts both surface in the live /api/workspace catalog with their full variant sets.

Validated: phase 0 PASS, 11 checks.

Tracked follow-ups (intentionally not in this PR)
- MTPLX v1.0.0/v1.0.1: real /v1/completions token streaming → re-test FU-079 empty-output.
- llama-cpp-turboquant branch drifted (2cbfdc62 → 73eb521d): FU-065 commit-pin deferred (needs a verified test-compile; pinning an untested commit risks a broken turbo build).

🤖 Generated with Claude Code
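For readers unfamiliar with the gate, the required-field and text-only checks named under "Tests + gate" might look roughly like the following. The catalog schema, field names, family id, and the "chat" tag are all hypothetical placeholders; the real tests/test_catalog_text_families.py is not shown in this PR page and may be structured differently. The repo names are the ones listed in the description.

```python
REQUIRED_FIELDS = ("id", "name", "capabilities", "variants")  # assumed schema


def validate_text_family(family: dict) -> list[str]:
    """Collect problems for one text-only catalog family (hypothetical check)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in family:
            problems.append(f"missing field: {field}")
    # Text-only families must not advertise a vision capability.
    if "vision" in family.get("capabilities", []):
        problems.append("text-only family must not advertise vision")
    if not family.get("variants"):
        problems.append("family has no variants")
    return problems


glm_family = {
    "id": "glm-5",  # hypothetical family id
    "name": "GLM-5 / GLM-5.1",
    "capabilities": ["chat"],
    "variants": [
        {"repo": "unsloth/GLM-5.1-GGUF"},
        {"repo": "mlx-community/GLM-5.1-MXFP4-Q8"},
    ],
}
print(validate_text_family(glm_family))
```

Returning a list of problems rather than raising on the first one makes a failing gate report every issue with an entry at once.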