Add Muon optimizer with 8-bit quantized momentum (bnb.optim.Muon8bit) #1973

### Feature request

Add a Muon optimizer to `bitsandbytes.optim` whose momentum buffer is stored in bitsandbytes' quantized formats: primarily an 8-bit blockwise variant (`Muon8bit`), with the existing 32-bit and 4-bit (NF4 / FP4 / NVFP4) paths as siblings.
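
To make the request concrete, here is a minimal pure-Python sketch of what one 8-bit Muon step could look like. It is not the code on the linked branch and not a bitsandbytes API: `quantize_int8_blocks`, `dequantize_int8_blocks`, `newton_schulz` and `muon8bit_step` are hypothetical helpers. The linear absmax int8 code is a stand-in for bitsandbytes' dynamic 8-bit quantization map, the tiny block size is only for illustration, and the Newton-Schulz coefficients, Nesterov-style lookahead and shape scaling are assumptions taken from the widely circulated reference Muon implementation.

```python
import math

# Quintic Newton-Schulz coefficients from the widely circulated reference Muon
# implementation (assumption: a bitsandbytes port would use the same ones).
NS_A, NS_B, NS_C = 3.4445, -4.7750, 2.0315


def matmul(x, y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g: push its singular values toward ~1."""
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # iterate on the wide orientation so X @ X^T is the smaller Gram matrix
        x = transpose(x)
    for _ in range(steps):
        a = matmul(x, transpose(x))  # Gram matrix A = X @ X^T
        aa = matmul(a, a)
        b = [[NS_B * p + NS_C * q for p, q in zip(ra, raa)] for ra, raa in zip(a, aa)]
        bx = matmul(b, x)  # (b*A + c*A^2) @ X
        x = [[NS_A * p + q for p, q in zip(rx, rbx)] for rx, rbx in zip(x, bx)]
    return transpose(x) if tall else x


def quantize_int8_blocks(values, block_size):
    """Hypothetical linear absmax int8 quantizer with one scale per block.
    (bitsandbytes' 8-bit optimizers use a dynamic, non-linear code instead.)"""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_int8_blocks(codes, scales, block_size):
    return [q / 127 * scales[i // block_size] for i, q in enumerate(codes)]


def muon8bit_step(weight, grad, state, lr=0.02, beta=0.95, block_size=4):
    """One Muon update whose momentum persists only as int8 codes + block scales."""
    rows, cols = len(grad), len(grad[0])
    flat_grad = [v for row in grad for v in row]
    if state:  # dequantize the stored momentum
        momentum = dequantize_int8_blocks(state["codes"], state["scales"], block_size)
    else:  # first step: zero momentum
        momentum = [0.0] * len(flat_grad)
    momentum = [beta * m + g for m, g in zip(momentum, flat_grad)]
    # Nesterov-style lookahead, one common Muon formulation
    lookahead = [g + beta * m for g, m in zip(flat_grad, momentum)]
    update = newton_schulz([lookahead[r * cols:(r + 1) * cols] for r in range(rows)])
    # Re-quantize so only the 8-bit codes and per-block scales are kept
    state["codes"], state["scales"] = quantize_int8_blocks(momentum, block_size)
    scale = math.sqrt(max(1.0, rows / cols))  # shape scaling (reference-impl assumption)
    return [[w - lr * scale * u for w, u in zip(rw, ru)] for rw, ru in zip(weight, update)]


# Usage: three steps on a toy 2x3 weight with a fixed gradient.
state = {}
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
g = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
for _ in range(3):
    w = muon8bit_step(w, g, state)
```

The only persistent state is one int8 code per element plus one scale per block. In a real CUDA kernel those would be `torch.uint8`/`torch.int8` and `float32` tensors, which is where the roughly 4x saving over an fp32 momentum buffer would come from; the Python lists here only illustrate the numerics.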

### Motivation

Major open models are now pretrained with Muon (e.g. Kimi, Laguna XS.2), and recent work indicates that the finetuning optimizer should match the pretraining optimizer: [Liu, Wang & Zhang, arXiv:2605.06654](https://arxiv.org/abs/2605.06654) find that "full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff." As Muon-pretrained checkpoints proliferate, users increasingly need a Muon optimizer to finetune them.



### Your contribution

I have already implemented 32-bit, 8-bit, and 4-bit versions of Muon on this branch: https://github.com/theblackcat102/bitsandbytes/tree/feature/muon

From a 50-step smoke test over 96 configurations (Qwen/Qwen3-4B-Base, bf16):

- 8-bit Muon matches 32-bit Muon — the loss curves nearly overlap across all 50 steps, so 8-bit is a practical default rather than a quality trade-off.
- Memory: Muon 8-bit uses 13.63 GB allocated vs 23.73 GB for Muon 32-bit (−43%), and −26% peak. The saving comes entirely from quantizing the momentum — in bf16 training, 32-bit Muon is a memory wash vs AdamW (2×bf16 = 1×fp32 = 4 B/param), so the quantized buffer is what makes Muon worthwhile in bitsandbytes terms.
- NF4 / FP4 / NVFP4 are equivalent on convergence under stable LR; 8-bit captures most of the win (8→4 bit saves only a further ~1.5 GB, since the remaining floor is model weights, not optimizer state).

<img width="1934" height="772" alt="Image" src="https://github.com/user-attachments/assets/37218c78-9227-4391-93e6-2a8d9a64693e" />
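
The per-parameter arithmetic behind the memory bullets above can be checked with a back-of-envelope sketch. Assumptions (not taken from the linked branch): a round 4e9 parameters all handled by the optimizer, a hypothetical block size of 256 with one fp32 scale per block for the quantized variants, and GB = 1e9 bytes. It counts optimizer state only, so it will not reproduce the measured allocated/peak numbers, which also include weights, gradients and activations.

```python
import math

N_PARAMS = 4e9  # illustrative round number, roughly a "4B" model


def optimizer_state_gb(n_params, bytes_per_param, block_size=None, scale_bytes=4):
    """Optimizer-state size in GB (1e9 bytes), plus one fp32 scale per block if blockwise."""
    total = n_params * bytes_per_param
    if block_size:  # hypothetical per-block absmax overhead
        total += math.ceil(n_params / block_size) * scale_bytes
    return total / 1e9


CONFIGS = {
    "AdamW, 2 x bf16 states": dict(bytes_per_param=2 * 2),
    "Muon 32-bit, 1 x fp32 state": dict(bytes_per_param=4),
    "Muon 8-bit blockwise": dict(bytes_per_param=1, block_size=256),
    "Muon 4-bit blockwise": dict(bytes_per_param=0.5, block_size=256),
}

for name, cfg in CONFIGS.items():
    print(f"{name:28s} {optimizer_state_gb(N_PARAMS, **cfg):6.2f} GB")
```

Under these assumptions this prints 16.00, 16.00, 4.06 and 2.06 GB: AdamW and 32-bit Muon tie, 8-bit removes about 12 GB of state, and going from 8 to 4 bit removes at most another ~2 GB. That is the same direction as the measured drop (23.73 − 13.63 ≈ 10.1 GB) and the ~1.5 GB 8→4 bit saving. If some parameters (for example embeddings or norms) are routed to a non-Muon fallback, as common Muon setups do, the real savings would be correspondingly smaller.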
