Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Updated Feb 20, 2026 - Python
⚡ Build your chatbot within minutes on your favorite device, apply SOTA compression techniques to LLMs, and run LLMs efficiently on Intel platforms ⚡
Lucebox: an LLM inference server optimized for speed on specific consumer hardware.
Large-scale LLM inference engine
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
2.24x decode TPS increase on Qwen 3.6 27B at temperature 0.6. Native MTP speculative decoding on Apple Silicon with no external drafter.
Scalable and robust tree-based speculative decoding algorithm
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
DFlash & TurboQuant in llama.cpp, with up to 3x faster generation and 7.5x more KV cache in the same VRAM
Pure Rust Inference Engine
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
LLM inference engine from scratch: paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving
REST: Retrieval-Based Speculative Decoding, NAACL 2024
llama.cpp fork with TurboQuant WHT-rotated KV cache & weight compression + Gemma 4 MTP and Qwen 3.6 NextN speculative decoding (+30-50% throughput).
Tree-based speculative decoding for Apple Silicon (MLX). ~10-15% faster than DFlash on code, ~1.5x over autoregressive. First MLX port with custom Metal kernels for hybrid model support.
LLM Inference on consumer devices
[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding", Leviathan et al. 2023.
A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation of large language models
[NeurIPS'23] Speculative Decoding with Big Little Decoder