# continuous-batching

Here are 32 public repositories matching this topic...

A high-performance API server that exposes OpenAI-compatible endpoints for MLX models. Built in Python on the FastAPI framework, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally.

  • Updated Jun 29, 2026
  • Python

High-performance discrete-event simulator (C++20/Python) for modeling agentic LLM traffic, KV cache dynamics, Prefill-Decode Disaggregation (PDD), and scheduling policies. Features roofline model analysis, K-Means request clustering, and a real-time web dashboard.

  • Updated Jun 29, 2026
  • C++
