
feat(LlamaContext): expose ubatchSize separately from batchSize #606

Open
andreinknv wants to merge 1 commit into withcatai:master from andreinknv:feat/ubatch-size-option

Conversation

@andreinknv

Summary

Adds a ubatchSize?: number option on LlamaContextOptions, plumbed through to llama_context_params.n_ubatch. When unset, the existing default (n_ubatch = n_batch) is preserved exactly.
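
Not part of the diff, but for illustration, a minimal usage sketch assuming this PR's option (the model path and batch values are placeholders):

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// batchSize stays the logical batch; ubatchSize caps the physical micro-batch
// forwarded to llama_context_params.n_ubatch. Omitting ubatchSize keeps the
// current behavior (n_ubatch = n_batch).
const context = await model.createContext({
    batchSize: 512,
    ubatchSize: 256
});
```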

Why

Today, llama/addon/AddonContext.cpp always sets n_ubatch = n_batch inside the batchSize handler, since the batch queue is managed on the JS side. That's a reasonable default, but it prevents callers from ever requesting a smaller physical micro-batch than the logical batch. The C++ value is the same one llama-server exposes as --ubatch-size, and decoupling the two serves two real use cases:

  1. Per-ubatch VRAM headroom. On hardware that's sensitive to VRAM peaks during the forward pass (some Metal setups, low-end CUDA GPUs), a smaller n_ubatch lets a larger total n_batch fit. Today this is unreachable from node-llama-cpp.
  2. Throughput characterization. Sweeping n_ubatch independently of n_batch is the canonical way to characterize a model+hardware combo for sustained-load deployments, and matches what llama-server --batch-size N --ubatch-size M already permits (see the sketch after this list).
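
As a sketch of use case 2 (not part of this PR; the model path, prompt, and candidate sizes are placeholders), the kind of probe this enables:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// Sweep n_ubatch while holding n_batch fixed, timing the same prompt each run.
for (const ubatchSize of [64, 128, 256, 512]) {
    const context = await model.createContext({batchSize: 512, ubatchSize});
    const session = new LlamaChatSession({contextSequence: context.getSequence()});

    const start = performance.now();
    await session.prompt("Summarize the benefits of micro-batching in one paragraph.");
    console.log(`ubatchSize=${ubatchSize}: ${(performance.now() - start).toFixed(0)}ms`);

    await context.dispose();
}
```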

Plumbing

  • src/evaluator/LlamaContext/types.ts — new ubatchSize?: number field on LlamaContextOptions with a docstring noting the ≤ batchSize constraint and the link to llama.cpp's --ubatch-size (see the sketch after this list).
  • src/evaluator/LlamaContext/LlamaContext.ts — destructured from the options bag and forwarded into the AddonContext options.
  • llama/addon/AddonContext.cpp — if (options.Has("ubatchSize")) overrides context_params.n_ubatch, placed after the batchSize handler so the explicit value wins over the default.
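
For reference, a sketch of what the types.ts addition looks like; the docstring wording and surrounding fields are illustrative, not the exact diff:

```ts
// src/evaluator/LlamaContext/types.ts (illustrative sketch)
export type LlamaContextOptions = {
    // ...existing options (contextSize, batchSize, sequences, ...)

    /**
     * The size of a physical micro-batch sent to the model in a single call
     * (maps to llama.cpp's `--ubatch-size` / `llama_context_params.n_ubatch`).
     *
     * Must be less than or equal to `batchSize`.
     * Defaults to `batchSize` (the current behavior) when not set.
     */
    ubatchSize?: number
};
```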

Compatibility

  • ubatchSize is optional. When unset, n_ubatch = n_batch exactly as today.
  • No public-API breakage.

Test plan

  • Local build + smoke test on Qwen2.5-Coder-3B-Instruct Q4_K_M (Metal, batchSize: 512, ubatchSize: 256) — context constructs cleanly; per-decode logs confirm n_ubatch=256 reaches llama.cpp.
  • Existing tests pass locally (no change with ubatchSize unset).
  • CI on this PR.

🤖 Generated with Claude Code

Currently the C++ binding always sets `n_ubatch = n_batch`, with the
comment that the batch queue is managed JS-side. That's true for the
default case, but it prevents callers from ever asking for a smaller
physical micro-batch than the logical batch — equivalent to llama.cpp's
`--ubatch-size` flag.

This PR adds a `ubatchSize?: number` option on `LlamaContextOptions`.
When set, it forwards to `llama_context_params.n_ubatch`, overriding
the `n_ubatch = n_batch` default in the binding. When unset, behavior
is unchanged.

Two real use cases:
  1. Hardware where the model is sensitive to per-ubatch VRAM peaks —
     a smaller ubatch lets a larger total batch fit.
  2. Throughput tuning probes — sweeping `n_ubatch` independently of
     `n_batch` is useful when characterizing a model+hardware combo
     for sustained-load deployments (matches what
     `llama-server --batch-size N --ubatch-size M` already permits).

Plumbing:
  - LlamaContextOptions.ubatchSize (types.ts) — public option with docstring.
  - LlamaContext constructor (LlamaContext.ts) — destructured and
    forwarded into the AddonContext options bag.
  - AddonContext.cpp — when `options.Has("ubatchSize")`, overrides
    `context_params.n_ubatch` (must come AFTER the `batchSize` handler
    so the explicit `ubatchSize` wins over the `n_ubatch = n_batch`
    default).

No default change. Existing callers see no behavior shift.
