
feat(LlamaContext): expose ubatchSize separately from batchSize #606

Open
andreinknv wants to merge 1 commit into withcatai:master from andreinknv:feat/ubatch-size-option

Conversation

@andreinknv

Summary

Adds a ubatchSize?: number option on LlamaContextOptions, plumbed through to llama_context_params.n_ubatch. When unset, the existing default (n_ubatch = n_batch) is preserved exactly.
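
Not part of the diff, but for illustration, a minimal usage sketch assuming this PR's option (the model path and batch values are placeholders):

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// batchSize stays the logical batch; ubatchSize caps the physical micro-batch
// forwarded to llama_context_params.n_ubatch. Omitting ubatchSize keeps the
// current behavior (n_ubatch = n_batch).
const context = await model.createContext({
    batchSize: 512,
    ubatchSize: 256
});
```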

Why

Today, llama/addon/AddonContext.cpp always sets n_ubatch = n_batch inside the batchSize handler, since the batch queue is managed on the JS side. That's a reasonable default, but it prevents callers from ever requesting a smaller physical micro-batch than the logical batch. The C++ value is the same one llama-server exposes as --ubatch-size, and decoupling the two serves two real use cases:

  1. Per-ubatch VRAM headroom. On hardware that's sensitive to VRAM peaks during the forward pass (some Metal setups, low-end CUDA GPUs), a smaller n_ubatch lets a larger total n_batch fit. Today this is unreachable from node-llama-cpp.
  2. Throughput characterization. Sweeping n_ubatch independently of n_batch is the canonical way to characterize a model+hardware combo for sustained-load deployments, and matches what llama-server --batch-size N --ubatch-size M already permits (see the sketch after this list).
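
As a sketch of use case 2 (not part of this PR; the model path, prompt, and candidate sizes are placeholders), the kind of probe this enables:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// Sweep n_ubatch while holding n_batch fixed, timing the same prompt each run.
for (const ubatchSize of [64, 128, 256, 512]) {
    const context = await model.createContext({batchSize: 512, ubatchSize});
    const session = new LlamaChatSession({contextSequence: context.getSequence()});

    const start = performance.now();
    await session.prompt("Summarize the benefits of micro-batching in one paragraph.");
    console.log(`ubatchSize=${ubatchSize}: ${(performance.now() - start).toFixed(0)}ms`);

    await context.dispose();
}
```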

Plumbing

  • src/evaluator/LlamaContext/types.ts — new ubatchSize?: number field on LlamaContextOptions with a docstring noting the ≤ batchSize constraint and the link to llama.cpp's --ubatch-size (see the sketch after this list).
  • src/evaluator/LlamaContext/LlamaContext.ts — destructured from the options bag and forwarded into the AddonContext options.
  • llama/addon/AddonContext.cpp — if (options.Has("ubatchSize")) overrides context_params.n_ubatch, placed after the batchSize handler so the explicit value wins over the default.
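
For reference, a sketch of what the types.ts addition looks like; the docstring wording and surrounding fields are illustrative, not the exact diff:

```ts
// src/evaluator/LlamaContext/types.ts (illustrative sketch)
export type LlamaContextOptions = {
    // ...existing options (contextSize, batchSize, sequences, ...)

    /**
     * The size of a physical micro-batch sent to the model in a single call
     * (maps to llama.cpp's `--ubatch-size` / `llama_context_params.n_ubatch`).
     *
     * Must be less than or equal to `batchSize`.
     * Defaults to `batchSize` (the current behavior) when not set.
     */
    ubatchSize?: number
};
```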

Compatibility

  • ubatchSize is optional. When unset, n_ubatch = n_batch exactly as today.
  • No public-API breakage.

Test plan

  • Local build + smoke test on Qwen2.5-Coder-3B-Instruct Q4_K_M (Metal, batchSize: 512, ubatchSize: 256) — context constructs cleanly; per-decode logs confirm n_ubatch=256 reaches llama.cpp.
  • Existing tests pass locally (no change with ubatchSize unset).
  • CI on this PR.

🤖 Generated with Claude Code

Currently the C++ binding always sets `n_ubatch = n_batch`, with the
comment that the batch queue is managed JS-side. That's true for the
default case, but it prevents callers from ever asking for a smaller
physical micro-batch than the logical batch — equivalent to llama.cpp's
`--ubatch-size` flag.

This PR adds a `ubatchSize?: number` option on `LlamaContextOptions`.
When set, it forwards to `llama_context_params.n_ubatch`, overriding
the `n_ubatch = n_batch` default in the binding. When unset, behavior
is unchanged.

Two real use cases:
  1. Hardware where the model is sensitive to per-ubatch VRAM peaks —
     a smaller ubatch lets a larger total batch fit.
  2. Throughput tuning probes — sweeping `n_ubatch` independently of
     `n_batch` is useful when characterizing a model+hardware combo
     for sustained-load deployments (matches what
     `llama-server --batch-size N --ubatch-size M` already permits).

Plumbing:
  - LlamaContextOptions.ubatchSize (types.ts) — public option with docstring.
  - LlamaContext constructor (LlamaContext.ts) — destructured and
    forwarded into the AddonContext options bag.
  - AddonContext.cpp — when `options.Has("ubatchSize")`, overrides
    `context_params.n_ubatch` (must come AFTER the `batchSize` handler
    so the explicit `ubatchSize` wins over the `n_ubatch = n_batch`
    default).

No default change. Existing callers see no behavior shift.
