feat(LlamaContext): expose ubatchSize separately from batchSize #606
Open
andreinknv wants to merge 1 commit into
Conversation
Currently the C++ binding always sets `n_ubatch = n_batch`, with the
comment that the batch queue is managed JS-side. That's true for the
default case, but it prevents callers from ever asking for a smaller
physical micro-batch than the logical batch — equivalent to llama.cpp's
`--ubatch-size` flag.
This PR adds a `ubatchSize?: number` option on `LlamaContextOptions`.
When set, it forwards to `llama_context_params.n_ubatch`, overriding
the `n_ubatch = n_batch` default in the binding. When unset, behavior
is unchanged.
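For illustration, here is a minimal sketch of the intended call site, assuming the option lands on `LlamaContextOptions` as proposed; the model path is a placeholder:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder
});

// Logical batch of 512 tokens, decoded in physical micro-batches of 256,
// the node-side analogue of `llama-server --batch-size 512 --ubatch-size 256`.
const context = await model.createContext({
    batchSize: 512,
    ubatchSize: 256 // new option proposed by this PR
});
```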
Two real use cases:
1. Hardware where the model is sensitive to per-ubatch VRAM peaks —
a smaller ubatch lets a larger total batch fit.
2. Throughput tuning probes — sweeping `n_ubatch` independently of
   `n_batch` is useful when characterizing a model+hardware combo
   for sustained-load deployments (matches what
   `llama-server --batch-size N --ubatch-size M` already permits); a probe sketch follows this list.
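A hedged sketch of such a probe, assuming the v3 sequence API (`getSequence`, `tokenize`, `evaluateWithoutGeneratingNewTokens`); the model path, prompt, and sizes are illustrative:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder
const tokens = model.tokenize("Lorem ipsum dolor sit amet. ".repeat(100));

// Hold the logical batch fixed and sweep the physical micro-batch,
// timing prefill of the same token stream at each setting.
for (const ubatchSize of [64, 128, 256, 512]) {
    const context = await model.createContext({batchSize: 512, ubatchSize});
    const sequence = context.getSequence();

    const start = performance.now();
    await sequence.evaluateWithoutGeneratingNewTokens(tokens);
    const ms = performance.now() - start;

    console.log(`n_ubatch=${ubatchSize}: ${tokens.length} tokens in ${ms.toFixed(1)}ms`);
    await context.dispose();
}
```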
Plumbing:
- LlamaContextOptions.ubatchSize (types.ts) — public option with docstring.
- LlamaContext constructor (LlamaContext.ts) — destructured and
forwarded into the AddonContext options bag.
- AddonContext.cpp — when `options.Has("ubatchSize")`, overrides
  `context_params.n_ubatch` (must come AFTER the `batchSize` handler
  so the explicit `ubatchSize` wins over the `n_ubatch = n_batch`
  default; this ordering is modeled in the sketch after this list).
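To make the ordering concrete, here is a toy TypeScript model of that handler logic; it is not the real binding code, and llama.cpp's stock defaults are assumed purely for illustration:

```typescript
type ContextParams = {n_batch: number, n_ubatch: number};

function applyOptions(options: {batchSize?: number, ubatchSize?: number}): ContextParams {
    // Stock defaults, assumed for illustration
    const params: ContextParams = {n_batch: 512, n_ubatch: 512};

    if (options.batchSize != null) {
        params.n_batch = options.batchSize;
        params.n_ubatch = options.batchSize; // existing behavior: tie them together
    }

    // New handler, deliberately placed after the batchSize block so an
    // explicit ubatchSize wins over the n_ubatch = n_batch default:
    if (options.ubatchSize != null)
        params.n_ubatch = options.ubatchSize;

    return params;
}

console.log(applyOptions({batchSize: 512, ubatchSize: 256})); // { n_batch: 512, n_ubatch: 256 }
```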
No default change. Existing callers see no behavior shift.
Summary
Adds a `ubatchSize?: number` option on `LlamaContextOptions`, plumbing through to `llama_context_params.n_ubatch`. When unset, the existing default (`n_ubatch = n_batch`) is preserved exactly.

Why
Today `llama/addon/AddonContext.cpp` always sets `n_ubatch = n_batch` inside the `batchSize` handler — the JS-side batch queue is the reason. That's a reasonable default, but it prevents callers from ever asking for a smaller physical micro-batch than the logical batch. The C++ value is the same one `llama-server` exposes as `--ubatch-size`, and decoupling them serves two real use cases:
1. Hardware where the model is sensitive to per-ubatch VRAM peaks — a smaller `n_ubatch` lets a larger total `n_batch` fit. Today this is unreachable from `node-llama-cpp`.
2. Throughput tuning probes — sweeping `n_ubatch` independently of `n_batch` is the canonical way to characterize a model+hardware combo for sustained-load deployments. Matches what `llama-server --batch-size N --ubatch-size M` already permits.

Plumbing
- `src/evaluator/LlamaContext/types.ts` — new `ubatchSize?: number` field on `LlamaContextOptions`, with a docstring noting the `≤ batchSize` constraint and the link to llama.cpp's `--ubatch-size`.
- `src/evaluator/LlamaContext/LlamaContext.ts` — destructured from the options bag and forwarded into the `AddonContext` options.
- `llama/addon/AddonContext.cpp` — `if (options.Has("ubatchSize"))` overrides `context_params.n_ubatch`, placed AFTER the `batchSize` handler so the explicit value wins over the default.
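A sketch of how the new field might read in `types.ts` (surrounding fields abridged; the JSDoc wording is illustrative):

```typescript
export type LlamaContextOptions = {
    /** Maximum number of tokens submitted to the model per decode (logical batch, `n_batch`) */
    batchSize?: number,

    /**
     * Physical micro-batch size (`n_ubatch`); must be <= `batchSize`.
     * Defaults to `batchSize`, preserving current behavior.
     * Equivalent to llama.cpp's `--ubatch-size`.
     */
    ubatchSize?: number
};
```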
Compatibility

`ubatchSize` is optional. When unset, `n_ubatch = n_batch` exactly as today.
Test plan

- Smoke-tested with `batchSize: 512`, `ubatchSize: 256` — context constructs cleanly; per-decode logs confirm `n_ubatch=256` reaches llama.cpp.
- Confirmed no behavior change on the default path (`ubatchSize` unset).

🤖 Generated with Claude Code