guide/token-prediction #625
Hello there, I'm trying to implement Multi-Token Prediction (MTP) with Gemma 4 using your repo.
Models Used:
My code:
// Imports used by the snippet below
import { LlamaChatSession, DraftSequenceTokenPredictor } from "node-llama-cpp"

// Load main model
try {
    modelStart = await appState.llmEngine.loadModel(modelOptions)
    mainModelLoaded = true
    appState.setModelStart(modelStart)
    appState.setModelPath(data.path)
} catch (mainModelError) {
    console.error('[Backend] Failed to load main model:', mainModelError)
    throw new Error(`Failed to load main model: ${mainModelError.message}`)
}

// Load draft model
try {
    draftModelStart = await appState.llmEngine.loadModel({
        modelPath: "C:/Users/YamatoLab/Downloads/gemma-4-E2B-it.mtp.Q8_0.gguf"
    })
    draftModelLoaded = true
    appState.setDraftModelStart(draftModelStart)
} catch (draftModelError) {
    console.error('[Backend] Failed to load draft model:', draftModelError)
    throw new Error(`Failed to load draft model: ${draftModelError.message}`)
}

// Build LlamaContextOptions for each node
const LlamaContextOptions = {
    sequences: agentNodes.length, // total sessions needed
}
if (inferenceConfig.context_length !== undefined && inferenceConfig.context_length >= 512) {
    LlamaContextOptions.contextSize = {
        min: 512,
        max: inferenceConfig.context_length
    }
}
if (inferenceConfig.batch_size && inferenceConfig.batch_size > 0) {
    LlamaContextOptions.batchSize = inferenceConfig.batch_size
}
if (inferenceConfig.cpu_threads !== undefined && inferenceConfig.cpu_threads >= 0) {
    LlamaContextOptions.threads = inferenceConfig.cpu_threads
}
const context = await appState.modelStart.createContext(LlamaContextOptions)

// Create draft context
const draftContext = await appState.draftModelStart.createContext({
    contextSize: {
        max: 2048
    }
});
const draftContextSequence = draftContext.getSequence();

// Create session
const sessionChat = new LlamaChatSession({
    contextSequence: context.getSequence({
        tokenPredictor: new DraftSequenceTokenPredictor(draftContextSequence, {
            minTokens: 0,
            minConfidence: 0.6
        })
    }),
    autoDisposeSequence: true,
    chatWrapper: agentWrapper,
    systemPrompt
})

I got a log like this:
Do you have any solution for this issue?
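The context-option logic in the code above can be pulled out into a pure function, which makes it possible to check exactly which options reach `createContext()` without loading any model. This is a minimal sketch: `buildContextOptions` is a hypothetical helper name, and it only mirrors the conditions already in the post (the `inferenceConfig` field names `context_length`, `batch_size`, and `cpu_threads` come from the original code).

```javascript
// Hypothetical helper (not part of node-llama-cpp): mirrors the post's logic
// for turning inferenceConfig into the options object for createContext().
function buildContextOptions(inferenceConfig, sequenceCount) {
  // One sequence per agent node, as in the original code
  const options = { sequences: sequenceCount };

  // Only set a context-size range when a value of at least 512 is configured
  if (inferenceConfig.context_length !== undefined && inferenceConfig.context_length >= 512) {
    options.contextSize = { min: 512, max: inferenceConfig.context_length };
  }

  // Missing, zero, or negative batch sizes are ignored
  if (inferenceConfig.batch_size && inferenceConfig.batch_size > 0) {
    options.batchSize = inferenceConfig.batch_size;
  }

  // The original accepts 0 as a valid thread count, so 0 is passed through
  if (inferenceConfig.cpu_threads !== undefined && inferenceConfig.cpu_threads >= 0) {
    options.threads = inferenceConfig.cpu_threads;
  }

  return options;
}

// Usage with a sample config for three agent nodes
const sampleOptions = buildContextOptions(
  { context_length: 4096, batch_size: 512, cpu_threads: 0 },
  3
);
console.log(sampleOptions);
```

Logging the returned object before calling `createContext()` is a cheap way to confirm that, for example, a `context_length` below 512 silently drops the `contextSize` setting.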
guide/token-prediction
Using token predictors to speed up the generation process in node-llama-cpp
https://node-llama-cpp.withcat.ai/guide/token-prediction
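The guide linked above covers token predictors such as `DraftSequenceTokenPredictor`. The general idea behind draft-model prediction (speculative decoding) can be shown with a toy, model-free simulation: a draft proposes tokens until its confidence drops below a threshold, and the main model keeps only the longest prefix that matches what it would have generated itself, plus one token of its own. This is a generic sketch, not node-llama-cpp's implementation; `draftTokens`, `verifyDraft`, the defaults, and the exact way `minTokens`/`minConfidence`/`maxTokens` are applied here are assumptions loosely modeled on the option names in the post.

```javascript
// Toy illustration of draft-and-verify token prediction (speculative decoding).
// NOT node-llama-cpp internals: both models are replaced by precomputed arrays
// so the control flow can run anywhere.

// Draft phase (assumed semantics): the first `minTokens` proposals are always
// kept, drafting stops at the first later token below `minConfidence`,
// and at most `maxTokens` tokens are drafted. Defaults are illustrative.
function draftTokens(draftOutputs, { minTokens = 0, maxTokens = 16, minConfidence = 0.6 } = {}) {
  const drafted = [];
  for (const { token, confidence } of draftOutputs) {
    if (drafted.length >= maxTokens) break;
    if (drafted.length >= minTokens && confidence < minConfidence) break;
    drafted.push(token);
  }
  return drafted;
}

// Verify phase: the main model evaluates all drafted tokens in one batch.
// mainTokens[i] is the token the main model itself would produce at position i.
// Keep the longest matching prefix, then append the main model's own token
// at the first mismatch (or right after the last accepted draft token).
function verifyDraft(drafted, mainTokens) {
  let accepted = 0;
  while (accepted < drafted.length && drafted[accepted] === mainTokens[accepted]) {
    accepted++;
  }
  const output = drafted.slice(0, accepted);
  if (accepted < mainTokens.length) output.push(mainTokens[accepted]);
  return { accepted, output };
}

const draftOutputs = [
  { token: "The", confidence: 0.95 },
  { token: " cat", confidence: 0.9 },
  { token: " sat", confidence: 0.7 },
  { token: " down", confidence: 0.4 }, // below 0.6, so drafting stops before this
];
const drafted = draftTokens(draftOutputs, { minTokens: 0, minConfidence: 0.6 });
const result = verifyDraft(drafted, ["The", " cat", " sat", " on"]);
console.log(drafted, result);
```

In this toy run the draft proposes three tokens, all of which match, so a single verification step yields four tokens instead of one. When the draft disagrees early, most of its work is discarded, which is why the draft model needs to predict the main model's output well for prediction to speed anything up.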