Add no-repeat-ngram repetition guard to the constrained decoder by FuJacob · Pull Request #504 · FuJacob/cotabby

FuJacob · 2026-06-01T15:18:26Z

Summary

The constrained decoder (added in #503) selects each token by raw-logit argmax, which has no inherent resistance to repetition: a base model can loop on a phrase ("I think that I think that …") or on a single token. The engine's repetition_penalty lives in its own sampler and never reaches this raw-logit path. This PR adds a pure no-repeat-ngram guard that blocks any token that would complete a 3-gram already present in the output.
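A minimal Swift sketch of such a guard, assuming token IDs are plain `Int`s and that an n-gram order below 2 disables the guard (the PR's actual floor behavior is not shown on this page). The call shape `blockedTokens(history:ngramSize:)` follows the sequence diagram further down; the enclosing enum name `NoRepeatNGramSketch` is hypothetical, and this is not the source of RepetitionGuard.swift.

```swift
/// Hypothetical sketch of a no-repeat-ngram guard (illustrative names only).
enum NoRepeatNGramSketch {
    /// Returns every token ID that would complete an n-gram already present in `history`.
    static func blockedTokens(history: [Int], ngramSize: Int) -> Set<Int> {
        // Assumed floor: orders below 2 disable the guard. With fewer than
        // `ngramSize` tokens there is no complete earlier n-gram to repeat.
        guard ngramSize >= 2, history.count >= ngramSize else { return [] }
        let prefixLength = ngramSize - 1
        // The pending prefix is the last n-1 tokens; the next token closes an n-gram with it.
        let pending = history[(history.count - prefixLength)...]
        var blocked = Set<Int>()
        // Scan every complete n-gram start. The window that *is* the pending prefix
        // has no follower yet, so the range stops one position before it.
        for start in 0...(history.count - ngramSize) {
            let window = history[start..<(start + prefixLength)]
            if window.elementsEqual(pending) {
                blocked.insert(history[start + prefixLength])
            }
        }
        return blocked
    }
}

// Made-up token IDs: I = 1, think = 2, that = 3, so this history reads "I think that I think".
let history = [1, 2, 3, 1, 2]
print(NoRepeatNGramSketch.blockedTokens(history: history, ngramSize: 3))  // [3]
```

The scan touches each earlier window once and compares n-1 tokens per window, which is the O(n × prefixLength) cost the review mentions.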

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' \
  build-for-testing -derivedDataPath build/DerivedData
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# 0 violations

New pure logic is unit-tested: RepetitionGuardTests (ngram-size floor, short history, repeated-prefix follower blocking, single-token runs, multiple followers, bigram order, non-pending prefixes) and two new ConstrainedSamplerTests cases for the block-list path.
Decode behavior on a live model is still validated by flipping the default-off flag on device, as in #503 ("Add deterministic constrained decoder behind a default-off flag").

Linked issues

Refs #503.

Risk / rollout notes

Scope. RepetitionGuard is new and pure. ConstrainedSampler.selectToken gains a blockedTokenIDs parameter that defaults to empty, so every existing caller and test is unchanged. Only runConstrainedDecode passes a non-empty set.
Still default-off. The constrained path runs only when cotabbyConstrainedDecoderEnabled is set; the shipping sampler path is untouched.
Tuning. The n-gram order is 3 (the conventional choice): it breaks phrase loops and single-token runs after a few repeats without blocking ordinary short repeats like "very very".
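To make the tuning note concrete, the sketch below measures how long a single-token run can get before the guard blocks it, and checks the phrase-loop case. It uses a compact, hypothetical restatement of the guard (not the PR's code), and the token IDs are made up.

```swift
/// Compact, hypothetical restatement of the guard: token IDs that would complete
/// an n-gram already present in `history`.
func blockedTokens(history: [Int], ngramSize n: Int) -> Set<Int> {
    guard n >= 2, history.count >= n else { return [] }
    let pending = history[(history.count - (n - 1))...]
    var blocked = Set<Int>()
    for start in 0...(history.count - n)
    where history[start..<(start + n - 1)].elementsEqual(pending) {
        blocked.insert(history[start + n - 1])
    }
    return blocked
}

/// Appends `token` until the guard blocks it (or `limit` is reached) and returns the run length.
func maxRunLength(of token: Int, ngramSize: Int, limit: Int = 16) -> Int {
    var history: [Int] = []
    while history.count < limit,
          !blockedTokens(history: history, ngramSize: ngramSize).contains(token) {
        history.append(token)
    }
    return history.count
}

print(maxRunLength(of: 42, ngramSize: 3))  // 3: "very very very" passes, a fourth "very" is blocked
print(maxRunLength(of: 42, ngramSize: 2))  // 2: order 2 already blocks the third "very"

// Phrase loop with made-up IDs I = 1, think = 2, that = 3: "I think that I think".
print(blockedTokens(history: [1, 2, 3, 1, 2], ngramSize: 3))  // [3]: the second "that" is blocked
```

Under this sketch, order 3 lets a token appear three times in a row and blocks the fourth, while order 2 would already block the third and would also forbid any repeated bigram (for example a second "of the" anywhere later in the output).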

Greptile Summary

This PR adds a pure no-repeat-ngram repetition guard to the constrained decoder path introduced in #503. Because that path selects tokens by raw-logit argmax and bypasses the engine's sampler (which carries its own repetition_penalty), base models can fall into phrase loops; the guard fills that gap by blocking any token that would complete an n-gram already present in the output.

RepetitionGuard is a new stateless enum that, given a token-id history and an n-gram order, returns the set of token IDs that must be skipped this step. The O(n × prefixLength) scan is correct, bounds-safe, and covered by eight focused unit tests.
ConstrainedSampler.selectToken gains a blockedTokenIDs parameter defaulting to empty, so all existing callers are unaffected; the new block-list check is correctly ordered before the admissible-set check.
runConstrainedDecode now builds a blockedTokenIDs set each step and passes it to selectToken; generatedTokenIDs is tracked in parallel with generatedBytes for this purpose. The feature remains default-off behind cotabbyConstrainedDecoderEnabled.
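The check order described for ConstrainedSampler.selectToken (excluded, then blocked, then admissible) can be sketched as a plain argmax filter. The signature is a simplified assumption: the real method also takes a profile and topK, which are omitted here, and the `excludedTokenIDs` label is illustrative.

```swift
/// Simplified, hypothetical stand-in for ConstrainedSampler.selectToken.
/// Returns the highest-logit token ID that survives every filter, or nil if none does.
func selectToken(
    logits: [Float],
    excludedTokenIDs: Set<Int> = [],
    blockedTokenIDs: Set<Int> = [],      // empty by default, so pre-existing callers behave as before
    admissibleTokenIDs: Set<Int>? = nil  // nil means "no structural constraint"
) -> Int? {
    var best: (id: Int, logit: Float)?
    for (id, logit) in logits.enumerated() {
        // Filter order: excluded, then blocked by the repetition guard, then admissible.
        if excludedTokenIDs.contains(id) { continue }
        if blockedTokenIDs.contains(id) { continue }
        if let admissible = admissibleTokenIDs, !admissible.contains(id) { continue }
        // Keep the first token with the strictly highest logit.
        if let current = best, logit <= current.logit { continue }
        best = (id, logit)
    }
    return best?.id
}

let logits: [Float] = [0.1, 2.0, 1.5, -1.0]
let unguarded = selectToken(logits: logits)                                  // 1
let guarded = selectToken(logits: logits, blockedTokenIDs: [1])              // 2, the runner-up
let exhausted = selectToken(logits: logits, blockedTokenIDs: [0, 1, 2, 3])   // nil: decoding stops
```

Returning nil rather than falling back to a blocked token keeps the guard strict; the caller decides what a nil step means.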

Confidence Score: 4/5

Safe to merge — the repetition guard is pure and default-off; the only issues are a superfluous import and a log string that conflates two distinct stop conditions.

The algorithm is correct and bounds-safe. The generatedTokenIDs tracking is placed consistently alongside the existing generatedBytes append. The new blockedTokenIDs parameter defaults to empty, leaving all existing call sites untouched. The two minor findings are the unused Foundation import in RepetitionGuard.swift and the "no_admissible_token" stop reason being reused for a new stop condition, which could make future log-based diagnosis harder but does not affect runtime behavior.

No files require special attention; LlamaRuntimeCore.swift is the only place worth a second look if the no_admissible_token log ambiguity is ever actioned.

Important Files Changed

Filename	Overview
Cotabby/Support/RepetitionGuard.swift	New pure no-repeat-ngram guard; algorithm is correct, bounds-safe, and exhaustively unit-tested. Minor: `import Foundation` is unused.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift	Wires RepetitionGuard into runConstrainedDecode; correct placement of generatedTokenIDs tracking. When the guard blocks all candidates, stopReason is logged as "no_admissible_token", which is already used for byte-prefix exhaustion — ambiguous in logs.
Cotabby/Support/ConstrainedSampler.swift	Adds `blockedTokenIDs` parameter with a default of empty set, preserving all existing call sites. Check order (excluded → blocked → admissible) is correct and consistent.
CotabbyTests/RepetitionGuardTests.swift	Covers all key cases: floor guard, short history, no-repeat prefix, trigram/bigram order, single-token runs, multiple followers, and non-pending prefixes.
CotabbyTests/ConstrainedSamplerTests.swift	Two new tests cover the blocked-token skip path and the all-blocked nil return, matching the two new code branches.
Cotabby.xcodeproj/project.pbxproj	Registers RepetitionGuard.swift in the app target and RepetitionGuardTests.swift in the test target; no structural issues.

Sequence Diagram

sequenceDiagram
    participant D as runConstrainedDecode
    participant E as LlamaEngine
    participant RG as RepetitionGuard
    participant CS as ConstrainedSampler

    loop each token step
        D->>E: getNextTokenLogits(sequenceID)
        E-->>D: logits[vocabSize]
        D->>RG: blockedTokens(history: generatedTokenIDs, ngramSize: 3)
        RG-->>D: "blockedTokenIDs (Set<Int>)"
        D->>CS: selectToken(logits, profile, admissibleTokenIDs: nil, topK, blockedTokenIDs)
        CS-->>D: tokenID? (nil stops decode)
        D->>D: append bytes + tokenID to history
        D->>E: acceptToken(sequenceID, tokenID)
        E-->>D: .ok / error
    end
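The loop in the diagram can be sketched end to end with a toy logits function in place of LlamaEngine. Everything here is an assumption for illustration: `nextLogits` stands in for getNextTokenLogits, acceptToken and the generatedBytes bookkeeping are omitted, and the guard and argmax are compact restatements rather than the PR's code. On a guard-exhausted step the sketch reports the same "no_admissible_token" string the PR currently uses.

```swift
/// Compact, hypothetical restatement of the no-repeat-ngram guard.
func blockedTokens(history: [Int], ngramSize n: Int) -> Set<Int> {
    guard n >= 2, history.count >= n else { return [] }
    let pending = history[(history.count - (n - 1))...]
    var blocked = Set<Int>()
    for start in 0...(history.count - n)
    where history[start..<(start + n - 1)].elementsEqual(pending) {
        blocked.insert(history[start + n - 1])
    }
    return blocked
}

/// Greedy argmax over the IDs the guard still allows; nil when all are blocked.
func argmax(_ logits: [Float], skipping blocked: Set<Int>) -> Int? {
    logits.indices
        .filter { !blocked.contains($0) }
        .max { logits[$0] < logits[$1] }
}

/// Toy decode loop: logits, then block-list, then select, then append to history.
func decode(
    maxTokens: Int,
    ngramSize: Int,
    nextLogits: ([Int]) -> [Float]
) -> (tokens: [Int], stopReason: String) {
    var generatedTokenIDs: [Int] = []
    while generatedTokenIDs.count < maxTokens {
        let logits = nextLogits(generatedTokenIDs)
        let blocked = blockedTokens(history: generatedTokenIDs, ngramSize: ngramSize)
        guard let tokenID = argmax(logits, skipping: blocked) else {
            return (generatedTokenIDs, "no_admissible_token")
        }
        generatedTokenIDs.append(tokenID)
    }
    return (generatedTokenIDs, "max_tokens")
}

// Toy model: strongly prefers the cycle 1, 2, 3, 1, 2, 3, ...; other IDs get tiny logits.
let cycle = [1, 2, 3]
let result = decode(maxTokens: 6, ngramSize: 3) { history in
    var logits = (0..<5).map { Float($0) * 0.1 }
    logits[cycle[history.count % cycle.count]] = 10
    return logits
}
print(result.tokens, result.stopReason)  // [1, 2, 3, 1, 2, 4] max_tokens
```

The toy model keeps trying to restart the cycle; the guard lets the first repeat of "1, 2" through and then blocks 3, so greedy selection falls back to the next-best token (4).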

Reviews (1): Last reviewed commit: "Add no-repeat-ngram repetition guard to ..."

Greptile also left 2 inline comments on this PR.

The constrained decoder selects each token by raw-logit argmax, which has no inherent resistance to repetition: a base model can loop on a phrase or a single token, and the engine's repetition penalty lives in its sampler and never reaches this raw-logit path. This adds a pure no-repeat-ngram guard (RepetitionGuard) that blocks any token which would close a 3-gram already present in the output, and threads a per-step block-list through ConstrainedSampler.selectToken (default empty, so the existing callers and tests are unaffected). The guard reasons over token ids, so it is independent of detokenization and exhaustively unit-tested. Only the constrained decode path is affected, and that path is still gated by the default-off cotabbyConstrainedDecoderEnabled flag.

greptile-apps · 2026-06-01T15:22:20Z

+import Foundation
+
+/// File overview:


import Foundation is not used in this file: all types (Array, Set, Int) come from the Swift standard library. Removing the import makes the file's dependencies clearer and avoids an unnecessary module dependency.

Suggested change

-import Foundation
-
 /// File overview:

greptile-apps · 2026-06-01T15:22:21Z

            ) else {
                stopReason = "no_admissible_token"
                break


Ambiguous stopReason when repetition guard exhausts all candidates

"no_admissible_token" is already emitted when the byte-prefix constraint returns an empty admissible set; now the same string is logged when the repetition guard blocks every surviving candidate. A post-hoc log search won't distinguish between the two cases. Consider a distinct value such as "repetition_guard_exhausted" so decode diagnostics can tell apart a structural constraint failure from a repetition block.

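One way to act on this suggestion, sketched under assumptions: decide the stop reason by checking which filter emptied the candidate set. `StopReason`, `StepOutcome`, and `step(logits:admissibleTokenIDs:blockedTokenIDs:)` are hypothetical names; "repetition_guard_exhausted" is the value proposed above, not one the merged PR emits.

```swift
/// Stop reasons: the first is the string the decoder already logs; the second is
/// the reviewer's proposed value for a repetition-guard stop.
enum StopReason: String {
    case noAdmissibleToken = "no_admissible_token"
    case repetitionGuardExhausted = "repetition_guard_exhausted"
}

enum StepOutcome: Equatable {
    case token(Int)
    case stop(StopReason)
}

/// Hypothetical decode step: picks the argmax token, and if none survives,
/// reports which filter emptied the candidate set.
func step(logits: [Float], admissibleTokenIDs: Set<Int>?, blockedTokenIDs: Set<Int>) -> StepOutcome {
    // Candidates that satisfy the structural (byte-prefix) constraint; nil means unconstrained.
    let admissible = logits.indices.filter { admissibleTokenIDs?.contains($0) ?? true }
    guard !admissible.isEmpty else { return .stop(.noAdmissibleToken) }
    // Of those, the candidates the repetition guard still allows.
    let allowed = admissible.filter { !blockedTokenIDs.contains($0) }
    guard let best = allowed.max(by: { logits[$0] < logits[$1] }) else {
        return .stop(.repetitionGuardExhausted)
    }
    return .token(best)
}

let outcome = step(logits: [0.5, 2.0, 1.0], admissibleTokenIDs: [1], blockedTokenIDs: [1])
print(outcome)  // stop(StopReason.repetitionGuardExhausted)
```

Checking the admissible set before applying the block-list is what lets the two failure modes produce distinct log values.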
greptile-apps Bot reviewed Jun 1, 2026

FuJacob merged commit f435700 into main Jun 1, 2026
4 checks passed

FuJacob deleted the feat/constrained-repetition-guard branch June 1, 2026 15:24

FuJacob merged 1 commit into main from feat/constrained-repetition-guard
