Add no-repeat-ngram repetition guard to the constrained decoder #504

Merged
FuJacob merged 1 commit into main from feat/constrained-repetition-guard on Jun 1, 2026

@FuJacob FuJacob commented Jun 1, 2026

Summary

The constrained decoder (added in #503) selects each token by raw-logit argmax, which has no inherent resistance to repetition — a base model can loop on a phrase ("I think that I think that …") or a single token. The engine's repetition_penalty lives in its own sampler and never reaches this raw-logit path. This adds a pure no-repeat-ngram guard that blocks any token which would close a 3-gram already present in the output.
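The guard described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's source: the type name `RepetitionGuardSketch` is hypothetical, the `blockedTokens(history:ngramSize:)` shape is borrowed from the sequence diagram in the review, and treating an n-gram order below 2 as "guard disabled" is an assumption about the floor behavior.

```swift
/// Hypothetical sketch of a no-repeat-ngram guard; not the PR's actual code.
enum RepetitionGuardSketch {
    /// Returns the token IDs that would complete an n-gram already present in `history`.
    static func blockedTokens(history: [Int], ngramSize: Int) -> Set<Int> {
        // Assumption: orders below 2 disable the guard. A history shorter than
        // one full n-gram cannot contain a repeat yet.
        guard ngramSize >= 2, history.count >= ngramSize else { return [] }

        let prefixLength = ngramSize - 1
        // The pending prefix is the last (n - 1) generated tokens.
        let pending = Array(history.suffix(prefixLength))

        var blocked = Set<Int>()
        // Visit every earlier window of length (n - 1) that has a follower.
        // The final window is the pending prefix itself and is excluded.
        for start in 0..<(history.count - prefixLength) {
            let window = Array(history[start..<(start + prefixLength)])
            if window == pending {
                // Emitting this follower again would repeat an existing n-gram.
                blocked.insert(history[start + prefixLength])
            }
        }
        return blocked
    }
}
```

With order 3, a history of `[1, 2, 3, 1, 2]` blocks token `3`, because `1 2 3` already occurred. A two-token history such as `[7, 7]` blocks nothing, which is why a short repeat like "very very" can still be followed by a third "very" once.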

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' \
  build-for-testing -derivedDataPath build/DerivedData
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# 0 violations
  • New pure logic is unit-tested: RepetitionGuardTests (ngram-size floor, short history, repeated-prefix follower blocking, single-token runs, multiple followers, bigram order, non-pending prefixes) and two new ConstrainedSamplerTests cases for the block-list path.
  • Decode behavior on a live model is still validated by flipping the default-off flag on device (same as #503, "Add deterministic constrained decoder behind a default-off flag").

Linked issues

Refs #503.

Risk / rollout notes

  • Scope. RepetitionGuard is new and pure. ConstrainedSampler.selectToken gains a blockedTokenIDs parameter that defaults to empty, so every existing caller and test is unchanged. Only runConstrainedDecode passes a non-empty set.
  • Still default-off. The constrained path runs only when cotabbyConstrainedDecoderEnabled is set; the shipping sampler path is untouched.
  • Tuning. The n-gram order is 3 (the conventional choice): it breaks phrase loops and single-token runs after a few repeats without blocking ordinary short repeats like "very very".

Greptile Summary

This PR adds a pure no-repeat-ngram repetition guard to the constrained decoder path introduced in #503. Because that path selects tokens by raw-logit argmax and bypasses the engine's sampler (which carries its own repetition_penalty), base models can fall into phrase loops; the guard closes that gap by blocking any token that would close an n-gram already present in the output.

  • RepetitionGuard is a new stateless enum that, given a token-id history and an n-gram order, returns the set of token IDs that must be skipped this step. The O(n × prefixLength) scan is correct, bounds-safe, and covered by eight focused unit tests.
  • ConstrainedSampler.selectToken gains a blockedTokenIDs parameter defaulting to empty, so all existing callers are unaffected; the new block-list check is correctly ordered before the admissible-set check.
  • runConstrainedDecode now builds a blockedTokenIDs set each step and passes it to selectToken; generatedTokenIDs is tracked in parallel with generatedBytes for this purpose. The feature remains default-off behind cotabbyConstrainedDecoderEnabled.
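To illustrate the defaulted block-list parameter and the check order noted in the review (excluded, then blocked, then admissible), here is a minimal sketch. It is a simplified stand-in, not the real `ConstrainedSampler.selectToken`: the function name `selectTokenSketch` and the `excludedTokenIDs` parameter are hypothetical, and the `profile` and `topK` arguments shown in the sequence diagram are omitted because their types are not given in this PR.

```swift
/// Hypothetical, simplified stand-in for a raw-logit argmax selector.
/// `blockedTokenIDs` defaults to empty, so callers that omit it behave as before.
func selectTokenSketch(
    logits: [Float],
    excludedTokenIDs: Set<Int> = [],
    admissibleTokenIDs: Set<Int>? = nil,
    blockedTokenIDs: Set<Int> = []
) -> Int? {
    var best: (id: Int, logit: Float)?
    for (id, logit) in logits.enumerated() {
        // Check order mirrors the review note: excluded, blocked, admissible.
        if excludedTokenIDs.contains(id) { continue }
        if blockedTokenIDs.contains(id) { continue }
        if let admissible = admissibleTokenIDs, !admissible.contains(id) { continue }
        // Keep the highest logit seen so far; ties keep the lower token ID.
        if let current = best, current.logit >= logit { continue }
        best = (id: id, logit: logit)
    }
    // nil means every candidate was filtered out, which stops the decode.
    return best?.id
}
```

Calling `selectTokenSketch(logits: [0.1, 0.9, 0.5])` picks token `1`. Passing `blockedTokenIDs: [1]` falls back to token `2`, and blocking all three IDs returns `nil`, matching the two new branches the added sampler tests are described as covering.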

Confidence Score: 4/5

Safe to merge — the repetition guard is pure and default-off; the only issues are a superfluous import and a log string that conflates two distinct stop conditions.

The algorithm is correct and bounds-safe. The generatedTokenIDs tracking is placed consistently alongside the existing generatedBytes append. The new blockedTokenIDs parameter defaults to empty, leaving all existing call sites untouched. The two minor findings are the unused Foundation import in RepetitionGuard.swift and the "no_admissible_token" stop reason being reused for a new stop condition, which could make future log-based diagnosis harder but does not affect runtime behavior.

No files require special attention; LlamaRuntimeCore.swift is the only place worth a second look if the no_admissible_token log ambiguity is ever actioned.

Important Files Changed

Filename Overview
Cotabby/Support/RepetitionGuard.swift New pure no-repeat-ngram guard; algorithm is correct, bounds-safe, and exhaustively unit-tested. Minor: import Foundation is unused.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift Wires RepetitionGuard into runConstrainedDecode; correct placement of generatedTokenIDs tracking. When the guard blocks all candidates, stopReason is logged as "no_admissible_token", which is already used for byte-prefix exhaustion — ambiguous in logs.
Cotabby/Support/ConstrainedSampler.swift Adds blockedTokenIDs parameter with a default of empty set, preserving all existing call sites. Check order (excluded → blocked → admissible) is correct and consistent.
CotabbyTests/RepetitionGuardTests.swift Covers all key cases: floor guard, short history, no-repeat prefix, trigram/bigram order, single-token runs, multiple followers, and non-pending prefixes.
CotabbyTests/ConstrainedSamplerTests.swift Two new tests cover the blocked-token skip path and the all-blocked nil return, matching the two new code branches.
Cotabby.xcodeproj/project.pbxproj Registers RepetitionGuard.swift in the app target and RepetitionGuardTests.swift in the test target; no structural issues.

Sequence Diagram

sequenceDiagram
    participant D as runConstrainedDecode
    participant E as LlamaEngine
    participant RG as RepetitionGuard
    participant CS as ConstrainedSampler

    loop each token step
        D->>E: getNextTokenLogits(sequenceID)
        E-->>D: logits[vocabSize]
        D->>RG: blockedTokens(history: generatedTokenIDs, ngramSize: 3)
        RG-->>D: "blockedTokenIDs (Set<Int>)"
        D->>CS: selectToken(logits, profile, admissibleTokenIDs: nil, topK, blockedTokenIDs)
        CS-->>D: tokenID? (nil stops decode)
        D->>D: append bytes + tokenID to history
        D->>E: acceptToken(sequenceID, tokenID)
        E-->>D: .ok / error
    end

Reviews (1): Last reviewed commit: "Add no-repeat-ngram repetition guard to ..."

Greptile also left 2 inline comments on this PR.

The constrained decoder selects each token by raw-logit argmax, which has
no inherent resistance to repetition: a base model can loop on a phrase or
a single token, and the engine's repetition penalty lives in its sampler
and never reaches this raw-logit path. This adds a pure no-repeat-ngram
guard (RepetitionGuard) that blocks any token which would close a 3-gram
already present in the output, and threads a per-step block-list through
ConstrainedSampler.selectToken (default empty, so the existing callers and
tests are unaffected). The guard reasons over token ids, so it is
independent of detokenization and exhaustively unit-tested.

Only the constrained decode path is affected, and that path is still gated
by the default-off cotabbyConstrainedDecoderEnabled flag.
Comment on lines +1 to +3
import Foundation

/// File overview:

P2 import Foundation is not used in this file — all types (Array, Set, Int) are from the Swift standard library. Removing the import makes the file's dependencies clearer and avoids a small unnecessary overhead.

Suggested change
- import Foundation
- /// File overview:
+ /// File overview:

Comment on lines 317 to 319
) else {
stopReason = "no_admissible_token"
break

P2 Ambiguous stopReason when repetition guard exhausts all candidates

"no_admissible_token" is already emitted when the byte-prefix constraint returns an empty admissible set; now the same string is logged when the repetition guard blocks every surviving candidate. A post-hoc log search won't distinguish between the two cases. Consider a distinct value such as "repetition_guard_exhausted" so decode diagnostics can tell apart a structural constraint failure from a repetition block.
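One possible way to separate the two stop conditions is sketched below. It is an assumption about how the suggestion could be applied, not code from this PR: `StopReasonSketch` and `classifyStopSketch` are hypothetical names, and the `select` closure stands in for a call to the sampler with a given block list. The idea is to retry selection without the block list when it returns nil; if a token then becomes available, the repetition guard was the cause.

```swift
/// Hypothetical stop reasons; the second raw value is the one proposed in the review.
enum StopReasonSketch: String {
    case noAdmissibleToken = "no_admissible_token"
    case repetitionGuardExhausted = "repetition_guard_exhausted"
}

/// Returns nil when a token can be selected, otherwise the reason decoding stops.
/// `select` is a stand-in for the sampler call, parameterized by the block list.
func classifyStopSketch(
    blockedTokenIDs: Set<Int>,
    select: (Set<Int>) -> Int?
) -> StopReasonSketch? {
    // A token survived every filter: no stop.
    if select(blockedTokenIDs) != nil { return nil }
    // Nothing survived. If dropping the block list yields a token,
    // the repetition guard alone removed the remaining candidates.
    if !blockedTokenIDs.isEmpty, select([]) != nil {
        return .repetitionGuardExhausted
    }
    // Otherwise the structural constraint left no candidate.
    return .noAdmissibleToken
}
```

The extra selection pass runs only on the stop path, so it would not add per-token cost during normal decoding.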

@FuJacob FuJacob merged commit f435700 into main Jun 1, 2026
4 checks passed
@FuJacob FuJacob deleted the feat/constrained-repetition-guard branch June 1, 2026 15:24