Surface per-line OCR confidence so the low-confidence filter runs #506

Merged
FuJacob merged 1 commit into main from feat/ocr-line-confidence
Jun 1, 2026

@FuJacob (Owner) commented Jun 1, 2026

Summary

ScreenTextExtractor discarded Vision's per-observation confidence, so OCRTextHygiene.dropLowConfidence — the cheapest, highest-signal hygiene filter — only ever saw a synthesized 1.0 and dropped nothing. This carries each recognized line's real confidence through a new ExtractedScreenText.lines field and feeds those lines to OCRTextHygiene.clean, so the recognizer's weakest guesses are dropped before they reach the completion prompt.
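To make the failure mode concrete, here is a minimal sketch of why a synthesized 1.0 neutralizes the filter. The type and function below are simplified stand-ins, not the code from this PR: the real `OCRLine` and `dropLowConfidence` live in `OCRTextHygiene.swift`, and the `>=` comparison and the helper name `resplitWithSyntheticConfidence` are assumptions for illustration.

```swift
// Simplified stand-in for OCRTextHygiene.OCRLine (field names assumed).
struct OCRLine: Equatable, Sendable {
    let text: String
    let confidence: Float
}

// Assumed shape of the filter: keep lines at or above the threshold.
func dropLowConfidence(_ lines: [OCRLine], threshold: Float = 0.4) -> [OCRLine] {
    lines.filter { $0.confidence >= threshold }
}

// Old pattern (hypothetical helper): re-split the joined text and
// attach a constant confidence of 1.0 to every line.
func resplitWithSyntheticConfidence(_ joined: String) -> [OCRLine] {
    joined.split(separator: "\n").map { OCRLine(text: String($0), confidence: 1.0) }
}

let recognized = [
    OCRLine(text: "Quarterly report draft", confidence: 0.95),
    OCRLine(text: "Qvarterly rep0rt dratt", confidence: 0.2),
]
let joined = recognized.map(\.text).joined(separator: "\n")

// With synthetic confidence every line passes, so the filter is a no-op.
let before = dropLowConfidence(resplitWithSyntheticConfidence(joined))
// With real confidence the 0.2 line is dropped.
let after = dropLowConfidence(recognized)

print(before.count)        // 2
print(after.map(\.text))   // ["Quarterly report draft"]
```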

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' \
  build-for-testing -derivedDataPath build/DerivedData
# ** TEST BUILD SUCCEEDED **

swiftlint lint --quiet
# 0 violations
  • New test test_generateContext_dropsLowConfidenceOCRLines proves the wiring end-to-end: a clean, plausible sentence at 0.2 confidence is dropped while a 0.95-confidence line survives, even though no other hygiene filter would catch the former.
  • The existing generator tests are updated to populate lines (at a confidence above the threshold), so they keep exercising the symbol/digit/sanitizer filters unchanged.
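The scenario the new test covers can be sketched roughly as follows. This is not the test from the PR: `looksLikeSymbolNoise` is a hypothetical stand-in for the other hygiene filters, `clean` is a reduced two-stage pipeline, and the sample sentences are invented. The point is only that a clean sentence passes a text-shape filter, so confidence is the one signal that can remove it.

```swift
// Simplified stand-in for OCRTextHygiene.OCRLine (field names assumed).
struct OCRLine: Equatable, Sendable {
    let text: String
    let confidence: Float
}

// Hypothetical text-shape filter standing in for the symbol/digit/sanitizer filters.
func looksLikeSymbolNoise(_ text: String, maxSymbolRatio: Double = 0.3) -> Bool {
    guard !text.isEmpty else { return true }
    let symbols = text.filter { !($0.isLetter || $0.isNumber || $0.isWhitespace) }
    return Double(symbols.count) / Double(text.count) > maxSymbolRatio
}

// Reduced pipeline: confidence gate first, then the text-shape filter.
func clean(_ lines: [OCRLine], confidenceThreshold: Float = 0.4) -> String {
    lines
        .filter { $0.confidence >= confidenceThreshold }
        .filter { !looksLikeSymbolNoise($0.text) }
        .map(\.text)
        .joined(separator: "\n")
}

let lowConfidence = OCRLine(text: "Please review the attached invoice", confidence: 0.2)
let highConfidence = OCRLine(text: "Meeting moved to Thursday", confidence: 0.95)

// The low-confidence sentence is not symbol noise, so only the confidence gate removes it.
let cleaned = clean([lowConfidence, highConfidence])
print(cleaned) // Meeting moved to Thursday
```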

Linked issues

None.

Risk / rollout notes

  • Behavior change (visual context only). Real OCR confidence now gates lines via the existing dropLowConfidence threshold (0.4). Lines Vision is unsure about no longer enter the prompt; this only affects the screen-context excerpt, not completion generation.
  • ExtractedScreenText.lines defaults to empty for any caller that supplies only joined text, so the type stays backward compatible.
  • OCRTextHygiene.OCRLine gains Sendable (pure value type) to cross the OCR continuation boundary inside the Sendable ExtractedScreenText.
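The compatibility and `Sendable` notes above could look something like the sketch below. The field names `text`, `lineCount`, and `lines` come from the sequence diagram in this PR; the initializer shape and the `requireSendable` helper are assumptions added for illustration.

```swift
// Simplified stand-in for OCRTextHygiene.OCRLine (field names assumed).
struct OCRLine: Equatable, Sendable {
    let text: String
    let confidence: Float
}

// Assumed shape: `lines` defaults to empty so callers that only supply
// joined text keep compiling.
struct ExtractedScreenText: Sendable {
    let text: String
    let lineCount: Int
    let lines: [OCRLine]

    init(text: String, lineCount: Int, lines: [OCRLine] = []) {
        self.text = text
        self.lineCount = lineCount
        self.lines = lines
    }
}

// Hypothetical compile-time check: this call only type-checks if T is Sendable,
// which in turn requires OCRLine to be Sendable.
func requireSendable<T: Sendable>(_ value: T) {}

let legacy = ExtractedScreenText(text: "Hello\nWorld", lineCount: 2)
let withLines = ExtractedScreenText(
    text: "Hello",
    lineCount: 1,
    lines: [OCRLine(text: "Hello", confidence: 0.95)]
)

requireSendable(withLines)
print(legacy.lines.isEmpty) // true
```

If `OCRLine` were not `Sendable`, the `Sendable` conformance on `ExtractedScreenText` would draw a compiler diagnostic under strict concurrency checking, which is why the conformance is added in the same change.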

Greptile Summary

This PR threads Vision's per-observation confidence through ExtractedScreenText.lines so OCRTextHygiene.dropLowConfidence — previously neutralized by a synthesized 1.0 — now filters on real recognizer scores before lines reach the completion prompt.

  • ExtractedScreenText gains a lines: [OCRTextHygiene.OCRLine] field (default-empty, backward compatible), and ScreenTextExtractor populates it with each top candidate's actual confidence value.
  • ScreenshotContextGenerator replaces the old re-split-with-fake-confidence pattern with a direct pass of extracted.lines to OCRTextHygiene.clean.
  • OCRLine picks up Sendable conformance, and a new end-to-end test confirms the wiring: a 0.2-confidence line is dropped while a 0.95-confidence line survives.

Confidence Score: 4/5

Safe to merge — the core wiring is correct and the new test provides solid end-to-end coverage of the confidence gate.

The only finding is a one-sentence doc-comment in OCRTextHygiene.swift that still says the extractor currently discards confidence, directly contradicting what this PR accomplishes. It does not affect runtime behavior.

Cotabby/Support/OCRTextHygiene.swift — the OCRLine doc-comment contains a now-false claim about confidence being discarded.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Services/Visual/ScreenTextExtractor.swift | Adds lines: [OCRTextHygiene.OCRLine] to ExtractedScreenText (default-empty for backward compatibility) and rewires the Vision callback to capture candidate.confidence per observation instead of discarding it. |
| Cotabby/Services/Visual/ScreenshotContextGenerator.swift | Switches from extracting only .text (then re-splitting with synthetic confidence 1.0) to storing the full ExtractedScreenText and passing extracted.lines directly to OCRTextHygiene.clean, enabling real confidence gating. |
| Cotabby/Support/OCRTextHygiene.swift | Adds Sendable conformance to OCRLine (correct). The doc-comment on OCRLine.confidence contains a stale sentence claiming the extractor currently discards confidence, which this PR directly contradicts. |
| CotabbyTests/ScreenshotContextGeneratorTests.swift | Existing helpers updated to supply OCRLine objects at confidence 0.9 to preserve prior test coverage. New test proves end-to-end wiring: 0.2-confidence line is dropped, 0.95-confidence line survives. |

Sequence Diagram

sequenceDiagram
    participant SCG as ScreenshotContextGenerator
    participant STE as ScreenTextExtractor
    participant Vision as Vision OCR
    participant Hyg as OCRTextHygiene.clean

    SCG->>STE: extractText(from: image)
    STE->>Vision: VNRecognizeTextRequest
    Vision-->>STE: [VNRecognizedTextObservation]
    Note over STE: Build OCRLine(text, confidence)<br/>for each top candidate
    STE-->>SCG: ExtractedScreenText(text, lineCount, lines)
    Note over SCG: lines carry real Vision confidence<br/>(was: re-split text with confidence=1.0)
    SCG->>Hyg: clean(lines: extracted.lines, maxChars:)
    Note over Hyg: 1. dropLowConfidence (threshold 0.4) now effective<br/>2. dropReplacementCharacter<br/>3. dropHighSymbolDensity<br/>4. dropDigitSubstitution<br/>5. dropLowWordCharacterRatio<br/>6. strip fieldText echoes
    Hyg-->>SCG: cleaned OCR string
    SCG-->>SCG: return VisualContextExcerpt
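
The "Build OCRLine(text, confidence)" step in the diagram might be sketched like this. It is an illustration under assumptions, not the extractor's actual code: `lines(from:)` and `makeRequest(onLines:)` are hypothetical names, and the Vision-specific part is guarded so the snippet still compiles on platforms without the framework. The Vision calls used (`VNRecognizeTextRequest`, `topCandidates(_:)`, `string`, `confidence`) are the framework's own.

```swift
// Simplified stand-in for OCRTextHygiene.OCRLine (field names assumed).
struct OCRLine: Equatable, Sendable {
    let text: String
    let confidence: Float
}

#if canImport(Vision)
import Vision

// Map each observation's top candidate to a line that keeps its confidence.
func lines(from observations: [VNRecognizedTextObservation]) -> [OCRLine] {
    observations.compactMap { observation -> OCRLine? in
        guard let candidate = observation.topCandidates(1).first else { return nil }
        return OCRLine(text: candidate.string, confidence: candidate.confidence)
    }
}

// Hypothetical request factory: hands the per-line results to the caller
// instead of flattening them to a joined string.
func makeRequest(onLines: @escaping ([OCRLine]) -> Void) -> VNRecognizeTextRequest {
    VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        onLines(lines(from: observations))
    }
}
#endif
```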

Reviews (1): Last reviewed commit: "Surface per-line OCR confidence so the l..."

Greptile also left 1 inline comment on this PR.

ScreenTextExtractor discarded Vision's per-observation confidence, so the
OCRTextHygiene.dropLowConfidence filter only ever saw a synthesized 1.0
and never dropped anything. Carry each recognized line's confidence
through a new ExtractedScreenText.lines field and feed those lines to
OCRTextHygiene.clean, so the recognizer's weakest guesses are dropped
before they reach the completion prompt. The joined text field is kept
for logging and the window-title fallback.

OCRLine gains Sendable so it can ride inside the Sendable
ExtractedScreenText across the OCR continuation boundary.
Comment on lines 21 to +24
/// Confidence is carried alongside the text because the cheapest, highest-signal filter
/// (`dropLowConfidence`) needs it. The orchestrating extractor currently discards Vision's
/// per-candidate confidence; surfacing it into this value type is what lets filter #1 run.
-struct OCRLine: Equatable {
+struct OCRLine: Equatable, Sendable {

P2 The doc-comment on OCRLine.confidence says "The orchestrating extractor currently discards Vision's per-candidate confidence" — but that statement is precisely what this PR fixes. Leaving it in place makes the comment actively misleading for anyone reading the type definition after the change lands.

Suggested change
-/// Confidence is carried alongside the text because the cheapest, highest-signal filter
-/// (`dropLowConfidence`) needs it. The orchestrating extractor currently discards Vision's
-/// per-candidate confidence; surfacing it into this value type is what lets filter #1 run.
-struct OCRLine: Equatable {
-struct OCRLine: Equatable, Sendable {
+/// Confidence is carried alongside the text because the cheapest, highest-signal filter
+/// (`dropLowConfidence`) needs it. `ScreenTextExtractor` surfaces Vision's per-candidate
+/// confidence here so filter #1 operates on real recognizer scores instead of a constant.
+struct OCRLine: Equatable, Sendable {


@FuJacob FuJacob merged commit 6374271 into main Jun 1, 2026
4 checks passed