Surface per-line OCR confidence so the low-confidence filter runs#506
Merged
Conversation
ScreenTextExtractor discarded Vision's per-observation confidence, so the OCRTextHygiene.dropLowConfidence filter only ever saw a synthesized 1.0 and never dropped anything. Carry each recognized line's confidence through a new ExtractedScreenText.lines field and feed those lines to OCRTextHygiene.clean, so the recognizer's weakest guesses are dropped before they reach the completion prompt. The joined text field is kept for logging and the window-title fallback. OCRLine gains Sendable so it can ride inside the Sendable ExtractedScreenText across the OCR continuation boundary.
Comment on lines
21
to
+24
| /// Confidence is carried alongside the text because the cheapest, highest-signal filter | ||
| /// (`dropLowConfidence`) needs it. The orchestrating extractor currently discards Vision's | ||
| /// per-candidate confidence; surfacing it into this value type is what lets filter #1 run. | ||
| struct OCRLine: Equatable { | ||
| struct OCRLine: Equatable, Sendable { |
Contributor
There was a problem hiding this comment.
The doc-comment on
OCRLine.confidence says "The orchestrating extractor currently discards Vision's per-candidate confidence" — but that statement is precisely what this PR fixes. Leaving it in place makes the comment actively misleading for anyone reading the type definition after the change lands.
Suggested change
| /// Confidence is carried alongside the text because the cheapest, highest-signal filter | |
| /// (`dropLowConfidence`) needs it. The orchestrating extractor currently discards Vision's | |
| /// per-candidate confidence; surfacing it into this value type is what lets filter #1 run. | |
| struct OCRLine: Equatable { | |
| struct OCRLine: Equatable, Sendable { | |
| /// Confidence is carried alongside the text because the cheapest, highest-signal filter | |
| /// (`dropLowConfidence`) needs it. `ScreenTextExtractor` surfaces Vision's per-candidate | |
| /// confidence here so filter #1 operates on real recognizer scores instead of a constant. | |
| struct OCRLine: Equatable, Sendable { |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ScreenTextExtractordiscarded Vision's per-observation confidence, soOCRTextHygiene.dropLowConfidence— the cheapest, highest-signal hygiene filter — only ever saw a synthesized1.0and dropped nothing. This carries each recognized line's real confidence through a newExtractedScreenText.linesfield and feeds those lines toOCRTextHygiene.clean, so the recognizer's weakest guesses are dropped before they reach the completion prompt.Validation
test_generateContext_dropsLowConfidenceOCRLinesproves the wiring end-to-end: a clean, plausible sentence at 0.2 confidence is dropped while a 0.95-confidence line survives, even though no other hygiene filter would catch the former.lines(at a confidence above the threshold), so they keep exercising the symbol/digit/sanitizer filters unchanged.Linked issues
None.
Risk / rollout notes
dropLowConfidencethreshold (0.4). Lines Vision is unsure about no longer enter the prompt; this only affects the screen-context excerpt, not completion generation.ExtractedScreenText.linesdefaults to empty for any caller that supplies only joined text, so the type stays backward compatible.OCRTextHygiene.OCRLinegainsSendable(pure value type) to cross the OCR continuation boundary inside theSendableExtractedScreenText.Greptile Summary
This PR threads Vision's per-observation confidence through
ExtractedScreenText.linessoOCRTextHygiene.dropLowConfidence— previously neutralized by a synthesized1.0— now filters on real recognizer scores before lines reach the completion prompt.ExtractedScreenTextgains alines: [OCRTextHygiene.OCRLine]field (default-empty, backward compatible), andScreenTextExtractorpopulates it with each top candidate's actualconfidencevalue.ScreenshotContextGeneratorreplaces the old re-split-with-fake-confidence pattern with a direct pass ofextracted.linestoOCRTextHygiene.clean.OCRLinepicks upSendableconformance, and a new end-to-end test confirms the wiring: a 0.2-confidence line is dropped while a 0.95-confidence line survives.Confidence Score: 4/5
Safe to merge — the core wiring is correct and the new test provides solid end-to-end coverage of the confidence gate.
The only finding is a one-sentence doc-comment in OCRTextHygiene.swift that still says the extractor currently discards confidence, directly contradicting what this PR accomplishes. It does not affect runtime behavior.
Cotabby/Support/OCRTextHygiene.swift — the OCRLine doc-comment contains a now-false claim about confidence being discarded.
Important Files Changed
lines: [OCRTextHygiene.OCRLine]toExtractedScreenText(default-empty for backward compatibility) and rewires the Vision callback to capturecandidate.confidenceper observation instead of discarding it..text(then re-splitting with synthetic confidence 1.0) to storing the fullExtractedScreenTextand passingextracted.linesdirectly toOCRTextHygiene.clean, enabling real confidence gating.Sendableconformance toOCRLine(correct). The doc-comment onOCRLine.confidencecontains a stale sentence claiming the extractor currently discards confidence, which this PR directly contradicts.OCRLineobjects at confidence 0.9 to preserve prior test coverage. New test proves end-to-end wiring: 0.2-confidence line is dropped, 0.95-confidence line survives.Sequence Diagram
sequenceDiagram participant SCG as ScreenshotContextGenerator participant STE as ScreenTextExtractor participant Vision as Vision OCR participant Hyg as OCRTextHygiene.clean SCG->>STE: extractText(from: image) STE->>Vision: VNRecognizeTextRequest Vision-->>STE: [VNRecognizedTextObservation] Note over STE: Build OCRLine(text, confidence)<br/>for each top candidate STE-->>SCG: ExtractedScreenText(text, lineCount, lines) Note over SCG: lines carry real Vision confidence<br/>(was: re-split text with confidence=1.0) SCG->>Hyg: clean(lines: extracted.lines, maxChars:) Note over Hyg: 1. dropLowConfidence (threshold 0.4) now effective<br/>2. dropReplacementCharacter<br/>3. dropHighSymbolDensity<br/>4. dropDigitSubstitution<br/>5. dropLowWordCharacterRatio<br/>6. strip fieldText echoes Hyg-->>SCG: cleaned OCR string SCG-->>SCG: return VisualContextExcerptReviews (1): Last reviewed commit: "Surface per-line OCR confidence so the l..." | Re-trigger Greptile