Claude/identify nanobot changes qpm8 z #28

Closed
chancsc wants to merge 8 commits into mnemon-dev:master from chancsc:claude/identify-nanobot-changes-QPM8Z

chancsc (Contributor) commented May 18, 2026

sync code

Nanobot and others added 8 commits April 30, 2026 04:41
Merged changes from head repo
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

nomic-embed-text produces cosine ~0.75 for same-domain different-fact pairs
(e.g. two butterfly survey records at different locations). The old threshold
of 0.70 let cosine override token similarity, incorrectly classifying distinct
insights as UPDATE and replacing the original. Raising to 0.85 ensures cosine
only confirms deduplication when texts are genuinely near-identical.

Adds regression test with controlled 0.75-cosine fake embeddings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

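The threshold change in the commit message above can be pictured with a small Go sketch. This is an illustration only, not the repository's code: the names `Cosine`, `CosineConfirmsDuplicate`, and `cosineDedupThreshold` are hypothetical, and the two-dimensional vectors are controlled fakes built to give a cosine of about 0.75, in the spirit of the regression test the commit mentions.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDedupThreshold mirrors the 0.85 value named in the commit message.
// The constant name is hypothetical.
const cosineDedupThreshold = 0.85

// Cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 for empty, mismatched, or all-zero inputs.
func Cosine(a, b []float64) float64 {
	if len(a) == 0 || len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// CosineConfirmsDuplicate reports whether embedding similarity alone is
// high enough to confirm deduplication (assumed gate, per the commit text).
func CosineConfirmsDuplicate(a, b []float64) bool {
	return Cosine(a, b) >= cosineDedupThreshold
}

func main() {
	// Fake unit vectors at an angle whose cosine is about 0.75.
	a := []float64{1, 0}
	b := []float64{0.75, math.Sqrt(1 - 0.75*0.75)}
	fmt.Printf("cosine=%.2f confirms=%v\n", Cosine(a, b), CosineConfirmsDuplicate(a, b))
}
```

With the old 0.70 threshold the same pair would have passed the gate, which is the misclassification the commit describes.
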
ContentSimilarity (bidirectional max) was too sensitive for formulaic
scientific records: a Raub butterfly entry sharing the species name and
standard phrasing with a Kinabalu entry produced tokenSim=0.5, crossing
the UPDATE threshold and replacing the original.

Jaccard (|A∩B|/|A∪B|) penalises texts that share domain vocabulary but
have many distinct tokens (different facts). Same-domain different-location
pairs now score ~0.28, falling below the 0.5 ADD threshold. Genuine
one-word-change updates (SQLite→PostgreSQL) still score ~0.6 → UPDATE.

ContentSimilarity is unchanged — bidirectional max remains correct for
recall and keyword search.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

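To make the difference between the two measures concrete, here is a minimal Go sketch. It assumes a simple lowercase, whitespace-split tokenizer, and it reads "bidirectional max" as the larger of the two overlap ratios |A∩B|/|A| and |A∩B|/|B|. Both are assumptions; the repository's actual tokenizer and `ContentSimilarity` may differ. The sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet lowercases text and splits it on whitespace (assumed tokenizer).
func tokenSet(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		set[tok] = struct{}{}
	}
	return set
}

// intersectionSize counts tokens present in both sets.
func intersectionSize(a, b map[string]struct{}) int {
	n := 0
	for tok := range a {
		if _, ok := b[tok]; ok {
			n++
		}
	}
	return n
}

// Jaccard returns |A∩B| / |A∪B| over the token sets of a and b.
func Jaccard(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	inter := intersectionSize(sa, sb)
	union := len(sa) + len(sb) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// BidirectionalMax is an assumed reading of "bidirectional max":
// the larger of the overlap measured against each side.
func BidirectionalMax(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	if len(sa) == 0 || len(sb) == 0 {
		return 0
	}
	inter := float64(intersectionSize(sa, sb))
	ra, rb := inter/float64(len(sa)), inter/float64(len(sb))
	if ra > rb {
		return ra
	}
	return rb
}

func main() {
	// Made-up records: same species and phrasing, different locations.
	raub := "rajah brooke's birdwing recorded at raub forest edge in march"
	kinabalu := "rajah brooke's birdwing recorded at kinabalu summit trail during august"
	fmt.Printf("different facts: jaccard=%.2f bidirectional=%.2f\n",
		Jaccard(raub, kinabalu), BidirectionalMax(raub, kinabalu))

	// Made-up one-word change.
	before, after := "we store data in sqlite", "we store data in postgresql"
	fmt.Printf("one-word change: jaccard=%.2f bidirectional=%.2f\n",
		Jaccard(before, after), BidirectionalMax(before, after))
}
```

On these strings the different-location pair shares 5 of 15 distinct tokens, so Jaccard gives about 0.33 while the assumed bidirectional max gives 0.5. The one-word change shares 4 of 6 and stays above 0.5 under Jaccard.
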
…ity>=0.7

Two bugs caused CONFLICT false positives on butterfly survey data:

1. "not" in negationWords fires on virtually all scientific text
   ("species not previously recorded", "not endemic to region").
   Removed: only multi-word state-change phrases remain as signals.

2. Negation check fired at similarity>=0.5. At borderline similarity,
   texts share domain vocabulary without being about the same subject.
   Now only checked when similarity>=0.7.

Also updates guide.md: PDF/external-document facts must use --no-diff
since each document is a distinct authoritative source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

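The two fixes in this commit message can be sketched together in Go. Everything named here is hypothetical: the phrase list, `negationMinSimilarity`, and `LooksLikeConflict` are stand-ins, and the rule "flag a conflict when the incoming text contains a state-change phrase" is a simplification assumed for illustration. The repository's actual `negationWords` list and conflict logic are not shown in this PR page.

```go
package main

import (
	"fmt"
	"strings"
)

// negationPhrases is a hypothetical list of multi-word state-change phrases.
// A bare "not" is deliberately absent, as the commit message describes.
var negationPhrases = []string{"no longer", "instead of", "switched from"}

// negationMinSimilarity mirrors the 0.7 gate named in the commit message.
const negationMinSimilarity = 0.7

// hasNegationPhrase reports whether text contains any listed phrase.
func hasNegationPhrase(text string) bool {
	lower := strings.ToLower(text)
	for _, phrase := range negationPhrases {
		if strings.Contains(lower, phrase) {
			return true
		}
	}
	return false
}

// LooksLikeConflict runs the negation check only when the two texts are
// similar enough to plausibly be about the same subject.
func LooksLikeConflict(incoming string, similarity float64) bool {
	if similarity < negationMinSimilarity {
		return false
	}
	return hasNegationPhrase(incoming)
}

func main() {
	// "not" alone is no longer a signal, even at high similarity.
	fmt.Println(LooksLikeConflict("species not previously recorded in region", 0.9))
	// A state-change phrase below the gate is ignored.
	fmt.Println(LooksLikeConflict("we no longer use sqlite", 0.6))
	// The same phrase above the gate is flagged.
	fmt.Println(LooksLikeConflict("we no longer use sqlite", 0.8))
}
```
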
Upstream changes included:
- Nanobot integration officially merged (PR mnemon-dev#24) — our contribution
- Codex integration added (PR mnemon-dev#27)
- v0.1.5 dedup fixes merged (PR mnemon-dev#25) — our contribution
- v0.1.6 release notes

Conflict resolution:
- cmd/setup.go, assets/assets.go: took upstream (adds Codex alongside Nanobot)
- SKILL.md: took upstream (our reviewed version with softened guardrail)
- README.md: kept upstream harness wording and Vision paragraph
@chancsc chancsc closed this May 18, 2026
chancsc (Contributor, Author) commented May 18, 2026

Mistake

@chancsc chancsc deleted the claude/identify-nanobot-changes-QPM8Z branch May 18, 2026 11:34
@chancsc chancsc restored the claude/identify-nanobot-changes-QPM8Z branch May 18, 2026 12:39
