fix(memos-local-plugin): two-phase migration to prevent crash-loop on large databases by chiefmojo · Pull Request #1789 · MemTensor/MemOS

chiefmojo · 2026-05-23T18:15:15Z

Problem

Migration 007 (namespace-visibility) in v2.0.2–v2.0.5 runs UPDATE … SET share_scope='private' plus CREATE INDEX on the traces table inside a single db.tx(). On databases larger than ~500MB, this exceeds the host gateway's kill timeout. SQLite rolls back the entire transaction — including the schema_migrations INSERT — so migration 007 is never recorded and the bridge restarts into the same hang forever.

Small databases (~43MB) complete within the timeout, which is why this only manifests on larger installs. Tested against a 687MB database with ~98,000 traces.
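
The failure mode can be reproduced in miniature. The following is a sketch in Python's built-in sqlite3 module, not the plugin's code: the table layout, the `version` column, the index name, and all function names are assumptions made for illustration. The explicit ROLLBACK stands in for the recovery SQLite performs when a process is killed mid-transaction.

```python
import sqlite3

def make_db(n_rows):
    # isolation_level=None puts the connection in autocommit mode,
    # so BEGIN / COMMIT / ROLLBACK are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO schema_migrations (version) VALUES (?)",
                     [(v,) for v in range(1, 7)])
    conn.executemany("INSERT INTO traces (body) VALUES (?)",
                     [(f"t{i}",) for i in range(n_rows)])
    return conn

def migrate_v7_single_tx(conn, killed):
    """Hypothetical single-transaction migration: all work plus the record."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE traces ADD COLUMN share_scope TEXT")
        conn.execute("UPDATE traces SET share_scope = 'private'")
        if killed:
            raise TimeoutError("simulated gateway kill")
        conn.execute("CREATE INDEX idx_traces_share_scope ON traces(share_scope)")
        conn.execute("INSERT INTO schema_migrations (version) VALUES (7)")
        conn.execute("COMMIT")
    except TimeoutError:
        # Everything since BEGIN is undone, including the ADD COLUMN.
        conn.execute("ROLLBACK")

def applied_versions(conn):
    rows = conn.execute("SELECT version FROM schema_migrations ORDER BY version")
    return [r[0] for r in rows]

conn = make_db(100)
migrate_v7_single_tx(conn, killed=True)
print(applied_versions(conn))  # [1, 2, 3, 4, 5, 6]
```

Because version 7 is never recorded, the next boot would attempt the same long transaction again.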

Fix

Two-phase migration:

Phase 1 (inside transaction, milliseconds): ADD COLUMN only on all 12 namespace tables, plus DROP INDEX uq_skills_name. The schema_migrations record for v7 commits here.
Phase 2 (after migration loop, outside any transaction): Batched UPDATE in 2,000-row chunks (each its own implicit transaction) for share_scope backfill, then CREATE INDEX IF NOT EXISTS for all 18 owner/share indexes. ensureNamespaceColumns is called unconditionally on every boot so new tables in the namespace list get their columns.
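
The two phases described above can be sketched as follows. This is an illustrative Python/sqlite3 sketch, not the code in this PR: it assumes the new column is NULL until backfilled, uses a single `traces` table in place of the 12 namespace tables, and the helper names and index name are hypothetical (`ensure_namespace_columns` only mirrors the name `ensureNamespaceColumns`; its body is a guess).

```python
import sqlite3

BATCH = 2000                   # chunk size quoted in the PR description
NAMESPACE_TABLES = ["traces"]  # hypothetical stand-in for the 12 tables

def ensure_namespace_columns(conn):
    """Add share_scope where it is missing; safe to call on every boot."""
    for table in NAMESPACE_TABLES:
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        if "share_scope" not in cols:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN share_scope TEXT")

def phase1(conn):
    """Cheap schema change plus the migration record, in one transaction."""
    conn.execute("BEGIN")
    ensure_namespace_columns(conn)
    conn.execute("INSERT INTO schema_migrations (version) VALUES (7)")
    conn.execute("COMMIT")

def backfill(conn, batch=BATCH):
    """Phase 2: fill NULL share_scope in chunks; each UPDATE autocommits."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE traces SET share_scope = 'private' WHERE rowid IN "
            "(SELECT rowid FROM traces WHERE share_scope IS NULL LIMIT ?)",
            (batch,),
        )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

def create_indexes(conn):
    """Phase 2: IF NOT EXISTS makes this a no-op on later boots."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_traces_share_scope ON traces(share_scope)"
    )

# Autocommit mode: statements outside BEGIN/COMMIT commit individually.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO traces (body) VALUES (?)",
                 [(f"t{i}",) for i in range(5000)])
phase1(conn)
print(backfill(conn))  # 5000
create_indexes(conn)
```

Selecting only rows where the column is still NULL is what lets each chunk pick up where the previous one stopped, without tracking an offset.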

Restart safety: If the bridge is killed during Phase 2, the v7 schema_migrations record survives (Phase 1 committed). Next boot skips Phase 1 entirely and resumes Phase 2 where it left off. The crash-loop is broken.
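
The restart behavior can be illustrated with a small simulation. As before, this is a hedged Python/sqlite3 sketch under assumed names: `boot`, the `max_batches` parameter (which fakes a kill partway through the backfill), the schema, and the index name are all hypothetical.

```python
import sqlite3

def boot(conn, max_batches=None):
    """One simulated boot. Returns True if Phase 2 finished, False if 'killed'."""
    recorded = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE version = 7"
    ).fetchone()
    if recorded is None:
        # Phase 1: runs at most once, because the record commits with it.
        conn.execute("BEGIN")
        conn.execute("ALTER TABLE traces ADD COLUMN share_scope TEXT")
        conn.execute("INSERT INTO schema_migrations (version) VALUES (7)")
        conn.execute("COMMIT")
    batches = 0
    while max_batches is None or batches < max_batches:
        cur = conn.execute(
            "UPDATE traces SET share_scope = 'private' WHERE rowid IN "
            "(SELECT rowid FROM traces WHERE share_scope IS NULL LIMIT 2000)"
        )
        if cur.rowcount == 0:
            conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_traces_share_scope "
                "ON traces(share_scope)"
            )
            return True
        batches += 1
    return False  # simulated kill before the backfill completed

def remaining(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM traces WHERE share_scope IS NULL"
    ).fetchone()[0]

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO traces (body) VALUES (?)",
                 [(f"t{i}",) for i in range(5000)])

print(boot(conn, max_batches=1), remaining(conn))  # False 3000
print(boot(conn), remaining(conn))                 # True 0
```

The second call would fail on a duplicate column if it re-ran the ALTER TABLE, so completing cleanly shows that Phase 1 was skipped.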

Verification

Confirmed the DB on the affected instance had exactly migrations 1–6 applied — migration 007 was never committed despite hundreds of boot attempts.
After the fix, Phase 1 committed in 2.4 seconds. Phase 2 backfill and index creation complete across bridge restarts.
Bridge boots cleanly with all 18 indexes created and pipeline.ready.
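
A check along these lines could be used to inspect a database's migration state. It is a sketch only: `migration_state` is a hypothetical helper, and the `version` column name on schema_migrations is an assumption; `sqlite_master` is SQLite's built-in schema catalog.

```python
import sqlite3

def migration_state(conn):
    """Return (recorded migration versions, user-created index names)."""
    versions = [r[0] for r in conn.execute(
        "SELECT version FROM schema_migrations ORDER BY version")]
    indexes = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    return versions, indexes

# Tiny in-memory stand-in for a database stuck before migration 7.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO schema_migrations (version) VALUES (?)",
                 [(v,) for v in range(1, 7)])
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, share_scope TEXT)")

versions, indexes = migration_state(conn)
print(versions, indexes)  # [1, 2, 3, 4, 5, 6] []
```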

… large databases

Migration 007 (namespace-visibility) runs UPDATE ... SET share_scope and CREATE INDEX on the traces table inside a single db.tx(). On databases larger than ~500MB, this exceeds the host gateway kill timeout, SQLite rolls back the entire transaction (including the schema_migrations INSERT), and the bridge restarts into the same hang forever.

This splits the migration into two phases:

Phase 1 (inside transaction, ms): ADD COLUMN only on 12 namespace tables plus DROP INDEX uq_skills_name. The schema_migrations record commits here.
Phase 2 (after migration loop, outside any transaction): Batched UPDATE in 2,000-row chunks (each its own implicit transaction) for share_scope backfill, then CREATE INDEX IF NOT EXISTS for all 18 owner/share indexes. Phase 2 also calls ensureNamespaceColumns unconditionally so new tables added to the namespace list get their columns on every boot.

Restart-safe: if the bridge is killed during Phase 2, the v7 schema_migrations record survives (Phase 1 committed). Next boot skips Phase 1 entirely and resumes Phase 2 where it left off.

Closes MemTensor#1787

This was referenced May 26, 2026

Reward pipeline skips abandoned episodes — 98% of closed episodes never scored #1782

Open

v2.0.2+ regression: bootstrapMemoryCoreFull() hangs with 100% CPU on databases >500MB #1787

Open

chiefmojo wants to merge 1 commit into MemTensor:main from chiefmojo:fix/large-db-migration-crash
