Add skipDuplicates() scope guard to createDocuments #855
premtsd-code wants to merge 1 commit into main from
Conversation
Introduces a RAII-style scope guard that lets callers opt into silent
duplicate handling for batch inserts. Duplicate-key errors during the
wrapped call are silenced at the adapter layer via dialect-specific
no-op inserts; everything outside the guarded call behaves normally.
Usage:

```php
$db->skipDuplicates(fn () => $db->createDocuments($collection, $docs));
```
Adapter dialects:
- MariaDB/MySQL: INSERT IGNORE INTO
- Postgres: INSERT INTO ... ON CONFLICT (...) DO NOTHING
- SQLite: INSERT OR IGNORE INTO
- Mongo: upsert + $setOnInsert, bypassing the transaction wrap to
  avoid txn-abort-on-duplicate
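To make the dialect list above concrete, here is a minimal runnable sketch of the skip-on-duplicate behavior using the SQLite dialect (Python's built-in sqlite3 stands in for the adapter purely for illustration; the table and column names are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (_uid TEXT PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO docs VALUES ('doc1', 'Original A')")

# A plain INSERT raises on a duplicate key...
try:
    con.execute("INSERT INTO docs VALUES ('doc1', 'Duplicate A')")
except sqlite3.IntegrityError:
    pass  # UNIQUE constraint failed: docs._uid

# ...while the dialect-specific no-op form silently skips the duplicate.
con.execute("INSERT OR IGNORE INTO docs VALUES ('doc1', 'Duplicate A')")
con.execute("INSERT OR IGNORE INTO docs VALUES ('doc2', 'New B')")

rows = con.execute("SELECT _uid, name FROM docs ORDER BY _uid").fetchall()
# doc1 keeps its original value; only doc2 is newly inserted
```

The same contract holds for the other dialects: the duplicate row is dropped at the database layer instead of aborting the whole batch.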
Hot-path cost: one property read + one boolean branch per chunk.
Callers that don't use skipDuplicates pay no closure allocation and
no scope setup.
Mirror::createDocuments: the skipDuplicates path re-fetches source's
authoritative state via find() after the adapter insert settles, then
forwards that to destination instead of the caller's input. This is
race-free because the query runs after source's write has resolved —
regardless of whether rows were inserted, skipped, or raced with
concurrent writers, destination always receives a faithful snapshot
of source. Fixes Greptile #3084293974: captureOnNext-based approach
forwarded would-be values for source-skipped duplicates, diverging
destination from source.
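The re-fetch-then-forward idea can be sketched standalone (Python/sqlite3 stands in for the source adapter here; names are illustrative, not the library's actual API):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE docs (_uid TEXT PRIMARY KEY, name TEXT)")
src.execute("INSERT INTO docs VALUES ('doc1', 'Original A')")

# Caller's input batch: doc1 is a duplicate, doc2 is new.
batch = [("doc1", "Duplicate A"), ("doc2", "New B")]
src.executemany("INSERT OR IGNORE INTO docs VALUES (?, ?)", batch)

# Re-fetch the authoritative state AFTER the insert has settled,
# instead of forwarding the caller's input values.
ids = [uid for uid, _ in batch]
placeholders = ",".join("?" * len(ids))
authoritative = src.execute(
    f"SELECT _uid, name FROM docs WHERE _uid IN ({placeholders}) ORDER BY _uid",
    ids,
).fetchall()
# Forward `authoritative` to the destination: doc1 carries 'Original A',
# not the skipped duplicate's 'Duplicate A'.
```

Forwarding the read-back snapshot rather than the input is what keeps destination converged with source even when some rows were skipped.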
upsertDocumentsWithIncrease: restore per-tenant grouping for the
existing-doc lookup when running in sharedTables + tenantPerDocument
mode. The previously-inlined find() ran under the session tenant
context (null for platform workers) and silently missed rows from
other tenants, which broke the StatsUsage worker flushing stats
across projects. Per-tenant grouping uses K queries (K = unique
tenants in the batch) instead of N (per-doc) or 1 (broken). The
common single-tenant path still hits the fast batched code.
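A minimal sketch of the per-tenant grouping (plain Python, with invented document shapes) shows why the lookup cost is K queries for K unique tenants:

```python
from collections import defaultdict

batch = [
    {"id": "a", "tenant": 1},
    {"id": "b", "tenant": 2},
    {"id": "c", "tenant": 1},
    {"id": "d", "tenant": 2},
]

# Group document IDs by tenant so each lookup runs under the
# correct tenant context instead of the (possibly null) session tenant.
ids_by_tenant = defaultdict(list)
for doc in batch:
    ids_by_tenant[doc["tenant"]].append(doc["id"])

# K = number of unique tenants -> K lookups, instead of N per-doc
# queries or 1 query that silently misses rows from other tenants.
queries = [
    (tenant, sorted(set(ids))) for tenant, ids in sorted(ids_by_tenant.items())
]
```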
Tests cover every adapter: intra-batch duplicates, cross-batch
duplicates, nested scope, large batches, relationships, Mirror
dual-write with authoritative state forwarding.
📝 Walkthrough

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Database
    participant Adapter as Adapter<br/>(Mongo/SQL/etc.)
    participant DB as Database<br/>Backend
    User->>Database: skipDuplicates(callback)
    Database->>Database: Set skipDuplicates flag = true
    Database->>Database: Execute callback<br/>(createDocuments)
    Database->>Adapter: skipDuplicates(insert_fn)
    Adapter->>Adapter: Set skipDuplicates flag = true
    Adapter->>Adapter: Execute insert_fn()
    alt Mongo Adapter
        Adapter->>DB: upsert with filter {_uid, _tenant?}
    else SQL Adapter
        Adapter->>DB: INSERT...ON CONFLICT DO NOTHING
    else SQLite Adapter
        Adapter->>DB: INSERT OR IGNORE INTO...
    end
    DB-->>Adapter: Success (skip duplicates)
    Adapter->>Adapter: Restore skipDuplicates flag
    Adapter-->>Database: Return result
    Database->>Database: Restore skipDuplicates flag
    Database-->>User: Return result
```
```mermaid
sequenceDiagram
    participant User
    participant Pool
    participant PinnedAdapter as Pinned<br/>Adapter
    participant BorrowedAdapter as Borrowed<br/>Adapter
    participant DB as Database<br/>Backend
    User->>Pool: delegate(method, args)
    Pool->>Pool: Check skipDuplicates flag
    alt Transaction active (pinned)
        Pool->>PinnedAdapter: skipDuplicates(fn => method(...args))
        PinnedAdapter->>PinnedAdapter: Set flag, execute method
        PinnedAdapter->>DB: Delegated operation
        DB-->>PinnedAdapter: Result
        PinnedAdapter-->>Pool: Return result
    else No transaction (borrow adapter)
        Pool->>BorrowedAdapter: Configure (tenant, auth, etc.)
        Pool->>BorrowedAdapter: skipDuplicates(fn => method(...args))
        BorrowedAdapter->>BorrowedAdapter: Set flag, execute method
        BorrowedAdapter->>DB: Delegated operation
        DB-->>BorrowedAdapter: Result
        BorrowedAdapter-->>Pool: Return result
    end
    Pool-->>User: Return result
```
```mermaid
sequenceDiagram
    participant User
    participant Mirror
    participant SourceDB as Source<br/>Adapter
    participant DestDB as Destination<br/>Adapter
    User->>Mirror: createDocuments([doc1_dup, doc2_new])
    Mirror->>SourceDB: skipDuplicates(fn => createDocuments(...))
    SourceDB->>SourceDB: Apply skipDuplicates logic
    SourceDB-->>Mirror: Modified count (1 inserted: doc2_new)
    Mirror->>Mirror: Filter non-empty IDs
    Mirror->>SourceDB: find(filtered_ids)
    SourceDB-->>Mirror: Authoritative docs [doc2_new]
    Mirror->>Mirror: Clone, apply writeFilters
    Mirror->>DestDB: skipDuplicates(fn => createDocuments([doc2_new]))
    DestDB->>DestDB: Apply skipDuplicates logic
    DestDB-->>Mirror: Success
    Mirror->>Mirror: Run afterCreateDocument filters
    Mirror-->>User: Return modified count
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

⚠️ PHPStan (2.1.46) could not run: Composer install failed because this project depends on private packages that require authentication. Run PHPStan in a CI/CD pipeline that can authenticate against those packages instead.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Database/Adapter/Mongo.php (1)
1477-1529: ⚠️ Potential issue | 🟠 Major — Don't reuse an active session in the skip-duplicates branch.
$options is populated before the branch, so an explicitly started Mongo transaction still passes session into upsert(). That reintroduces the same txn-abort/write-conflict path this code is trying to avoid.

Proposed fix

```diff
-        $options = $this->getTransactionOptions();
         $records = [];
         $hasSequence = null;
         $documents = \array_map(fn ($doc) => clone $doc, $documents);
@@
         // insertMany aborts the txn on any duplicate; upsert + $setOnInsert no-ops instead.
         if ($this->skipDuplicates) {
+            $options = [];
             if (empty($records)) {
                 return [];
             }
@@
             return $documents;
         }

+        $options = $this->getTransactionOptions();
         try {
             $documents = $this->client->insertMany($name, $records, $options);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Database/Adapter/Mongo.php` around lines 1477 - 1529, the code passes $options (from getTransactionOptions()), including an active session, into client->upsert() in the skip-duplicates branch; clone $options to a local variable (e.g. $upsertOptions = $options) and remove the session before calling upsert (unset($upsertOptions['session']), or remove the session entry if options is an object) so upsert() executes outside the active transaction; update the call to $this->client->upsert($name, $operations, $upsertOptions) and keep the rest of the skip-duplicates logic unchanged.
🧹 Nitpick comments (1)
src/Database/Adapter/SQL.php (1)
1036-1039: Base INSERT keyword should default to the portable form; MySQL-family adapters can override.
INSERT IGNORE is MySQL/MariaDB-specific syntax. While all adapters that use suffix-based duplicate handling (Postgres, SQLite) currently override getInsertKeyword() correctly, the base implementation should default to plain INSERT INTO for better portability and defensive design. This makes it safer for future adapters that may inherit from SQL without intending to use INSERT IGNORE. Consider moving the MySQL-specific logic to MySQL.php and MariaDB.php overrides:
- Base SQL::getInsertKeyword() → return 'INSERT INTO'
- MySQL.php and MariaDB.php → override to return skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Database/Adapter/SQL.php` around lines 1036 - 1039, Change the base SQL class's getInsertKeyword() to return the portable default 'INSERT INTO' instead of the MySQL-specific 'INSERT IGNORE INTO' by removing the skipDuplicates branch from SQL::getInsertKeyword(); then add overrides in the MySQL and MariaDB adapters (implement getInsertKeyword() in MySQL.php and MariaDB.php) that return skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO' using the existing skipDuplicates property so only MySQL-family adapters emit IGNORE while other adapters remain portable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Database/Adapter/SQLite.php`:
- Around line 1940-1943: The current getInsertKeyword() returns 'INSERT OR
IGNORE INTO', which silences all constraint errors; change the logic so when
$this->skipDuplicates is true the generated SQL uses SQLite's targeted conflict
clause for the _uid column (e.g., use an INSERT ... ON CONFLICT(_uid) DO NOTHING
form) instead of INSERT OR IGNORE; update getInsertKeyword() (and any place that
composes the INSERT statement if it only expects a keyword) to emit the ON
CONFLICT(_uid) DO NOTHING variant so only _uid uniqueness conflicts are
suppressed while other constraint violations still raise errors.
In `@src/Database/Database.php`:
- Around line 7153-7188: The pre-read find() calls build Query::equal('$id',
...) over full ID lists which can exceed DocumentsValidator limits; modify both
branches (the tenant-batched branch that builds $idsByTenant and the else branch
that builds $docIds) to split each ID array into chunks using array_chunk($ids,
max(1, $this->maxQueryValues)) and call find() for each chunk (preserving the
same withTenant/authorization/silent wrappers), collecting/merging all returned
documents into $existingDocs (use the same keys: "$tenant:$id" in the tenant
branch and $this->tenantKey($doc) in the else branch). Apply the same chunking
approach to the related callers createDocuments() and
upsertDocumentsWithIncrease() when they prefetch existing document IDs so no
single Query::equal('$id', ...) exceeds $this->maxQueryValues.
- Around line 5728-5732: The adapter's skipDuplicates guard is applied only
around createDocuments, but createDocumentRelationships runs earlier and can
mutate related rows before the insert; change the flow so the skipDuplicates
context (adapter->skipDuplicates or equivalent flag) and the withTransaction are
established before calling createDocumentRelationships and createDocuments,
i.e., wrap both createDocumentRelationships(...) and the $insert closure inside
the adapter->skipDuplicates(...) (or set the adapter flag prior to relationship
handling) so relationship side effects are subject to the same dedupe guard and
transaction; update references around createDocumentRelationships, $insert,
withTransaction, adapter->skipDuplicates and adapter->createDocuments
accordingly.
In `@src/Database/Mirror.php`:
- Around line 632-645: The current re-fetch uses a single Query::equal('$id',
$ids) which can fail validation or miss rows for mixed-tenant/large batches;
change the logic in Mirror::(around the $authoritative fetch) to reuse the same
chunking and tenant-grouping approach used in Database::createDocuments() and
Database::upsertDocumentsWithIncrease(): split $ids into chunks with
array_chunk($ids, max(1, $this->maxQueryValues)), group ids by tenant (or run
per-current-tenant when non-multi-tenant), call $this->source->silent(fn() =>
$this->source->find($collection, [Query::equal('$id', $chunk),
Query::limit(count($chunk))])) for each chunk, aggregate the results into
$authoritative, and then continue so skipDuplicates() runs per-chunk/tenant
instead of on one large mixed batch.
- Around line 641-646: The authoritative re-fetch must bypass the caller's read
authorization so it returns all newly created docs; replace the current
caller-scoped fetch ($authoritative = $this->source->silent(fn () =>
$this->source->find(...))) with an elevated/system context call that disables
read authorization (e.g. use the data source's
"runAsSystem"/"withoutAuthorization"/auth-bypass helper to invoke
$this->source->find($collection, [Query::equal('$id', $ids),
Query::limit(count($ids))]) so the fetch is performed with full privileges and
cannot be limited by the caller's permissions.
In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 8029-8059: Add a test variant that exercises exceptional exits of
the RAII-style scope guard: call $database->skipDuplicates(...) with an inner or
outer callback that throws (use a deliberate throw or trigger
DuplicateException) and ensure both the thrown exception is caught in the test
and that after the exception the skip flag is restored by calling
$database->createDocuments($collection, [$makeDoc('seed','ShouldThrow')]) and
asserting it throws a DuplicateException; reference the existing nested test
using skipDuplicates and createDocuments and add analogous cases where the inner
callback throws and where the outer callback throws, then assert a plain
createDocuments still raises DuplicateException afterward.
- Around line 7859-8262: Add a new test (e.g.,
testCreateDocumentsSkipDuplicatesCustomUniqueIndex) that creates a collection,
creates a unique attribute (like 'email' via Database::createAttribute with the
unique flag), seeds one document with an email, then calls
$database->skipDuplicates(...) to createDocuments with a different $id but the
same email and assert the intended behavior: either the call throws
DuplicateException or (if the adapter intentionally uses blanket INSERT IGNORE)
verify no overwrite occurred and only one row with that email exists; use
$database->createAttribute, $database->createDocuments,
$database->skipDuplicates, $database->getAdapter(), $database->getDocument and
$database->find to implement the assertions and branch expectations based on
adapter characteristics so the test fails if custom-unique violations are
silently suppressed incorrectly.
📒 Files selected for processing (10)
- src/Database/Adapter.php
- src/Database/Adapter/Mongo.php
- src/Database/Adapter/Pool.php
- src/Database/Adapter/Postgres.php
- src/Database/Adapter/SQL.php
- src/Database/Adapter/SQLite.php
- src/Database/Database.php
- src/Database/Mirror.php
- tests/e2e/Adapter/MirrorTest.php
- tests/e2e/Adapter/Scopes/DocumentTests.php
```php
protected function getInsertKeyword(): string
{
    return $this->skipDuplicates ? 'INSERT OR IGNORE INTO' : 'INSERT INTO';
}
```
🧩 Analysis chain
🌐 Web query:
Does SQLite INSERT OR IGNORE suppress NOT NULL and CHECK violations in addition to UNIQUE/PRIMARY KEY conflicts, and since which SQLite version is INSERT ... ON CONFLICT(column-list) DO NOTHING supported?
💡 Result:
Yes, SQLite INSERT OR IGNORE suppresses NOT NULL violations (skips the row silently) in addition to UNIQUE/PRIMARY KEY conflicts, but it does NOT suppress CHECK violations (those cause ABORT like default). INSERT ... ON CONFLICT(column-list) DO NOTHING (the UPSERT syntax specifying columns) has been supported since SQLite version 3.24.0 (2018-06-04).
Citations:
- 1: https://www2.sqlite.org/lang_conflict.html
- 2: https://sqlite.org/lang_conflict.html
- 3: https://www3.sqlite.org/lang_UPSERT.html
- 4: https://sqlite.org/lang_UPSERT.html
INSERT OR IGNORE suppresses more than just duplicate constraints—fix to target only _uid conflicts.
SQLite's INSERT OR IGNORE silently suppresses NOT NULL and UNIQUE/PRIMARY KEY violations alike. This means rows failing unrelated constraints (e.g., NOT NULL on a required field) are dropped without error, violating the design intent that only duplicate _uid conflicts should be silently skipped.
Use INSERT ... ON CONFLICT(_uid) DO NOTHING instead to suppress only the intended unique keys. This syntax is supported in SQLite 3.24.0+ and applies the constraint exception only to the specified columns.
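The difference is easy to demonstrate with Python's built-in sqlite3 (a sketch assuming SQLite >= 3.24 for the ON CONFLICT ... DO NOTHING syntax; the schema is invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (_uid TEXT PRIMARY KEY, name TEXT NOT NULL)")
con.execute("INSERT INTO t VALUES ('a', 'A')")

# INSERT OR IGNORE swallows the NOT NULL violation: the bad row is
# silently dropped and no error is raised.
con.execute("INSERT OR IGNORE INTO t VALUES ('b', NULL)")
swallowed = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # still 1 row

# ON CONFLICT(_uid) DO NOTHING only suppresses _uid conflicts;
# the NOT NULL violation still raises.
not_null_raised = False
try:
    con.execute("INSERT INTO t VALUES ('c', NULL) ON CONFLICT(_uid) DO NOTHING")
except sqlite3.IntegrityError:
    not_null_raised = True

# A genuine duplicate _uid is still skipped without error or overwrite.
con.execute("INSERT INTO t VALUES ('a', 'dup') ON CONFLICT(_uid) DO NOTHING")
final = con.execute("SELECT name FROM t WHERE _uid = 'a'").fetchone()[0]
```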
Suggested direction

```diff
 protected function getInsertKeyword(): string
 {
-    return $this->skipDuplicates ? 'INSERT OR IGNORE INTO' : 'INSERT INTO';
+    return 'INSERT INTO';
+}
+
+protected function getInsertSuffix(string $table): string
+{
+    if (!$this->skipDuplicates) {
+        return '';
+    }
+
+    $conflictTarget = $this->sharedTables ? '(`_tenant`, `_uid`)' : '(`_uid`)';
+
+    return "ON CONFLICT {$conflictTarget} DO NOTHING";
+}
+
+protected function getInsertPermissionsSuffix(): string
+{
+    if (!$this->skipDuplicates) {
+        return '';
+    }
+
+    $conflictTarget = $this->sharedTables
+        ? '(`_tenant`, `_document`, `_type`, `_permission`)'
+        : '(`_document`, `_type`, `_permission`)';
+
+    return "ON CONFLICT {$conflictTarget} DO NOTHING";
 }
```
```php
$insert = fn () => $this->withTransaction(fn () => $this->adapter->createDocuments($collection, $chunk));
// Set adapter flag before withTransaction so Mongo can opt out of a real txn.
$batch = $this->skipDuplicates
    ? $this->adapter->skipDuplicates($insert)
    : $insert();
```
skipDuplicates() does not cover relationship side effects.
Line 5728 only guards the adapter insert. By then, createDocumentRelationships() has already run for every input document, so a duplicate parent can still create/update related rows—or hit a duplicate on an existing junction record—before the guarded insert is reached. That makes skipDuplicates(createDocuments(...)) non-idempotent for documents carrying relationship payloads.
```php
if ($this->getSharedTables() && $this->getTenantPerDocument()) {
    $idsByTenant = [];
    foreach ($documents as $doc) {
        if ($doc->getId() !== '') {
            $idsByTenant[$doc->getTenant()][] = $doc->getId();
        }
    }
    foreach ($idsByTenant as $tenant => $tenantIds) {
        $tenantIds = \array_values(\array_unique($tenantIds));
        $found = $this->authorization->skip(fn () => $this->withTenant($tenant, fn () => $this->silent(
            fn () => $this->find($collection->getId(), [
                Query::equal('$id', $tenantIds),
                Query::limit(\count($tenantIds)),
            ])
        )));
        foreach ($found as $doc) {
            $existingDocs[$tenant . ':' . $doc->getId()] = $doc;
        }
    }
} else {
    $docIds = \array_values(\array_unique(\array_filter(
        \array_map(fn (Document $doc) => $doc->getId(), $documents),
        fn ($id) => $id !== ''
    )));

    if (!empty($docIds)) {
        $existing = $this->authorization->skip(fn () => $this->silent(
            fn () => $this->find($collection->getId(), [
                Query::equal('$id', $docIds),
                Query::limit(\count($docIds)),
            ])
        ));
        foreach ($existing as $doc) {
            $existingDocs[$this->tenantKey($doc)] = $doc;
        }
    }
}
```
Chunk these pre-read find() calls by maxQueryValues.
Both branches build a single Query::equal('$id', ...) over the full ID set. Once a tenant batch or the global batch exceeds $this->maxQueryValues, find() can fail validation before the upsert loop even starts, which regresses large upserts. Please split each ID list with array_chunk(..., max(1, $this->maxQueryValues)) and merge the results into $existingDocs. Based on learnings, createDocuments() and upsertDocumentsWithIncrease() should prefetch existing document IDs in array_chunk(..., max(1, $this->maxQueryValues)) batches to satisfy DocumentsValidator limits on Query::equal('$id', ...).
```php
$ids = \array_values(\array_filter(
    \array_map(fn (Document $d) => $d->getId(), $documents),
    fn ($id) => $id !== ''
));

if (empty($ids)) {
    return $modified;
}

$authoritative = $this->source->silent(
    fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(\count($ids)),
    ])
```
Reuse the existing chunking and tenant grouping for the ID re-fetch.
This single Query::equal('$id', $ids) reintroduces the exact large-batch and tenant-per-document edge cases the main create path already chunks around. skipDuplicates() on a big mixed-tenant batch can fail validation or miss rows under the current tenant context.
Based on learnings, In src/Database/Database.php, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) for tenant-per-document and non-tenant modes to satisfy DocumentsValidator limits on Query::equal('$id', ...).
```php
$authoritative = $this->source->silent(
    fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(\count($ids)),
    ])
);
```
Bypass read authorization for the authoritative re-fetch.
This find() runs under the caller's authorization context. If a batch creates documents the caller cannot immediately read back, $authoritative becomes partial/empty and the destination silently diverges.
Proposed fix

```diff
-        $authoritative = $this->source->silent(
-            fn () => $this->source->find($collection, [
-                Query::equal('$id', $ids),
-                Query::limit(\count($ids)),
-            ])
-        );
+        $authoritative = $this->source->getAuthorization()->skip(
+            fn () => $this->source->silent(
+                fn () => $this->source->find($collection, [
+                    Query::equal('$id', $ids),
+                    Query::limit(\count($ids)),
+                ])
+            )
+        );
```
| public function testCreateDocumentsIgnoreDuplicates(): void | ||
| { | ||
| /** @var Database $database */ | ||
| $database = $this->getDatabase(); | ||
|
|
||
| $database->createCollection(__FUNCTION__); | ||
| $database->createAttribute(__FUNCTION__, 'name', Database::VAR_STRING, 128, true); | ||
|
|
||
| // Insert initial documents | ||
| $database->createDocuments(__FUNCTION__, [ | ||
| new Document([ | ||
| '$id' => 'doc1', | ||
| 'name' => 'Original A', | ||
| '$permissions' => [ | ||
| Permission::read(Role::any()), | ||
| Permission::create(Role::any()), | ||
| ], | ||
| ]), | ||
| new Document([ | ||
| '$id' => 'doc2', | ||
| 'name' => 'Original B', | ||
| '$permissions' => [ | ||
| Permission::read(Role::any()), | ||
| Permission::create(Role::any()), | ||
| ], | ||
| ]), | ||
| ]); | ||
|
|
||
| // Without ignore, duplicates should throw | ||
| try { | ||
| $database->createDocuments(__FUNCTION__, [ | ||
| new Document([ | ||
| '$id' => 'doc1', | ||
| 'name' => 'Duplicate A', | ||
| '$permissions' => [ | ||
| Permission::read(Role::any()), | ||
| Permission::create(Role::any()), | ||
| ], | ||
| ]), | ||
| ]); | ||
| $this->fail('Expected DuplicateException'); | ||
| } catch (DuplicateException $e) { | ||
| $this->assertNotEmpty($e->getMessage()); | ||
| } | ||
|
|
||
| // With skipDuplicates, duplicates should be silently skipped | ||
| $emittedIds = []; | ||
| $collection = __FUNCTION__; | ||
| $count = $database->skipDuplicates(function () use ($database, $collection, &$emittedIds) { | ||
| return $database->createDocuments($collection, [ | ||
| new Document([ | ||
| '$id' => 'doc1', | ||
| 'name' => 'Duplicate A', | ||
| '$permissions' => [ | ||
| Permission::read(Role::any()), | ||
| Permission::create(Role::any()), | ||
| ], | ||
| ]), | ||
| new Document([ | ||
| '$id' => 'doc3', | ||
| 'name' => 'New C', | ||
| '$permissions' => [ | ||
| Permission::read(Role::any()), | ||
| Permission::create(Role::any()), | ||
| ], | ||
| ]), | ||
| ], onNext: function (Document $doc) use (&$emittedIds) { | ||
| $emittedIds[] = $doc->getId(); | ||
| }); | ||
| }); | ||
|
|
||
| $this->assertSame(2, $count); | ||
| $this->assertCount(2, $emittedIds); | ||
| \sort($emittedIds); | ||
| $this->assertSame(['doc1', 'doc3'], $emittedIds); | ||
|
|
||
| $doc1 = $database->getDocument(__FUNCTION__, 'doc1'); | ||
| $this->assertSame('Original A', $doc1->getAttribute('name')); | ||
|
|
||
| $doc3 = $database->getDocument(__FUNCTION__, 'doc3'); | ||
| $this->assertSame('New C', $doc3->getAttribute('name')); | ||
|
|
||
| // Total should be 3 (doc1, doc2, doc3) | ||
| $all = $database->find(__FUNCTION__); | ||
| $this->assertCount(3, $all); | ||
| } | ||
|
|
||
public function testCreateDocumentsIgnoreAllDuplicates(): void
{
    /** @var Database $database */
    $database = $this->getDatabase();

    $database->createCollection(__FUNCTION__);
    $database->createAttribute(__FUNCTION__, 'name', Database::VAR_STRING, 128, true);

    // Insert initial document
    $database->createDocuments(__FUNCTION__, [
        new Document([
            '$id' => 'existing',
            'name' => 'Original',
            '$permissions' => [
                Permission::read(Role::any()),
                Permission::create(Role::any()),
            ],
        ]),
    ]);

    // With skipDuplicates, inserting only duplicates should succeed with no new rows
    $emittedIds = [];
    $collection = __FUNCTION__;
    $count = $database->skipDuplicates(function () use ($database, $collection, &$emittedIds) {
        return $database->createDocuments($collection, [
            new Document([
                '$id' => 'existing',
                'name' => 'Duplicate',
                '$permissions' => [
                    Permission::read(Role::any()),
                    Permission::create(Role::any()),
                ],
            ]),
        ], onNext: function (Document $doc) use (&$emittedIds) {
            $emittedIds[] = $doc->getId();
        });
    });

    $this->assertSame(1, $count);
    $this->assertSame(['existing'], $emittedIds);

    $doc = $database->getDocument(__FUNCTION__, 'existing');
    $this->assertSame('Original', $doc->getAttribute('name'));

    // Still only 1 document
    $all = $database->find(__FUNCTION__);
    $this->assertCount(1, $all);
}

public function testCreateDocumentsSkipDuplicatesEmptyBatch(): void
{
    $database = $this->getDatabase();

    $collection = 'skipDupEmpty';
    $database->createCollection($collection);
    $database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

    $count = $database->skipDuplicates(fn () => $database->createDocuments($collection, []));

    $this->assertSame(0, $count);
    $this->assertCount(0, $database->find($collection));
}

public function testCreateDocumentsSkipDuplicatesNestedScope(): void
{
    $database = $this->getDatabase();

    $collection = 'skipDupNested';
    $database->createCollection($collection);
    $database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

    $makeDoc = fn (string $id, string $name) => new Document([
        '$id' => $id,
        'name' => $name,
        '$permissions' => [
            Permission::read(Role::any()),
            Permission::create(Role::any()),
        ],
    ]);

    // Seed an existing doc
    $database->createDocuments($collection, [$makeDoc('seed', 'Seed')]);

    // Nested scope — inner scope runs inside outer scope.
    // After inner exits, outer state should still be "skip enabled".
    // After outer exits, state should restore to "skip disabled".
    $countOuter = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
        // Inner scope: add dup + new
        $countInner = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
            return $database->createDocuments($collection, [
                $makeDoc('seed', 'Dup'),
                $makeDoc('innerNew', 'InnerNew'),
            ]);
        });
        $this->assertSame(2, $countInner);

        // Still inside outer scope — skip flag should still be on
        return $database->createDocuments($collection, [
            $makeDoc('seed', 'Dup2'),
            $makeDoc('outerNew', 'OuterNew'),
        ]);
    });
    $this->assertSame(2, $countOuter);

    // After both scopes exit, skip flag is off again — a plain createDocuments
    // call with a duplicate should throw.
    $thrown = null;
    try {
        $database->createDocuments($collection, [$makeDoc('seed', 'ShouldThrow')]);
    } catch (DuplicateException $e) {
        $thrown = $e;
    }
    $this->assertNotNull($thrown, 'Plain createDocuments after nested scopes should throw on duplicate');

    // Final state: seed + innerNew + outerNew
    $all = $database->find($collection);
    $ids = \array_map(fn (Document $d) => $d->getId(), $all);
    \sort($ids);
    $this->assertSame(['innerNew', 'outerNew', 'seed'], $ids);
}

public function testCreateDocumentsSkipDuplicatesLargeBatch(): void
{
    $database = $this->getDatabase();

    $collection = 'skipDupLarge';
    $database->createCollection($collection);
    $database->createAttribute($collection, 'idx', Database::VAR_INTEGER, 0, true);

    // Seed 50 docs
    $seed = [];
    for ($i = 0; $i < 50; $i++) {
        $seed[] = new Document([
            '$id' => 'doc_' . $i,
            'idx' => $i,
            '$permissions' => [
                Permission::read(Role::any()),
                Permission::create(Role::any()),
            ],
        ]);
    }
    $database->createDocuments($collection, $seed);

    // Now call skipDuplicates with 300 docs: 50 existing (0-49) + 250 new (50-299).
    // 300 > default INSERT_BATCH_SIZE, so this exercises the chunk loop.
    $batch = [];
    for ($i = 0; $i < 300; $i++) {
        $batch[] = new Document([
            '$id' => 'doc_' . $i,
            'idx' => $i + 1000, // different value so we can detect if existing got overwritten
            '$permissions' => [
                Permission::read(Role::any()),
                Permission::create(Role::any()),
            ],
        ]);
    }

    $emittedIds = [];
    $count = $database->skipDuplicates(function () use ($database, $collection, $batch, &$emittedIds) {
        return $database->createDocuments($collection, $batch, onNext: function (Document $doc) use (&$emittedIds) {
            $emittedIds[] = $doc->getId();
        });
    });

    $this->assertSame(300, $count);
    $this->assertCount(300, $emittedIds);

    $seedDoc = $database->getDocument($collection, 'doc_25');
    $this->assertSame(25, $seedDoc->getAttribute('idx'));

    $newDoc = $database->getDocument($collection, 'doc_100');
    $this->assertSame(1100, $newDoc->getAttribute('idx'));

    $total = $database->count($collection);
    $this->assertSame(300, $total);
}

public function testCreateDocumentsSkipDuplicatesSecondCallSkipsAll(): void
{
    $database = $this->getDatabase();

    $collection = 'skipDupSecond';
    $database->createCollection($collection);
    $database->createAttribute($collection, 'name', Database::VAR_STRING, 128, true);

    $makeBatch = fn (string $name) => \array_map(
        fn (string $id) => new Document([
            '$id' => $id,
            'name' => $name,
            '$permissions' => [
                Permission::read(Role::any()),
                Permission::create(Role::any()),
            ],
        ]),
        ['a', 'b', 'c']
    );

    // First call — all new
    $firstCount = $database->skipDuplicates(
        fn () => $database->createDocuments($collection, $makeBatch('First'))
    );
    $this->assertSame(3, $firstCount);

    $emittedIds = [];
    $secondCount = $database->skipDuplicates(function () use ($database, $collection, $makeBatch, &$emittedIds) {
        return $database->createDocuments($collection, $makeBatch('Second'), onNext: function (Document $doc) use (&$emittedIds) {
            $emittedIds[] = $doc->getId();
        });
    });
    $this->assertSame(3, $secondCount);
    \sort($emittedIds);
    $this->assertSame(['a', 'b', 'c'], $emittedIds);

    // All three should retain the First values
    foreach (['a', 'b', 'c'] as $id) {
        $doc = $database->getDocument($collection, $id);
        $this->assertSame('First', $doc->getAttribute('name'), "Doc {$id} should not have been overwritten");
    }
}

public function testCreateDocumentsSkipDuplicatesRelationships(): void
{
    $database = $this->getDatabase();

    if (!$database->getAdapter()->getSupportForRelationships()) {
        $this->expectNotToPerformAssertions();
        return;
    }

    $parent = 'skipDupParent';
    $child = 'skipDupChild';
    $permissions = [
        Permission::read(Role::any()),
        Permission::create(Role::any()),
        Permission::update(Role::any()),
        Permission::delete(Role::any()),
    ];

    $database->createCollection($parent);
    $database->createCollection($child);
    $database->createAttribute($parent, 'name', Database::VAR_STRING, 128, true);
    $database->createAttribute($child, 'name', Database::VAR_STRING, 128, true);
    $database->createRelationship(
        collection: $parent,
        relatedCollection: $child,
        type: Database::RELATION_ONE_TO_MANY,
        id: 'children',
    );

    $database->createDocument($parent, new Document([
        '$id' => 'existingParent',
        'name' => 'ExistingParent',
        '$permissions' => $permissions,
        'children' => [
            new Document([
                '$id' => 'existingChild',
                'name' => 'ExistingChild',
                '$permissions' => $permissions,
            ]),
        ],
    ]));

    $batch = [
        new Document([
            '$id' => 'existingParent',
            'name' => 'ShouldNotOverwrite',
            '$permissions' => $permissions,
            'children' => [
                new Document([
                    '$id' => 'existingChild',
                    'name' => 'ExistingChild',
                    '$permissions' => $permissions,
                ]),
                new Document([
                    '$id' => 'retryChild',
                    'name' => 'RetryChild',
                    '$permissions' => $permissions,
                ]),
            ],
        ]),
        new Document([
            '$id' => 'newParent',
            'name' => 'NewParent',
            '$permissions' => $permissions,
            'children' => [
                new Document([
                    '$id' => 'newChild',
                    'name' => 'NewChild',
                    '$permissions' => $permissions,
                ]),
            ],
        ]),
    ];

    $database->skipDuplicates(fn () => $database->createDocuments($parent, $batch));

    $existing = $database->getDocument($parent, 'existingParent');
    $this->assertFalse($existing->isEmpty());
    $this->assertSame('ExistingParent', $existing->getAttribute('name'));

    $existingChildren = $existing->getAttribute('children', []);
    $childIds = \array_map(fn (Document $d) => $d->getId(), $existingChildren);
    \sort($childIds);
    $this->assertSame(['existingChild', 'retryChild'], $childIds);

    $new = $database->getDocument($parent, 'newParent');
    $this->assertFalse($new->isEmpty());
    $this->assertSame('NewParent', $new->getAttribute('name'));
    $newChildren = $new->getAttribute('children', []);
    $this->assertCount(1, $newChildren);
    $this->assertSame('newChild', $newChildren[0]->getId());

    $allChildren = $database->find($child);
    $allChildIds = \array_map(fn (Document $d) => $d->getId(), $allChildren);
    \sort($allChildIds);
    $this->assertSame(['existingChild', 'newChild', 'retryChild'], $allChildIds);
}
Repository: utopia-php/database
Add test for custom unique index violations with skipDuplicates.
Current tests only verify behavior with $id duplicates. The SQL adapters use a blanket INSERT IGNORE (INSERT OR IGNORE on SQLite) that silences all unique-constraint violations. Without a test covering custom unique indexes (e.g., a unique email attribute), a regression that suppresses unrelated unique-index conflicts would pass this test suite undetected. Add a case with createDocuments() + skipDuplicates() against a collection with a custom unique index and verify the intended exception behavior—either that custom unique violations still throw or that they're handled correctly.
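To make the concern concrete, here is a standalone sketch (plain PDO against in-memory SQLite, assuming the pdo_sqlite extension; not the library's adapter code) showing that `INSERT OR IGNORE` also swallows conflicts on a custom unique index, not just primary-key duplicates:

```php
<?php

// Standalone illustration, not the adapter's code: INSERT OR IGNORE
// silences conflicts on ANY unique index, including a custom one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE docs (id TEXT PRIMARY KEY, email TEXT UNIQUE)');
$pdo->exec("INSERT INTO docs VALUES ('doc1', 'a@example.com')");

// Different primary key, same email: the custom unique index conflicts,
// but INSERT OR IGNORE skips the row silently instead of throwing.
$inserted = $pdo->exec("INSERT OR IGNORE INTO docs VALUES ('doc2', 'a@example.com')");

echo $inserted, "\n";                                              // 0 rows inserted, no exception
echo $pdo->query('SELECT COUNT(*) FROM docs')->fetchColumn(), "\n"; // 1
```

A test asserting on exactly this shape would catch a regression that silently widened (or narrowed) the set of suppressed conflicts.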
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7859 - 8262, add a
new test (e.g., testCreateDocumentsSkipDuplicatesCustomUniqueIndex) that creates
a collection, creates a unique attribute (like 'email' via
Database::createAttribute with the unique flag), seeds one document with an
email, then calls $database->skipDuplicates(...) to createDocuments with a
different $id but the same email and assert the intended behavior: either the
call throws DuplicateException or (if the adapter intentionally uses blanket
INSERT IGNORE) verify no overwrite occurred and only one row with that email
exists; use $database->createAttribute, $database->createDocuments,
$database->skipDuplicates, $database->getAdapter(), $database->getDocument and
$database->find to implement the assertions and branch expectations based on
adapter characteristics so the test fails if custom-unique violations are
silently suppressed incorrectly.
    // Nested scope — inner scope runs inside outer scope.
    // After inner exits, outer state should still be "skip enabled".
    // After outer exits, state should restore to "skip disabled".
    $countOuter = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
        // Inner scope: add dup + new
        $countInner = $database->skipDuplicates(function () use ($database, $collection, $makeDoc) {
            return $database->createDocuments($collection, [
                $makeDoc('seed', 'Dup'),
                $makeDoc('innerNew', 'InnerNew'),
            ]);
        });
        $this->assertSame(2, $countInner);

        // Still inside outer scope — skip flag should still be on
        return $database->createDocuments($collection, [
            $makeDoc('seed', 'Dup2'),
            $makeDoc('outerNew', 'OuterNew'),
        ]);
    });
    $this->assertSame(2, $countOuter);

    // After both scopes exit, skip flag is off again — a plain createDocuments
    // call with a duplicate should throw.
    $thrown = null;
    try {
        $database->createDocuments($collection, [$makeDoc('seed', 'ShouldThrow')]);
    } catch (DuplicateException $e) {
        $thrown = $e;
    }
    $this->assertNotNull($thrown, 'Plain createDocuments after nested scopes should throw on duplicate');
Cover the exceptional-exit path for the scope guard.
This only proves that nested skipDuplicates() restores state after normal returns. Since the feature is explicitly RAII-style, the risky failure mode is a callback throwing and leaking the flag into the next write. Please add a throwing inner/outer callback case and then assert a plain createDocuments() duplicate still raises afterward.
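A minimal standalone sketch of the save/restore pattern such a test would pin down (the `GuardedDb` class here is illustrative, not the library's actual `Database` implementation): a try/finally guard restores the previous flag even when the callback throws, which is exactly the property an exceptional-exit test must assert.

```php
<?php

// Illustrative only: a nestable, exception-safe scope guard.
// The real Database::skipDuplicates() differs in detail; this shows
// the save/restore shape the exceptional-exit test should verify.
class GuardedDb
{
    private bool $skipDuplicates = false;

    public function isSkipping(): bool
    {
        return $this->skipDuplicates;
    }

    public function skipDuplicates(callable $callback): mixed
    {
        $previous = $this->skipDuplicates; // save outer state (supports nesting)
        $this->skipDuplicates = true;
        try {
            return $callback();
        } finally {
            // Runs on both normal return and throw, so the flag never leaks.
            $this->skipDuplicates = $previous;
        }
    }
}

$db = new GuardedDb();
try {
    $db->skipDuplicates(function (): void {
        throw new RuntimeException('callback failed mid-batch');
    });
} catch (RuntimeException $e) {
    // expected
}
var_dump($db->isSkipping()); // bool(false): the flag was restored
```

If the real implementation restores state only on the normal-return path, a test built on this shape (throw inside the callback, then assert a plain duplicate insert still raises) fails immediately.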
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 8029 - 8059, add a
test variant that exercises exceptional exits of the RAII-style scope guard:
call $database->skipDuplicates(...) with an inner or outer callback that throws
(use a deliberate throw or trigger DuplicateException) and ensure both the
thrown exception is caught in the test and that after the exception the skip
flag is restored by calling $database->createDocuments($collection,
[$makeDoc('seed','ShouldThrow')]) and asserting it throws a DuplicateException;
reference the existing nested test using skipDuplicates and createDocuments and
add analogous cases where the inner callback throws and where the outer callback
throws, then assert a plain createDocuments still raises DuplicateException
afterward.
Greptile Summary

This PR adds a nestable
Confidence Score: 4/5

Safe to merge for non-Mirror paths; Mirror's skipDuplicates re-fetch has a P1 correctness gap in shared-tables/tenantPerDocument mode that should be addressed before production use. All adapter-level implementations are clean and well-tested.

src/Database/Mirror.php — authoritative re-fetch in skipDuplicates path needs per-tenant grouping and authorization->skip().

Important Files Changed
$authoritative = $this->source->silent(
    fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(\count($ids)),
    ])
);
Missing per-tenant grouping in authoritative re-fetch
In sharedTables + tenantPerDocument mode, this find() runs under the session tenant context (which may be null for platform workers) and silently misses documents belonging to other tenants in a cross-tenant batch — exactly the same gap that was fixed for upsertDocumentsWithIncrease in this same PR. Those documents are never forwarded to the destination, causing silent divergence.
The same per-tenant grouping pattern used in upsertDocumentsWithIncrease should be applied here:
if ($this->source->getSharedTables() && $this->source->getAdapter()->getTenantPerDocument()) {
    $idsByTenant = [];
    foreach ($documents as $d) {
        if ($d->getId() !== '') {
            $idsByTenant[$d->getTenant()][] = $d->getId();
        }
    }
    $authoritative = [];
    foreach ($idsByTenant as $tenant => $tenantIds) {
        $tenantIds = array_values(array_unique($tenantIds));
        $found = $this->source->authorization->skip(fn () =>
            $this->source->withTenant($tenant, fn () =>
                $this->source->silent(fn () => $this->source->find($collection, [
                    Query::equal('$id', $tenantIds),
                    Query::limit(count($tenantIds)),
                ]))
            )
        );
        array_push($authoritative, ...$found);
    }
} else {
    $authoritative = $this->source->silent(fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(count($ids)),
    ]));
}
$authoritative = $this->source->silent(
    fn () => $this->source->find($collection, [
        Query::equal('$id', $ids),
        Query::limit(\count($ids)),
    ])
);
Missing authorization->skip() on authoritative re-fetch
upsertDocumentsWithIncrease wraps its equivalent find() in $this->authorization->skip(...), so it reads back all documents regardless of the caller's auth context. This re-fetch omits that guard. If Mirror is invoked under a restricted authorization context (not globally skipped), documents that were just written but aren't readable under the current session's permissions will be silently excluded, and the destination will receive an incomplete set.
$authoritative = $this->source->authorization->skip(
    fn () => $this->source->silent(
        fn () => $this->source->find($collection, [
            Query::equal('$id', $ids),
            Query::limit(\count($ids)),
        ])
    )
);
Summary
Adds a RAII-style `skipDuplicates()` scope guard for batch inserts, silencing duplicate-key errors at the adapter layer via dialect-specific no-op inserts. Primary use cases: idempotent CSV re-imports and database migration re-runs.

Adapter implementation
- MariaDB/MySQL: `INSERT IGNORE INTO`
- Postgres: `INSERT INTO ... ON CONFLICT (...) DO NOTHING`
- SQLite: `INSERT OR IGNORE INTO`
- Mongo: upsert + `$setOnInsert`, bypassing the transaction wrap to avoid the txn-abort-on-duplicate behavior of `insertMany`

Hot-path cost
One property read + one boolean branch per chunk. Callers that don't use `skipDuplicates` pay zero closure allocation and zero scope-guard setup. Per-chunk overhead only applies when the feature is actually used.

Mirror dual-write correctness
`Mirror::createDocuments` in `skipDuplicates` mode now re-fetches source's authoritative state via `find()` after the adapter insert settles, then forwards that to destination instead of the caller's input. Race-free regardless of concurrent writers — destination always receives a faithful snapshot of whatever source holds.

This replaces a prior `captureOnNext`-based approach that forwarded would-be values for source-skipped duplicates, diverging destination from source. Addresses Greptile flag #3084293974.

Cost: one extra
`SELECT` against source per batch when `skipDuplicates` is active and Mirror has an upgraded destination. Zero cost on the non-skip path.

upsertDocumentsWithIncrease tenant grouping
sharedTables + tenantPerDocumentmode. The previously-inlinedfind()ran under the session tenant context (nullfor platform workers) and silently missed rows from other tenants, breaking the appwrite StatsUsage worker that flushes stats across projects.Per-tenant grouping uses K queries (K = unique tenants in the batch) instead of N (per-doc) or 1 (broken). The common single-tenant path still hits the fast batched code.
Test coverage
Test plan
- `MirrorTest` — 635 tests, 11035 assertions green
- `PostgresTest`, `SQLiteTest`, `MariaDBTest`, `MySQLTest`, `MongoDBTest` skipDuplicates suites green
- `createDocuments` test sweep on Postgres + SQLite

Replaces
Replaces #852, which accumulated 91 commits of mid-review churn, including merges from other PRs already in main. This branch is a clean squash of just the feature work (10 files, +789 / -17) against current `main`.

Summary by CodeRabbit
Release Notes
- `skipDuplicates()` functionality enabling bulk document creation to silently skip duplicate entries instead of throwing errors