Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .changeset/fix-canonicalize-in-values.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
'@tanstack/db': patch
---

fix(db): canonicalize `inArray` value order in `normalizeExpressionPaths`

`inArray` is set membership, but `normalizeExpressionPaths` only canonicalized
ref paths, not value order — so the same value set in a different order produced
a distinct serialized predicate (and `loadSubset` queryKey / cache key for
adapters that key by it) and refetched identical data. `normalizeExpressionPaths`
now also sorts `in` value arrays, and the join lazy-load predicate (previously
built and combined without normalization) is routed through it so its keys are
canonicalized too.
25 changes: 18 additions & 7 deletions packages/db/src/query/compiler/expressions.ts
Original file line number Diff line number Diff line change
@@ -1,18 +1,21 @@
import { Func, PropRef, Value } from '../ir.js'
import { defaultComparator } from '../../utils/comparison.js'
import type { BasicExpression, OrderBy } from '../ir.js'

/**
* Normalizes a WHERE clause expression by removing table aliases from property references.
* Normalizes a WHERE clause expression into a canonical form.
*
* This function recursively traverses an expression tree and creates new BasicExpression
* instances with normalized paths. The main transformation is removing the collection alias
* from property reference paths (e.g., `['user', 'id']` becomes `['id']` when `collectionAlias`
* is `'user'`), which is needed when converting query-level expressions to collection-level
* expressions for subscriptions.
* Recursively traverses an expression tree and creates new BasicExpression instances with:
* - **Normalized paths** — the collection alias is removed from property reference paths
* (e.g. `['user', 'id']` becomes `['id']` when `collectionAlias` is `'user'`), needed when
* converting query-level expressions to collection-level expressions for subscriptions.
* - **Canonical set-membership order** — `in` value arrays are sorted. `in` is unordered, so
* without this the same value set in a different order produces a distinct serialized
* predicate (and `loadSubset` queryKey / cache key) and refetches identical data.
*
* @param whereClause - The WHERE clause expression to normalize
* @param collectionAlias - The alias of the collection being filtered (to strip from paths)
* @returns A new BasicExpression with normalized paths
* @returns A new, canonicalized BasicExpression
*
* @example
* // Input: ref with path ['user', 'id'] where collectionAlias is 'user'
Expand Down Expand Up @@ -48,6 +51,14 @@ export function normalizeExpressionPaths(
)
args.push(convertedArg)
}
if (
whereClause.name === `in` &&
args.length === 2 &&
args[1]?.type === `val` &&
Array.isArray(args[1].value)
) {
args[1] = new Value([...args[1].value].sort(defaultComparator))
}
Comment on lines +54 to +61
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Default .sort() breaks canonicalization for multi-digit numbers.

JavaScript's default .sort() uses lexicographic (string) comparison, which produces incorrect ordering for numbers. For example, [10, 2, 1].sort() yields [1, 10, 2] instead of [1, 2, 10]. Since inArray is commonly used with numeric IDs and foreign keys, two semantically identical predicates inArray(id, [1, 2, 10]) and inArray(id, [1, 10, 2]) would canonicalize to different orderings, defeating the entire purpose of this PR (cache hits, deduplication).

🔧 Proposed fix with type-aware comparison
  if (
    whereClause.name === `in` &&
    args.length === 2 &&
    args[1]?.type === `val` &&
    Array.isArray(args[1].value)
  ) {
-   args[1] = new Value([...args[1].value].sort())
+   args[1] = new Value([...args[1].value].sort((a, b) => {
+     // Handle primitive types with type-aware comparison
+     if (typeof a === 'number' && typeof b === 'number') {
+       return a - b
+     }
+     if (typeof a === 'string' && typeof b === 'string') {
+       return a.localeCompare(b)
+     }
+     // Fallback: stringify for comparison (matches serialization behavior)
+     const aStr = String(a)
+     const bStr = String(b)
+     return aStr < bStr ? -1 : aStr > bStr ? 1 : 0
+   }))
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/query/compiler/expressions.ts` around lines 53 - 60, The
current canonicalization uses Array.prototype.sort() which sorts
lexicographically and breaks numeric ordering; update the branch that checks
whereClause.name === `in` to sort the args[1].value with a type-aware comparator
before wrapping in new Value: if all items are numbers sort numerically (a - b),
if all are strings sort lexically (localeCompare), otherwise convert items to a
consistent string representation and sort deterministically; use a copied array
(e.g., [...args[1].value]) and then pass the sorted array into new Value(...) so
the canonical form for inArray (handled whereClause.name/args and Value) is
stable for numeric IDs and mixed types.

return new Func(whereClause.name, args)
}
}
Expand Down
6 changes: 5 additions & 1 deletion packages/db/src/query/compiler/joins.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import { ensureIndexForField } from '../../indexes/auto-index.js'
import { PropRef } from '../ir.js'
import { inArray } from '../builder/functions.js'
import { compileExpression } from './evaluators.js'
import { normalizeExpressionPaths } from './expressions.js'
import { getLazyLoadTargets } from './lazy-targets.js'
import type { CompileQueryFn } from './index.js'
import type { OrderByOptimizationInfo } from './order-by.js'
Expand Down Expand Up @@ -310,7 +311,10 @@ function processJoin(

const lazyJoinRef = new PropRef(target.path)
const loaded = lazySourceSubscription.requestSnapshot({
where: inArray(lazyJoinRef, joinKeys),
where: normalizeExpressionPaths(
inArray(lazyJoinRef, joinKeys),
target.alias,
),
optimizedOnly: true,
})

Expand Down
116 changes: 116 additions & 0 deletions packages/db/tests/query/canonicalize-in-values.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
import { describe, expect, it, vi } from 'vitest'
import { createLiveQueryCollection, inArray } from '../../src/query/index.js'
import { createCollection } from '../../src/collection/index.js'
import { BasicIndex } from '../../src/indexes/basic-index.js'
import { extractSimpleComparisons } from '../../src/query/expression-helpers.js'
import type { LoadSubsetOptions } from '../../src/types.js'

type Child = { id: number; parentId: number; title: string }

const sampleChildren: Array<Child> = [
{ id: 10, parentId: 1, title: `Child A1` },
{ id: 20, parentId: 2, title: `Child B1` },
{ id: 30, parentId: 3, title: `Child C1` },
]

function createChildrenCollectionWithTracking() {
const loadSubsetCalls: Array<LoadSubsetOptions> = []

const collection = createCollection<Child>({
id: `canon-children`,
getKey: (child) => child.id,
syncMode: `on-demand`,
autoIndex: `eager`,
defaultIndexType: BasicIndex,
sync: {
sync: ({ begin, write, commit, markReady }) => {
begin()
for (const child of sampleChildren) {
write({ type: `insert`, value: child })
}
commit()
markReady()
return {
loadSubset: vi.fn((options: LoadSubsetOptions) => {
loadSubsetCalls.push(options)
return Promise.resolve()
}),
}
},
},
})

return { collection, loadSubsetCalls }
}

function firstInArrayValues(loadSubsetCalls: Array<LoadSubsetOptions>) {
expect(loadSubsetCalls.length).toBeGreaterThan(0)
const filters = extractSimpleComparisons(loadSubsetCalls[0]!.where)
const inFilter = filters.find((f) => f.operator === `in`)
expect(inFilter).toBeDefined()
return inFilter!.value
}

describe(`canonicalize inArray value order`, () => {
it(`sorts inArray values in the predicate passed to loadSubset`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()

// Out-of-order literal — survives normalizeExpressionPaths unsorted, so
// without canonicalization loadSubset would receive [3, 1, 2].
const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [3, 1, 2])),
)

await liveQuery.preload()

expect(firstInArrayValues(loadSubsetCalls)).toEqual([1, 2, 3])
})

it(`sorts multi-digit numbers numerically, not lexicographically`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()

// A lexicographic `.sort()` would yield [1, 10, 2]; the value comparator
// must sort these numerically to [1, 2, 10].
const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [10, 2, 1])),
)

await liveQuery.preload()

expect(firstInArrayValues(loadSubsetCalls)).toEqual([1, 2, 10])
})

it(`leaves a single-element inArray unchanged`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()

const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [7])),
)

await liveQuery.preload()

expect(firstInArrayValues(loadSubsetCalls)).toEqual([7])
})
Comment on lines +55 to +96
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Test data doesn't catch multi-digit number sorting bug.

The test uses single-digit values [3, 1, 2], which sort correctly even with JavaScript's default lexicographic sort. However, the implementation in expressions.ts uses .sort() without a comparator, which fails for multi-digit numbers (e.g., [10, 2, 1] would sort to [1, 10, 2] instead of [1, 2, 10]). Add a test case with multi-digit numeric IDs to catch this critical bug.

🧪 Proposed additional test case
+ it(`sorts multi-digit numeric inArray values correctly`, async () => {
+   const { collection: children, loadSubsetCalls } =
+     createChildrenCollectionWithTracking()
+
+   // Multi-digit numbers expose lexicographic vs numeric sort bugs
+   const liveQuery = createLiveQueryCollection((q) =>
+     q.from({ c: children }).where(({ c }) => inArray(c.parentId, [100, 20, 3])),
+   )
+
+   await liveQuery.preload()
+
+   expect(loadSubsetCalls.length).toBeGreaterThan(0)
+
+   const filters = extractSimpleComparisons(loadSubsetCalls[0]!.where)
+   const inFilter = filters.find((f) => f.operator === `in`)
+
+   expect(inFilter).toBeDefined()
+   expect(inFilter!.value).toEqual([3, 20, 100])
+ })

As per coding guidelines: "Test corner cases including: empty arrays/sets, single-element collections, undefined vs null values, resolved promises, async race conditions, and limit/offset edge cases."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it(`sorts inArray values in the predicate passed to loadSubset`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()
// Out-of-order literal — survives normalizeExpressionPaths unsorted, so
// without canonicalization loadSubset would receive [3, 1, 2].
const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [3, 1, 2])),
)
await liveQuery.preload()
expect(loadSubsetCalls.length).toBeGreaterThan(0)
const filters = extractSimpleComparisons(loadSubsetCalls[0]!.where)
const inFilter = filters.find((f) => f.operator === `in`)
expect(inFilter).toBeDefined()
expect(inFilter!.value).toEqual([1, 2, 3])
})
it(`sorts inArray values in the predicate passed to loadSubset`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()
// Out-of-order literal — survives normalizeExpressionPaths unsorted, so
// without canonicalization loadSubset would receive [3, 1, 2].
const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [3, 1, 2])),
)
await liveQuery.preload()
expect(loadSubsetCalls.length).toBeGreaterThan(0)
const filters = extractSimpleComparisons(loadSubsetCalls[0]!.where)
const inFilter = filters.find((f) => f.operator === `in`)
expect(inFilter).toBeDefined()
expect(inFilter!.value).toEqual([1, 2, 3])
})
it(`sorts multi-digit numeric inArray values correctly`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()
// Multi-digit numbers expose lexicographic vs numeric sort bugs
const liveQuery = createLiveQueryCollection((q) =>
q.from({ c: children }).where(({ c }) => inArray(c.parentId, [100, 20, 3])),
)
await liveQuery.preload()
expect(loadSubsetCalls.length).toBeGreaterThan(0)
const filters = extractSimpleComparisons(loadSubsetCalls[0]!.where)
const inFilter = filters.find((f) => f.operator === `in`)
expect(inFilter).toBeDefined()
expect(inFilter!.value).toEqual([3, 20, 100])
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/tests/query/canonicalize-in-values.test.ts` around lines 47 - 66,
The test currently only uses single-digit values so it misses a numeric-sorting
bug; add a second assertion case that calls createLiveQueryCollection(... where
inArray(c.parentId, [10, 2, 1])) (or similar multi-digit set like [12,3,4]),
preload it, extract the in-filter from loadSubsetCalls (using
extractSimpleComparisons and find where operator === 'in') and assert that
inFilter!.value equals the numerically sorted array [1,2,10] (or [3,4,12]); this
will catch the bug in expressions.ts where .sort() is used without a numeric
comparator while still using the existing helpers loadSubset,
createLiveQueryCollection, and inArray.


it(`sorts inArray values for ordered/limited queries too`, async () => {
const { collection: children, loadSubsetCalls } =
createChildrenCollectionWithTracking()

// orderBy + limit takes the requestLimitedSnapshot path; the where is still
// canonicalized because it is normalized at subscription creation.
const liveQuery = createLiveQueryCollection((q) =>
q
.from({ c: children })
.where(({ c }) => inArray(c.parentId, [3, 1, 2]))
.orderBy(({ c }) => c.id)
.limit(2),
)

await liveQuery.preload()

expect(firstInArrayValues(loadSubsetCalls)).toEqual([1, 2, 3])
})
})