feat(core): add VECTOR datatype support with Oracle vector integration #2

Open
prajalg wants to merge 52 commits into main from vector_temp
Conversation

@prajalg

@prajalg prajalg commented May 29, 2026

Owner

Pull Request Checklist

  • Have you added new tests to prevent regressions?
  • If a documentation update is necessary, have you opened a PR to the documentation repository?
  • Did you update the typescript typings accordingly (if applicable)?
  • Does the description below contain a link to an existing issue or a description of the issue you are solving?
  • Does the name of your PR follow our conventions?

Description of Changes

This PR adds VECTOR support with an Oracle-first implementation, shared core abstractions, and validation and test coverage for the vector SQL generation paths.

Core datatype & API groundwork

  • Added shared vector datatype foundation in core:
    • AbstractVECTORBase (shared behavior/validation)
    • generic DataTypes.VECTOR constructor overloads:
      • DataTypes.VECTOR
      • DataTypes.VECTOR(dimension)
      • DataTypes.VECTOR(dimension, format)
      • DataTypes.VECTOR({ dimension, format })
  • Added vector value validation for arrays and typed arrays.
  • Added dialect SQL rendering hooks for vector types.
  • Added Sequelize vector helper methods:
    • cosineDistance
    • innerProduct
    • l1Distance
    • l2Distance
    • vectorDistance
  • Updated typings so vector datatype + helper APIs are typed consistently.
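
To illustrate the value validation for arrays and typed arrays mentioned above, here is a minimal sketch. It is not the code in this PR: `SketchTypedArray`, `isSketchTypedArray`, and `checkVectorValue` are hypothetical names, and the set of accepted typed arrays and the optional dimension check are assumptions.

```typescript
// Sketch only: the type and function names below are illustrative assumptions,
// not the identifiers used in this PR.
type SketchTypedArray = Int8Array | Uint8Array | Float32Array | Float64Array;

function isSketchTypedArray(value: unknown): value is SketchTypedArray {
  return (
    value instanceof Int8Array ||
    value instanceof Uint8Array ||
    value instanceof Float32Array ||
    value instanceof Float64Array
  );
}

// Returns null when the value is acceptable, otherwise a short reason.
function checkVectorValue(value: unknown, dimension?: number): string | null {
  if (!Array.isArray(value) && !isSketchTypedArray(value)) {
    return 'expected an array or a numeric typed array';
  }

  // Array.from copies both plain arrays and typed arrays into a plain array.
  const items: unknown[] = Array.from(value as ArrayLike<unknown>);

  if (dimension !== undefined && items.length !== dimension) {
    return `expected ${dimension} elements, got ${items.length}`;
  }

  // Typed arrays can still hold NaN or Infinity, so each element is checked.
  for (const item of items) {
    if (typeof item !== 'number' || !Number.isFinite(item)) {
      return 'every element must be a finite number';
    }
  }

  return null;
}
```

Returning a reason string instead of throwing keeps the sketch independent of any particular error class; a real datatype would presumably route the failure through its own validation error.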

Oracle integration

  • Enabled supports.dataTypes.VECTOR = true.
  • Added an Oracle vector datatype override with Oracle-specific option handling and validation, covering storage options (including SPARSE) and bind marshalling.
  • Extended Oracle query-generator/vector function mapping so Sequelize vector helpers resolve to Oracle-native SQL.
  • Added normalization/validation for vector-function arguments and supported input shapes.
  • Added Oracle vector index SQL generation and validations for:
    • using
    • distance
    • accuracy
    • parameters
    • per-field order
  • Hardened invalid/error-path handling (including unsafe/injection-like option fragments) with explicit allowlists.
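
The allowlist approach to index options can be sketched as follows. This is an illustrative outline under stated assumptions, not the PR's query-generator code: the function and interface names are hypothetical, the distance metric list is an assumed subset, and the identifier pattern for parameter names is an assumption added for the example.

```typescript
// Sketch only: names, the metric list, and the key pattern are assumptions.
const ALLOWED_USING = new Set(['hnsw', 'ivf']);
const ALLOWED_DISTANCE = new Set(['COSINE', 'DOT', 'EUCLIDEAN', 'MANHATTAN']);
const SAFE_PARAMETER_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

interface VectorIndexOptionsSketch {
  using?: string;
  distance?: string;
  accuracy?: number;
  parameters?: Record<string, number>;
}

function isPositiveFinite(value: unknown): value is number {
  return typeof value === 'number' && Number.isFinite(value) && value > 0;
}

// Throws on the first option that is not explicitly allowed.
function assertVectorIndexOptions(options: VectorIndexOptionsSketch): void {
  const { using, distance, accuracy, parameters } = options;

  if (using !== undefined && !ALLOWED_USING.has(String(using).toLowerCase())) {
    throw new Error(`Unsupported vector index method: ${JSON.stringify(using)}`);
  }

  if (distance !== undefined && !ALLOWED_DISTANCE.has(String(distance).toUpperCase())) {
    throw new Error(`Unsupported distance metric: ${JSON.stringify(distance)}`);
  }

  if (accuracy !== undefined && !isPositiveFinite(accuracy)) {
    throw new Error('accuracy must be a positive, finite number');
  }

  for (const [name, value] of Object.entries(parameters ?? {})) {
    if (!SAFE_PARAMETER_NAME.test(name)) {
      throw new Error(`Unsafe parameter name: ${JSON.stringify(name)}`);
    }

    if (!isPositiveFinite(value)) {
      throw new Error(`Parameter ${name} must be a positive, finite number`);
    }
  }
}
```

Checking against fixed sets before any string is concatenated means a fragment such as `hnsw; DROP TABLE x` is rejected outright rather than escaped.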

Tests

  • Added/updated unit and integration tests for:
    • vector datatype constructor/rendering/validation paths
    • Oracle vector SQL generation paths
    • error paths for invalid/unsafe vector options
    • Oracle show-index compatibility behavior across version expectations

Guidance for other dialects

For future dialect integration, this PR establishes the recommended path:

  1. Enable supports.dataTypes.VECTOR.
  2. Reuse VECTOR where the generic option shape is sufficient, or extend AbstractVECTORBase for dialect-specific option models.
  3. Implement dialect SQL rendering + bind handling.
  4. Map vector helper functions in query-generator.
  5. Add dialect-specific index/query support and validation.
  6. Mirror unit/integration coverage.

List of Breaking Changes

  • None.

Summary by CodeRabbit

  • New Features

    • Added VECTOR data type for storing/querying vector embeddings.
    • Added vector distance/similarity helpers (cosine, inner product, L1, L2, generic vector distance) usable in queries and ordering.
    • Added vector index support to optimize similarity searches.
    • Added support for multiple vector formats and input types, with validation and literal handling.
  • Tests

    • Comprehensive unit and integration tests (including Oracle) covering persistence, queries, indexes, and validation.

Review Change Stack

prajalg and others added 30 commits March 20, 2026 15:30
…etc.

validate using to only allow hnsw or ivf
allowlist distance metrics before emitting WITH DISTANCE
require numeric, finite, positive values for accuracy and vector index parameters
quote dropConstraintQuery constraint names
@coderabbitai

coderabbitai Bot commented May 29, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 80c82464-485c-43bb-adf9-81d8081919c6

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1579e and f42eebe.

📒 Files selected for processing (1)
  • packages/core/test/integration/dialects/oracle/vector.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/test/integration/dialects/oracle/vector.test.js

📝 Walkthrough

This PR adds VECTOR data type support: core types and abstract base, Sequelize vector helper methods, Oracle dialect VECTOR implementation and index DDL/formatting, conditional index metadata selection, and extensive unit and integration tests.

Changes

Vector Data Type and Oracle Integration

Layer: Core VECTOR data type definition and dialect support
  File(s): packages/core/src/abstract-dialect/data-types.ts, packages/core/src/abstract-dialect/dialect.ts, packages/core/src/abstract-dialect/query-interface.d.ts, packages/core/src/data-types.ts
  Summary: Introduces VectorOptions, VectorValue, and NumericTypedArray types alongside AbstractVECTORBase and VECTOR classes. Updates dialect support matrix to include VECTOR capability flag and add VECTOR to allowed index types.

Layer: Sequelize vector helper methods
  File(s): packages/core/src/sequelize.d.ts, packages/core/src/sequelize.js
  Summary: Adds five Sequelize instance methods (cosineDistance, innerProduct, l1Distance, l2Distance, vectorDistance) that generate dialect-specific vector distance expressions via shared #vectorSimilarityFn helper with runtime support validation.

Layer: Oracle VECTOR data type implementation
  File(s): packages/oracle/src/_internal/data-types-overrides.ts, packages/oracle/src/dialect.ts
  Summary: Defines Oracle-specific vector options (OracleVectorFormat, OracleVectorStorage, OracleVectorOptions), sparse vector input shapes, and VECTOR class extending AbstractVECTORBase. Converts dense arrays to Float64Array and sets Oracle bind metadata. Enables VECTOR support in Oracle dialect.

Layer: Oracle query generation for vector functions and indexes
  File(s): packages/oracle/src/query-generator.internal.ts, packages/oracle/src/query-generator.js, packages/oracle/src/query-generator-typescript.internal.ts, packages/oracle/src/query.js
  Summary: Implements formatFn override for vector function SQL generation with literal validation and element-level finitude checks. Adds showIndexesQuery index subtype metadata enrichment and conditional INDEX_SUBTYPE selection based on database version. Extends addIndexQuery to support VECTOR indexes with parameter normalization, distance metric validation, and field constraint enforcement.

Layer: Unit tests for VECTOR types and Oracle query generation
  File(s): packages/core/test/unit/data-types/string-types.test.ts, packages/core/test/unit/dialects/oracle/query-generator.test.js, packages/core/test/unit/dialects/oracle/vector.test.js, packages/core/test/unit/query-generator/show-indexes-query.test.ts, packages/core/test/unit/sequelize.test.ts
  Summary: Validates VECTOR constructor forms, toSql output per dialect, validate input handling, Oracle DDL and index SQL generation, metadata mapping for HNSW/IVF methods, datatype binding conversion to Float64Array, vector distance WHERE clause formatting, and Sequelize helper function generation.

Layer: Oracle integration tests for vector persistence and search
  File(s): packages/core/test/integration/dialects/oracle/vector.test.js
  Summary: End-to-end tests for vector column persistence, similarity searches using distance functions in where and order clauses, input validation for vector arguments and updates, binary vector format support, and vector index creation during model sync with error handling.

Layer: Export and compatibility updates
  File(s): test/esm-named-exports.test.js
  Summary: Updates ESM export compatibility test to ignore the VECTOR key when comparing CJS and ESM exports.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Sequelize
  participant Oracle
  participant DB
  User->>Sequelize: sequelize.vectorDistance(column, value)
  Sequelize->>Sequelize: check dialect.supports.dataTypes.VECTOR
  Sequelize->>Oracle: formatFn(VECTOR_DISTANCE(column, VECTOR(...)))
  Oracle->>Oracle: validate args, format column, build VECTOR literal
  Oracle->>DB: execute SELECT ... WHERE VECTOR_DISTANCE(...)
  DB-->>Sequelize: rows
  Sequelize-->>User: result set
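
The first two steps of the diagram (the support check, then handing a function call to the dialect) could look roughly like this. It is a sketch under assumptions, not the actual `#vectorSimilarityFn` implementation: `DialectSketch`, `FnCallSketch`, and `vectorSimilarityFnSketch` are hypothetical, and the returned object is a plain stand-in for whatever function-expression object the library builds.

```typescript
// Sketch only: all names here are hypothetical stand-ins.
interface DialectSketch {
  name: string;
  supports: { dataTypes: { VECTOR: boolean } };
}

interface FnCallSketch {
  fn: string;
  args: unknown[];
}

// Refuses to build a vector expression for dialects without VECTOR support,
// otherwise returns a description of the SQL function call to format later.
function vectorSimilarityFnSketch(
  dialect: DialectSketch,
  fnName: string,
  column: string,
  value: number[],
): FnCallSketch {
  if (!dialect.supports.dataTypes.VECTOR) {
    throw new Error(`The ${dialect.name} dialect does not support the VECTOR data type.`);
  }

  return { fn: fnName, args: [column, value] };
}

const oracleLike: DialectSketch = {
  name: 'oracle',
  supports: { dataTypes: { VECTOR: true } },
};

const call = vectorSimilarityFnSketch(oracleLike, 'VECTOR_DISTANCE', 'embeddings', [0.1, 0.2]);
```

Failing at expression-build time, before any SQL is generated, gives unsupported dialects a clear error instead of a database-side syntax failure.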

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I nibble code in quiet night,
Vectors bloom in SQL light.
Cosine sings and distances run,
Oracle indexes join the fun.
Hops of joy — the tests pass bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding VECTOR datatype support with Oracle integration as a core feature.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@prajalg prajalg changed the title from "Vector temp" to "feat(core): add VECTOR datatype support with Oracle vector integration" May 29, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/abstract-dialect/data-types.ts`:
- Around line 2916-2929: The validate method currently only checks container
shape via _getVectorIterable; update validate (and the parallel block at
2934-2942) so that after obtaining the iterable from _getVectorIterable(value)
you iterate over its elements and call _validateVectorElement for each item, and
if any element fails throw via ValidationErrorItem.throwDataTypeValidationError
(keeping the existing util.format('%O is not a valid vector', value) or similar
contextual message); ensure this works for both plain arrays and typed-array
iterables returned by _getVectorIterable and that _validateVectorElement
enforces numeric finite values.

In `@packages/core/test/integration/dialects/oracle/vector.test.js`:
- Line 141: The test currently hardcodes the table name "Items" in the raw SQL
(`SELECT "id" FROM "Items" WHERE VECTOR_DISTANCE("embeddings", $queryVector) <
$threshold ORDER BY "id"`), which is brittle; replace that literal with the
model-derived table name (e.g., call Model.getTableName() or the test's
modelInstance.getTableName()) when building the SQL string so the test uses the
resolved table metadata (keep the rest of the VECTOR_DISTANCE clause and
parameter placeholders unchanged).

In `@packages/oracle/src/_internal/data-types-overrides.ts`:
- Around line 593-608: The sparse vector type guard is too permissive: update
isOracleSparseVectorInput (and the analogous guard at 616-637) to validate that
indices are either a Uint32Array or an Array of integers, reject wider typed
arrays, and that each index is an integer, >= 0 and < numDimensions; also
validate values elements are finite numbers (use Number.isFinite) and keep the
length equality check. Concretely, inside isOracleSparseVectorInput iterate over
indices and values, enforce Number.isInteger(idx) && idx >= 0 && idx <
numDimensions for each index, ensure indices is instance of Uint32Array or plain
Array (not Float64Array/Int32Array/etc.), and enforce Number.isFinite(val) for
each value; keep existing numDimensions integer/positive check and values.length
=== indices.length as before.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ffee8fef-2d0f-4f14-a917-1e2dcc34e84e

📥 Commits

Reviewing files that changed from the base of the PR and between 8260c29 and 9f1579e.

📒 Files selected for processing (19)
  • packages/core/src/abstract-dialect/data-types.ts
  • packages/core/src/abstract-dialect/dialect.ts
  • packages/core/src/abstract-dialect/query-interface.d.ts
  • packages/core/src/data-types.ts
  • packages/core/src/sequelize.d.ts
  • packages/core/src/sequelize.js
  • packages/core/test/integration/dialects/oracle/vector.test.js
  • packages/core/test/unit/data-types/string-types.test.ts
  • packages/core/test/unit/dialects/oracle/query-generator.test.js
  • packages/core/test/unit/dialects/oracle/vector.test.js
  • packages/core/test/unit/query-generator/show-indexes-query.test.ts
  • packages/core/test/unit/sequelize.test.ts
  • packages/oracle/src/_internal/data-types-overrides.ts
  • packages/oracle/src/dialect.ts
  • packages/oracle/src/query-generator-typescript.internal.ts
  • packages/oracle/src/query-generator.internal.ts
  • packages/oracle/src/query-generator.js
  • packages/oracle/src/query.js
  • test/esm-named-exports.test.js

Comment thread packages/core/src/abstract-dialect/data-types.ts
Comment thread packages/core/test/integration/dialects/oracle/vector.test.js Outdated
Comment on lines +593 to +608
function isOracleSparseVectorInput(value: unknown): value is OracleSparseVectorInput {
  if (typeof value !== 'object' || value === null) {
    return false;
  }

  const sparseVector = value as Partial<OracleSparseVectorInput>;
  const { indices, numDimensions, values } = sparseVector;

  return (
    isVectorComponent(values) &&
    isVectorComponent(indices) &&
    typeof numDimensions === 'number' &&
    Number.isInteger(numDimensions) &&
    numDimensions > 0 &&
    values.length === indices.length
  );

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden sparse vector validation for indices and numeric values.

The sparse guard currently checks shape but not semantics: indices are not constrained to integers/range, and values are not checked for finite numbers. It also accepts index typed arrays wider than the declared Uint32Array contract.

Suggested fix
 function isOracleSparseVectorInput(value: unknown): value is OracleSparseVectorInput {
   if (typeof value !== 'object' || value === null) {
     return false;
   }

   const sparseVector = value as Partial<OracleSparseVectorInput>;
   const { indices, numDimensions, values } = sparseVector;
+  if (!(typeof numDimensions === 'number' && Number.isInteger(numDimensions) && numDimensions > 0)) {
+    return false;
+  }
+
+  const hasValidValues =
+    isVectorComponent(values) &&
+    Array.from(values).every(item => typeof item === 'number' && Number.isFinite(item));
+
+  const hasValidIndices =
+    (Array.isArray(indices) || indices instanceof Uint32Array) &&
+    Array.from(indices).every(index => Number.isInteger(index) && index >= 0 && index < numDimensions);

   return (
-    isVectorComponent(values) &&
-    isVectorComponent(indices) &&
-    typeof numDimensions === 'number' &&
-    Number.isInteger(numDimensions) &&
-    numDimensions > 0 &&
+    hasValidValues &&
+    hasValidIndices &&
     values.length === indices.length
   );
 }

Also applies to: 616-637

2 participants