feat(core): add VECTOR datatype support with Oracle vector integration#1

Open
prajalg wants to merge 52 commits into main from vector_support_v7

prajalg commented May 22, 2026

Owner

Pull Request Checklist

  • Have you added new tests to prevent regressions?
  • If a documentation update is necessary, have you opened a PR to the documentation repository?
  • Did you update the typescript typings accordingly (if applicable)?
  • Does the description below contain a link to an existing issue or a description of the issue you are solving?
  • Does the name of your PR follow our conventions?

Description of Changes

This PR adds VECTOR support with an Oracle-first implementation, shared core abstractions, and validation and test coverage for the vector SQL generation paths.

Core datatype & API groundwork

  • Added shared vector datatype foundation in core:
    • AbstractVECTORBase (shared behavior/validation)
    • generic DataTypes.VECTOR constructor overloads:
      • DataTypes.VECTOR
      • DataTypes.VECTOR(dimension)
      • DataTypes.VECTOR(dimension, format)
      • DataTypes.VECTOR({ dimension, format })
  • Added vector value validation for arrays and typed arrays.
  • Added dialect SQL rendering hooks for vector types.
  • Added Sequelize vector helper methods:
    • cosineDistance
    • innerProduct
    • l1Distance
    • l2Distance
    • vectorDistance
  • Updated typings so vector datatype + helper APIs are typed consistently.
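The four constructor shapes listed above can be collapsed into a single options object before validation. The following is a minimal sketch of that idea, not the PR's actual code: `normalizeVectorOptions` is a hypothetical helper, and both the format allowlist and the upper-casing rule are assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from the PR: maps every supported call shape
// onto one { dimension?, format? } object.
// ASSUMPTION: the format names below and the upper-casing are illustrative only.
const KNOWN_FORMATS = new Set(['INT8', 'FLOAT32', 'FLOAT64', 'BINARY']);

function normalizeVectorOptions(dimensionOrOptions, format) {
  // DataTypes.VECTOR / DataTypes.VECTOR(): no constraints at all.
  if (dimensionOrOptions === undefined && format === undefined) {
    return {};
  }

  // DataTypes.VECTOR({ dimension, format }) versus the positional forms.
  const isOptionsObject = typeof dimensionOrOptions === 'object' && dimensionOrOptions !== null;
  const rawDimension = isOptionsObject ? dimensionOrOptions.dimension : dimensionOrOptions;
  const rawFormat = isOptionsObject ? dimensionOrOptions.format : format;

  const normalized = {};

  if (rawDimension !== undefined) {
    if (!Number.isInteger(rawDimension) || rawDimension <= 0) {
      throw new TypeError('VECTOR dimension must be a positive integer');
    }
    normalized.dimension = rawDimension;
  }

  if (rawFormat !== undefined) {
    if (typeof rawFormat !== 'string' || !KNOWN_FORMATS.has(rawFormat.toUpperCase())) {
      throw new TypeError(`Unsupported VECTOR format: ${String(rawFormat)}`);
    }
    normalized.format = rawFormat.toUpperCase();
  }

  return normalized;
}
```

With this shape, `normalizeVectorOptions(3, 'float32')` and `normalizeVectorOptions({ dimension: 3, format: 'FLOAT32' })` produce the same object, so later validation and SQL rendering only need to handle one input form.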

Oracle integration

  • Enabled supports.dataTypes.VECTOR = true.
  • Added Oracle vector datatype override with Oracle-specific option handling and validation, covering Oracle storage options (including SPARSE) and bind marshalling.
  • Extended Oracle query-generator/vector function mapping so Sequelize vector helpers resolve to Oracle-native SQL.
  • Added normalization/validation for vector-function arguments and supported input shapes.
  • Added Oracle vector index SQL generation and validations for:
    • using
    • distance
    • accuracy
    • parameters
    • per-field order
  • Hardened invalid/error-path handling (including unsafe/injection-like option fragments) with explicit allowlists.
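The allowlist approach described for vector index options can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `validateVectorIndexOptions` is a hypothetical name, the distance-metric list is assumed, and every option is treated as optional here.

```javascript
// Hypothetical validator, not taken from the PR.
// ASSUMPTION: the metric names below are illustrative; the PR's allowlist may differ.
const INDEX_METHODS = new Set(['hnsw', 'ivf']);
const DISTANCE_METRICS = new Set(['COSINE', 'DOT', 'EUCLIDEAN', 'MANHATTAN', 'HAMMING']);

function assertPositiveFinite(name, value) {
  if (typeof value !== 'number' || !Number.isFinite(value) || value <= 0) {
    throw new Error(`Vector index option "${name}" must be a finite, positive number`);
  }
}

function validateVectorIndexOptions(options) {
  const normalized = {};

  if (options.using !== undefined) {
    const using = String(options.using).toLowerCase();
    if (!INDEX_METHODS.has(using)) {
      throw new Error(`Unsupported vector index method: ${options.using}`);
    }
    normalized.using = using;
  }

  if (options.distance !== undefined) {
    // Comparing against a fixed set means arbitrary fragments never reach the SQL text.
    const distance = String(options.distance).toUpperCase();
    if (!DISTANCE_METRICS.has(distance)) {
      throw new Error(`Unsupported vector distance metric: ${options.distance}`);
    }
    normalized.distance = distance;
  }

  if (options.accuracy !== undefined) {
    assertPositiveFinite('accuracy', options.accuracy);
    normalized.accuracy = options.accuracy;
  }

  if (options.parameters !== undefined) {
    normalized.parameters = {};
    for (const [key, value] of Object.entries(options.parameters)) {
      // Parameter names are restricted to plain identifiers; values must be numbers.
      if (!/^[A-Za-z_]+$/.test(key)) {
        throw new Error(`Invalid vector index parameter name: ${key}`);
      }
      assertPositiveFinite(`parameters.${key}`, value);
      normalized.parameters[key] = value;
    }
  }

  return normalized;
}
```

Because every value is either matched against a fixed set or checked to be a plain number, an input such as `{ distance: "COSINE; DROP TABLE t" }` is rejected before any SQL is assembled.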

Tests

  • Added/updated unit and integration tests for:
    • vector datatype constructor/rendering/validation paths
    • Oracle vector SQL generation paths
    • error paths for invalid/unsafe vector options
    • Oracle show-index compatibility behavior across version expectations

Guidance for other dialects

For future dialect integration, this PR establishes the recommended path:

  1. Enable supports.dataTypes.VECTOR.
  2. Reuse VECTOR where the generic option shape is sufficient, or extend AbstractVECTORBase for dialect-specific option models.
  3. Implement dialect SQL rendering + bind handling.
  4. Map vector helper functions in query-generator.
  5. Add dialect-specific index/query support and validation.
  6. Mirror unit/integration coverage.
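Steps 1 and 4 above (the capability flag and the helper-to-function mapping) can be illustrated with a small self-contained sketch. Everything in it is a stand-in: `fn` mimics an expression builder, `createVectorHelpers` is a hypothetical factory, and the SQL function names in the map are an assumption about how a dialect might resolve each helper.

```javascript
// Stand-in for an expression builder; returns a plain descriptor object.
function fn(name, ...args) {
  return { fn: name, args };
}

// ASSUMPTION: illustrative mapping from helper names to native SQL function names.
const HELPER_TO_SQL_FUNCTION = {
  cosineDistance: 'COSINE_DISTANCE',
  innerProduct: 'INNER_PRODUCT',
  l1Distance: 'L1_DISTANCE',
  l2Distance: 'L2_DISTANCE',
  vectorDistance: 'VECTOR_DISTANCE',
};

// Hypothetical factory: builds the five helpers for a given dialect object.
function createVectorHelpers(dialect) {
  const helpers = {};

  for (const [helperName, sqlName] of Object.entries(HELPER_TO_SQL_FUNCTION)) {
    helpers[helperName] = (...args) => {
      // Step 1: refuse to build an expression unless the dialect opts in.
      if (!dialect.supports?.dataTypes?.VECTOR) {
        throw new Error(`${helperName} is not supported by the ${dialect.name} dialect`);
      }

      // Step 4: resolve the helper to the dialect's native function.
      return fn(sqlName, ...args);
    };
  }

  return helpers;
}
```

A dialect that leaves `supports.dataTypes.VECTOR` at its default of `false` fails fast with a clear message instead of emitting SQL that the database would reject.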

List of Breaking Changes

  • None.

Summary by CodeRabbit

  • New Features

    • Added VECTOR data type for embeddings with configurable dimension/format and dialect support
    • Added vector similarity helpers: cosineDistance, innerProduct, l1Distance, l2Distance, vectorDistance
    • Added VECTOR index support (HNSW & IVF) and index creation/validation
  • Enhancements

    • Support for numeric typed arrays, binary formats, and sparse-vector inputs
    • Oracle dialect: VECTOR SQL formatting, bind handling, and version-aware index metadata
  • Tests

    • Extensive unit and integration tests covering VECTOR behavior, validation, persistence, and indexing

prajalg and others added 30 commits March 20, 2026 15:30
…etc.

  • validate using to only allow hnsw or ivf
  • allowlist distance metrics before emitting WITH DISTANCE
  • require numeric, finite, positive values for accuracy and vector index parameters
  • quote dropConstraintQuery constraint names
prajalg changed the title from "feat(core): add VECTOR datatype support and Oracle vector integration" to "feat(core): add VECTOR datatype support with Oracle vector integration" on May 22, 2026
prajalg force-pushed the vector_support_v7 branch from 993528c to 484a3d9 on May 28, 2026 08:17
coderabbitai Bot commented May 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d1f68e07-517f-4dda-8557-921410e518a8

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1579e and f42eebe.

📒 Files selected for processing (1)
  • packages/core/test/integration/dialects/oracle/vector.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/test/integration/dialects/oracle/vector.test.js

Walkthrough

Adds a VECTOR data type, Sequelize vector-distance helpers, and full Oracle support: datatype override with sparse vectors and bind handling, VECTOR_DISTANCE formatting, VECTOR index creation (HNSW/IVF), show-index metadata changes, and extensive unit/integration tests.

Changes

Core VECTOR Type and Sequelize Integration

| Layer / File(s) | Summary |
| --- | --- |
| Core VECTOR type definitions and validation (packages/core/src/abstract-dialect/data-types.ts) | Defines VectorOptions, NumericTypedArray, and VectorValue. Implements AbstractVECTORBase with validation and SQL emission and a concrete VECTOR type with format normalization. Adds isTypedArrayIterable. |
| Dialect VECTOR capability declarations (packages/core/src/abstract-dialect/dialect.ts, packages/core/src/abstract-dialect/query-interface.d.ts) | Adds dataTypes.VECTOR flag to DialectSupports (default false) and expands IndexType to include 'VECTOR'; updates index-type docs. |
| Sequelize vector similarity helper methods (packages/core/src/sequelize.d.ts, packages/core/src/sequelize.js, packages/core/src/data-types.ts) | Adds cosineDistance, innerProduct, l1Distance, l2Distance, vectorDistance helpers that validate dialect VECTOR support and return fn(...) expressions. Exports DataTypes.VECTOR wrapper. |

Oracle VECTOR Implementation and Index Support

| Layer / File(s) | Summary |
| --- | --- |
| Oracle VECTOR datatype override with sparse vector support (packages/oracle/src/_internal/data-types-overrides.ts, packages/oracle/src/dialect.ts) | Adds Oracle VECTOR class with constructor overloads/options object, format/storage validation, sparse-vector input shape and validation, dense-to-Float64Array binding, and Oracle bind type. Enables Oracle dialect VECTOR support. |
| Oracle VECTOR index creation and SQL generation (packages/oracle/src/query-generator.js) | Extends addIndexQuery to generate CREATE VECTOR INDEX statements, normalizes/validates options, rejects ordered vector fields, supports HNSW/IVF with parameter fragments, and adds helper validators. |
| Oracle VECTOR_DISTANCE function formatting (packages/oracle/src/query-generator.internal.ts) | Intercepts VECTOR distance functions to enforce arity and convert arrays/typed arrays or safe VECTOR literals into Oracle VECTOR('[...]') literals with strict parsing/validation. |
| Oracle index metadata and showIndexesQuery integration (packages/oracle/src/query-generator-typescript.internal.ts, packages/oracle/src/query.js, packages/core/test/unit/query-generator/show-indexes-query.test.ts) | Conditionally includes INDEX_SUBTYPE for Oracle 23+, derives type/method (VECTOR → method via subtype), updates show-indexes SQL expectations and result mapping. |

Comprehensive VECTOR Test Coverage

| Layer / File(s) | Summary |
| --- | --- |
| Core and Sequelize VECTOR unit tests (packages/core/test/unit/data-types/string-types.test.ts, packages/core/test/unit/sequelize.test.ts, test/esm-named-exports.test.js) | Unit tests for DataTypes.VECTOR constructor, toSql, validate; tests for vector helper methods and ESM export ignore update. |
| Oracle query generation and VECTOR unit tests (packages/core/test/unit/dialects/oracle/query-generator.test.js, packages/core/test/unit/dialects/oracle/vector.test.js) | Oracle INSERT/UPDATE query generation tests for vector-typed columns; comprehensive Oracle VECTOR unit tests covering indexing, datatype overrides, VECTOR_DISTANCE safety, and pagination ORDER BY behavior. |
| Oracle VECTOR integration tests (packages/core/test/integration/dialects/oracle/vector.test.js) | Integration tests gated to Oracle ≥23.4.0 for data persistence, typed-array handling, similarity searches, VECTOR_LITERAL handling, and index sync behavior. |

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant Sequelize
  participant OracleQueryGenerator
  participant OracleDialect
  App->>Sequelize: vectorDistance(column, value)
  Sequelize->>OracleDialect: supports?.dataTypes?.VECTOR?
  alt supported
    OracleDialect-->>Sequelize: true
    Sequelize->>OracleQueryGenerator: formatFn(VECTOR_DISTANCE, column, value)
    OracleQueryGenerator->>OracleQueryGenerator: normalize value -> VECTOR('[...]')
    OracleQueryGenerator-->>Sequelize: fn('VECTOR_DISTANCE', column, literal)
    Sequelize-->>App: fn expression
  else unsupported
    OracleDialect-->>Sequelize: false/undefined
    Sequelize-->>App: Error (not implemented)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through types and Oracle halls,
Indexed HNSW and IVF walls,
Cosine, L2, and inner play,
Dense or sparse, in SQL they sway,
A rabbit cheers: vectors lead the way!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and specifically summarizes the main change: adding VECTOR datatype support with Oracle vector integration. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/test/integration/dialects/oracle/vector.test.js`:
- Around line 404-406: The finally block unconditionally calls
IndexedItem.drop(), which can throw if the table was never created and mask the
real assertion failure; update the cleanup to guard the drop call by checking
existence or swallowing errors: wrap IndexedItem.drop() in a try/catch that
logs/ignores "table not found" errors (or use an API like
IndexedItem.exists()/hasTable() if available) so only genuine failures surface
while ensuring IndexedItem.drop() is still attempted when appropriate.

In `@packages/oracle/src/query-generator.internal.ts`:
- Around line 88-90: Update the VECTOR typed-array handling in `#formatVectorArg`
so it accepts the full set of numeric typed arrays used by core/base (match
packages/oracle/src/_internal/data-types-overrides.ts) instead of only
Float32Array, Float64Array, and Uint8Array: either import and call the existing
isNumericTypedArray (or reference BaseTypes.NumericTypedArray) to decide when to
call `#formatVectorFromIterable`, or expand the instanceof checks to include
Int8Array, Uint8ClampedArray, Int16Array, Uint16Array, Int32Array, and
Uint32Array; ensure the conditional that calls
this.#formatVectorFromIterable(arg) is updated and keep or update the
error/validation message to clearly reflect the supported typed-array set.
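
The first inline comment above (guarding the cleanup in the `finally` block) can be sketched as a small wrapper. This is one possible shape under stated assumptions, not code from the PR: `withCleanup` is a hypothetical test helper, and it swallows every cleanup error rather than filtering for "table not found".

```javascript
// Hypothetical test helper, not taken from the PR.
// Runs `body`, then always attempts `cleanup`, but never lets a cleanup
// failure replace the error (or result) produced by `body`.
async function withCleanup(body, cleanup) {
  try {
    return await body();
  } finally {
    try {
      await cleanup();
    } catch (cleanupError) {
      // ASSUMPTION: all cleanup errors are ignored here; a stricter version
      // could rethrow anything that is not a "table does not exist" error.
    }
  }
}
```

In the integration test this would wrap the assertions as `body` and pass `() => IndexedItem.drop()` as `cleanup`, so that a failed assertion is still the error the test runner reports even when the table was never created.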

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d2bd3e7f-abb0-4ad9-830c-f8ca459c8c50

📥 Commits

Reviewing files that changed from the base of the PR and between 8260c29 and 9f1579e.

📒 Files selected for processing (19)
  • packages/core/src/abstract-dialect/data-types.ts
  • packages/core/src/abstract-dialect/dialect.ts
  • packages/core/src/abstract-dialect/query-interface.d.ts
  • packages/core/src/data-types.ts
  • packages/core/src/sequelize.d.ts
  • packages/core/src/sequelize.js
  • packages/core/test/integration/dialects/oracle/vector.test.js
  • packages/core/test/unit/data-types/string-types.test.ts
  • packages/core/test/unit/dialects/oracle/query-generator.test.js
  • packages/core/test/unit/dialects/oracle/vector.test.js
  • packages/core/test/unit/query-generator/show-indexes-query.test.ts
  • packages/core/test/unit/sequelize.test.ts
  • packages/oracle/src/_internal/data-types-overrides.ts
  • packages/oracle/src/dialect.ts
  • packages/oracle/src/query-generator-typescript.internal.ts
  • packages/oracle/src/query-generator.internal.ts
  • packages/oracle/src/query-generator.js
  • packages/oracle/src/query.js
  • test/esm-named-exports.test.js

Comment thread packages/core/test/integration/dialects/oracle/vector.test.js
packages/oracle/src/query-generator.internal.ts, comment on lines +88 to +90
if (arg instanceof Float32Array || arg instanceof Float64Array || arg instanceof Uint8Array) {
  return this.#formatVectorFromIterable(arg);
}

⚠️ Potential issue | 🟠 Major | 💤 Low value

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if Oracle oracledb driver documentation specifies which typed arrays are supported for VECTOR
rg -n "Float32Array|Float64Array|Uint8Array|Int8Array|Int16Array|Int32Array" packages/oracle/src --type ts -C2

Repository: prajalg/sequelize

Length of output: 2109


Expand Oracle VECTOR typed-array handling to match core/base numeric typed arrays

packages/oracle/src/query-generator.internal.ts formats only Float32Array, Float64Array, and Uint8Array at lines 88-90, but packages/oracle/src/_internal/data-types-overrides.ts treats BaseTypes.NumericTypedArray as including Int8Array, Uint8ClampedArray, Int16Array, Uint16Array, Int32Array, and Uint32Array as well. This means callers can pass Oracle VECTOR validation and later hit a confusing failure in #formatVectorArg.

Update #formatVectorArg to use the same isNumericTypedArray logic (or expand the instanceof checks to cover the full NumericTypedArray set), or adjust the Oracle-side validation/error message to clearly state which typed arrays are actually supported for VECTOR distance functions.
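
The suggested fix can be sketched in isolation as follows. This is a minimal illustration, not the PR's code: `isNumericTypedArray` and `formatVectorArg` here are local stand-ins for the helpers named in the comment, and the comma-separated `VECTOR('[...]')` output follows the literal shape described in the walkthrough.

```javascript
// The nine typed-array constructors named in the review comment.
const NUMERIC_TYPED_ARRAYS = [
  Int8Array, Uint8Array, Uint8ClampedArray,
  Int16Array, Uint16Array,
  Int32Array, Uint32Array,
  Float32Array, Float64Array,
];

// Local stand-in for the shared check; one list serves validation and formatting.
function isNumericTypedArray(value) {
  return NUMERIC_TYPED_ARRAYS.some(TypedArray => value instanceof TypedArray);
}

// Stand-in for the private #formatVectorArg method, written as a free function.
function formatVectorArg(arg) {
  if (!Array.isArray(arg) && !isNumericTypedArray(arg)) {
    throw new TypeError(
      'VECTOR arguments must be an array of numbers or a numeric typed array '
      + `(${NUMERIC_TYPED_ARRAYS.map(TypedArray => TypedArray.name).join(', ')})`,
    );
  }

  const parts = [];
  for (const element of arg) {
    // Only finite numbers are emitted, so no caller-supplied text reaches the SQL.
    if (typeof element !== 'number' || !Number.isFinite(element)) {
      throw new TypeError('VECTOR elements must be finite numbers');
    }
    parts.push(String(element));
  }

  return `VECTOR('[${parts.join(',')}]')`;
}
```

Driving both the type check and the error message from the same constant keeps the accepted set and the documented set from drifting apart, which is the mismatch this comment describes.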

prajalg commented May 29, 2026

Owner Author

@coderabbitai review

coderabbitai Bot commented May 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
