fix: convert NumPy arrays in insert_many vector field (Fixes #1002) #2082

Open
rtmalikian wants to merge 1 commit into weaviate:main from rtmalikian:fix/issue-1002-numpy-insert-many

Conversation

@rtmalikian

Contributor

Fixes #1002

Problem

insert_many() in executor.py constructs _BatchObject directly without converting the vector field, causing NumPy arrays, torch tensors, and other supported vector types to be silently discarded.

BatchObject.__init__() already handles this via _get_vector_v4(), but insert_many bypasses BatchObject and uses the plain _BatchObject dataclass directly, which stores whatever is passed without conversion.
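The difference between the two construction paths can be sketched in isolation. This is a minimal illustration, not the client's code: `to_plain_list`, `PlainBatchObject`, and `ConvertingBatchObject` are hypothetical stand-ins for `_get_vector_v4()`, `_BatchObject`, and `BatchObject`, and the standard library's `array.array` stands in for a NumPy array so that the example runs without third-party packages.

```python
from array import array
from dataclasses import dataclass
from typing import Any, List, Optional


def to_plain_list(vector: Any) -> Optional[List[float]]:
    """Hypothetical stand-in for a converter such as _get_vector_v4()."""
    if vector is None:
        return None
    if isinstance(vector, list):
        return vector
    # NumPy arrays, torch tensors and array.array all expose tolist().
    if hasattr(vector, "tolist"):
        return vector.tolist()
    raise TypeError(f"unsupported vector type: {type(vector).__name__}")


@dataclass
class PlainBatchObject:
    """Plain dataclass: stores the vector exactly as it was passed."""

    vector: Any = None


class ConvertingBatchObject:
    """Wrapper that normalises the vector when the object is created."""

    def __init__(self, vector: Any = None) -> None:
        self.vector = to_plain_list(vector)


raw = array("d", [0.1, 0.2, 0.3])  # stdlib stand-in for a NumPy array
plain = PlainBatchObject(vector=raw)
converted = ConvertingBatchObject(vector=raw)

print(type(plain.vector).__name__)      # array
print(type(converted.vector).__name__)  # list
```

Any later step that expects a plain list receives one only from the converting path, which is why a code path that builds the dataclass directly has to do the conversion itself.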

Solution

Applied _get_vector_v4() conversion in insert_many() before storing the vector in _BatchObject, matching the behavior of BatchObject.__init__(). Also handles named vectors (dict[str, vector]) by converting each value.
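The shape of that conversion, covering both a single vector and a dict of named vectors, might look roughly like the sketch below. The names `to_plain_list` and `convert_vector_field` are assumptions made for illustration and do not appear in the diff; the real change calls `_get_vector_v4()`, whose exact behaviour is not reproduced here. `array.array` again stands in for a NumPy array.

```python
from array import array
from typing import Any, Dict, List, Optional, Union

PlainVector = List[float]


def to_plain_list(vector: Any) -> PlainVector:
    """Hypothetical stand-in for _get_vector_v4(): return a plain list."""
    if isinstance(vector, list):
        return vector
    if hasattr(vector, "tolist"):  # NumPy arrays, torch tensors, array.array
        return vector.tolist()
    raise TypeError(f"unsupported vector type: {type(vector).__name__}")


def convert_vector_field(
    vector: Any,
) -> Optional[Union[PlainVector, Dict[str, PlainVector]]]:
    """Normalise a vector field before it is stored on a batch object."""
    if vector is None:
        return None
    if isinstance(vector, dict):
        # Named vectors: convert each value, keep the names unchanged.
        return {name: to_plain_list(value) for name, value in vector.items()}
    return to_plain_list(vector)


single = convert_vector_field(array("d", [1.0, 2.0]))
named = convert_vector_field({"title": array("d", [0.5]), "body": [0.25, 0.75]})

print(single)  # [1.0, 2.0]
print(named)   # {'title': [0.5], 'body': [0.25, 0.75]}
```

Handling the dict case in the same helper keeps the single-vector and named-vector paths consistent, so a caller does not need to know which form it was given.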

Files Changed

  • weaviate/collections/data/executor.py — Added _get_vector_v4() conversion for both single vectors and named vector dicts

Verification

  • Ruff lint passes: All checks passed!
  • Python syntax check passes
  • _get_vector_v4() is already imported (line 63) and used elsewhere in the same file (lines 715, 717)
  • The fix pattern matches existing vector conversion code at lines 709-717

Changelog

| Date | Change | Author |
| --- | --- | --- |
| 2026-06-27 | Added vector type conversion in insert_many (Fixes #1002) | rtmalikian |

About the Author: Raphael Malikian — Clinical AI Solutions Architect. I specialise in building and fixing AI/ML systems for healthcare, including vector databases, RAG pipelines, and clinical NLP. If you need help with your project or think I can add value to your organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn: http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a


Disclosure: This code was developed with assistance from DeepSeek v4 Pro (DeepSeek) via Hermes Agent (Nous Research). All changes were reviewed, tested against the actual codebase, and verified for correctness.

…weaviate#1002)

insert_many() in executor.py constructs _BatchObject directly without
converting the vector, causing NumPy arrays, torch tensors, and other
supported vector types to be silently discarded. The fix applies
_get_vector_v4() conversion (same as BatchObject.__init__) before
storing the vector in _BatchObject.

Also handles named vectors (dict[str, vector]) by converting each value.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>

@orca-security-eu orca-security-eu Bot left a comment

Orca Security Scan Summary

| Status | Check | Issues by priority |
| --- | --- | --- |
| Passed | Infrastructure as Code | high 0, medium 0, low 0, info 0 |
| Passed | SAST | high 0, medium 0, low 0, info 0 |
| Passed | Secrets | high 0, medium 0, low 0, info 0 |
| Passed | Vulnerabilities | high 0, medium 0, low 0, info 0 |

@weaviate-git-bot

To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge.

beep boop - the Weaviate bot 👋🤖

PS:
Are you already a member of the Weaviate Forum?

@rtmalikian

Contributor Author

I agree with the Contributor License Agreement.

Development

Successfully merging this pull request may close these issues.

insert_many silently discards vectors if they're NumPy arrays

2 participants