vec_distance_cosine returns NaN for zero-magnitude vectors #8

@josharian

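Zero-magnitude vectors are obviously bad input. But yielding NaN for them causes spreading NaN poisoning. For example, KNN ordering gets corrupted, because NaN compares as neither less than, greater than, nor equal to any value (including itself), so sorts place it unpredictably.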
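Assuming vec_distance_cosine follows the textbook definition of cosine distance (the formula below is the standard one, not copied from sqlite-vec's source), a zero vector forces a 0/0:

```math
\begin{aligned}
d(\mathbf{a},\mathbf{b}) &= 1 - \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} \\
d(\mathbf{0},\mathbf{b}) &= 1 - \frac{0}{0\cdot\lVert\mathbf{b}\rVert} = 1 - \frac{0}{0} \\
\operatorname{normalize}(\mathbf{0}) &= \frac{\mathbf{0}}{\lVert\mathbf{0}\rVert} = \left(\tfrac{0}{0}, \dots, \tfrac{0}{0}\right)
\end{aligned}
```

IEEE 754 defines 0/0 as NaN, and any arithmetic on a NaN (here 1 − NaN) yields NaN again, which is how one bad row propagates. vec_normalize hits the same 0/0 in every component.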

I'm not sure whether this is an intentional design decision, so I thought I would ask.

Some simple reproducers:

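```sql
-- Table using cosine distance, with one zero vector (rowid 2)
CREATE VIRTUAL TABLE t USING vec0(v float[3] distance_metric=cosine);

INSERT INTO t(rowid, v) VALUES
(1, '[1.0, 0.0, 0.0]'),
(2, '[0.0, 0.0, 0.0]'),
(3, '[0.9, 0.1, 0.0]');

-- Reproducer 1: scalar function called with a zero vector
SELECT vec_distance_cosine('[0,0,0]', '[1,2,3]');

-- Reproducer 2: KNN query over a table that contains a zero vector
SELECT rowid, distance FROM t
WHERE v MATCH '[1.0, 0.0, 0.0]'
ORDER BY distance
LIMIT 2;
```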

I'd suggest returning NULL instead of dividing by zero, and doing the same for vec_normalize. Alternatively (or additionally), zero vectors could be rejected at insert time when a table uses distance_metric=cosine.
