zero-magnitude vectors are obviously bad input, but yielding NaNs for them causes NaN-poisoning that spreads (e.g. corrupted KNN ordering, because NaNs order weirdly).
not sure whether this is an intentional design decision, so thought i would ask.
some simple reproducers:
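-- table with a cosine metric; row 2 holds a zero-magnitude vector
CREATE VIRTUAL TABLE t USING vec0(v float[3] distance_metric=cosine);
INSERT INTO t(rowid, v) VALUES
(1, '[1.0, 0.0, 0.0]'),
(2, '[0.0, 0.0, 0.0]'),
(3, '[0.9, 0.1, 0.0]');
-- scalar call with a zero-magnitude argument
SELECT vec_distance_cosine('[0,0,0]', '[1,2,3]');
-- KNN query over the table that contains the zero vector
SELECT rowid, distance FROM t
WHERE v MATCH '[1.0, 0.0, 0.0]'
ORDER BY distance
LIMIT 2;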
i'd suggest returning NULL instead of dividing by zero; ditto for vec_normalize. and/or rejecting zero vectors at insert time when the column uses distance_metric=cosine.