fix(pgvector): tag similarity-search SQL as RETRIEVER + populate input#177
Open
SuhaniNagpal7 wants to merge 1 commit into
The pgvector instrumentor wraps psycopg's cursor.execute() and
executemany() and tags the span based on the SQL it sees: "query"
for SELECT with a vector op, "insert"/"update"/"delete" for the
respective DML, "unknown" otherwise. Currently none of these set the
FI canonical retriever keys, so even a textbook RAG similarity-search
query lands in Future AGI as Type=unknown with an empty Input panel.
This PR adds the FI canonical kind + input.value, but ONLY when the
detected operation is a similarity-search read (db.operation.name ==
"query"). INSERT/UPDATE/DELETE that happen to contain a vector
operator (e.g. INSERT ... RETURNING id <=> '[...]') are deliberately
NOT tagged RETRIEVER — they're writes, not retrievals.
Changes (all in traceai_pgvector/_wrappers.py)
- Optional `fi_instrumentation.fi_types` import with raw-string fallback.
- New `_is_pgvector_query_op(metadata)` predicate — True iff the
detected operation is "query".
- New `_truncate_sql(sql, limit=2000)` helper to cap the SQL string
attached to input.value at a reasonable size for span attributes.
- `ExecuteWrapper.__call__` — when `_is_pgvector_query_op(metadata)`:
- Set `gen_ai.span.kind = "RETRIEVER"`.
- Set `input.value` to the SQL string (resolved via `as_string(None)`
when a psycopg `Composed` object is passed, else `str()`), truncated.
- Set `input.mime_type = text/plain`.
- `ExecuteManyWrapper.__call__` — same treatment for batched
similarity-search reads.
output.value is deliberately not set: psycopg's execute() returns
None (rows live on the cursor, not the return value), so there's no
clean way to capture results from this wrapping layer.
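To illustrate the limitation, here is a small sketch with a stand-in cursor. `FakeCursor` and `traced_execute` are hypothetical and exist only for this example; the stand-in is modelled on the statement above that execute() returns None and that rows are read from the cursor afterwards.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


class FakeCursor:
    """Stand-in cursor: execute() returns None, rows are fetched later."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._pending: List[Row] = []

    def execute(self, sql: str, params: Optional[tuple] = None) -> None:
        # Results are staged on the cursor, not returned to the caller.
        self._pending = list(self._rows)
        return None

    def fetchall(self) -> List[Row]:
        rows, self._pending = self._pending, []
        return rows


def traced_execute(
    wrapped: Callable[..., Any], sql: str, params: Optional[tuple] = None
) -> Tuple[Any, Dict[str, str]]:
    """Call the wrapped execute() and record what is visible at this layer."""
    span: Dict[str, str] = {"input.value": sql}
    result = wrapped(sql, params)
    # Only the return value is visible here; with None there is nothing to record.
    if result is not None:
        span["output.value"] = repr(result)
    return result, span
```

The wrapper finishes before the caller ever calls fetchall(), so the span it produces carries an input but no output.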
All pre-existing `db.vector.*` attrs preserved. INSERT/UPDATE/DELETE
behavior unchanged — they still get the existing pgvector attrs but
without the RETRIEVER tag, which is correct.
Verified end-to-end via Future AGI MCP:
* `pgvector query` (SELECT ... <=> ... ORDER BY ... LIMIT) → Type=Retriever
* `pgvector insert` (INSERT with a vector op) → Type=unknown ✅
Summary
`PgvectorInstrumentor` wraps `psycopg.cursor.execute()` and `executemany()` and tags the span based on the SQL it sees: `query` for SELECT with a vector op, `insert`/`update`/`delete` otherwise. None of these currently set the FI canonical retriever keys, so a textbook RAG similarity-search query lands in Future AGI as Type: unknown with an empty Input panel.

This PR adds `gen_ai.span.kind = RETRIEVER` and `input.value` only when the detected operation is a similarity-search read (`db.operation.name == "query"`). INSERTs that happen to contain a vector operator (e.g. `INSERT ... RETURNING id <=> '[...]'`) are deliberately not tagged: they're writes, not retrievals.

What changes
All in `traceai_pgvector/_wrappers.py`:
- `fi_instrumentation.fi_types` import with raw-string fallback.
- `_is_pgvector_query_op(metadata)` predicate: True iff the detected operation is `"query"`.
- `_truncate_sql(sql, limit=2000)` helper.
- `ExecuteWrapper.__call__` and `ExecuteManyWrapper.__call__`, when `_is_pgvector_query_op(metadata)`:
  - `gen_ai.span.kind = "RETRIEVER"`
  - `input.value` = the SQL string (resolved via `as_string(None)` for psycopg `Composed` objects, else `str()`), truncated to 2000 chars
  - `input.mime_type = text/plain`

Output is deliberately not set. `psycopg.execute()` returns `None` (the rows live on the cursor, not the return value), so there's no clean way to capture results from this wrapping layer. The Output panel will remain empty for pgvector retriever spans; that's expected.

All pre-existing `db.vector.*` attrs preserved. INSERT/UPDATE/DELETE behavior is unchanged: they still get the existing pgvector technical attrs but without the RETRIEVER tag.

Verified
- `gen_ai.span.kind = "RETRIEVER"`, `input.value` populated
- `gen_ai.span.kind` NOT set (correctly stays `None`/unknown)
- `pgvector query` (`SELECT ... embedding <=> '[...]' ORDER BY dist LIMIT 5`) → Type: Retriever, Input: full SQL string
- `pgvector insert` (INSERT with a vector op) → Type: unknown ✅ (negative case correctly handled)

Out of scope
- `cursor.fetchall()` / iteration: separate concern).
- `retrieval.documents.N.*` attrs (Tier 3).