fix(lancedb): tag search span as RETRIEVER + populate input/output #176

Open
SuhaniNagpal7 wants to merge 1 commit into dev from fix/lancedb-retriever-attrs

@SuhaniNagpal7
Contributor

Summary

LanceDBInstrumentor emits a span for table.search() (.to_list() / .to_pyarrow() etc.) but never sets the FI canonical retriever keys. As a result, the Future AGI dashboard renders the span as Type: unknown with empty Input/Output panels.

LanceDB uses a query-builder pattern where the actual query lives on the builder instance as _query / _limit (instead of being in args/kwargs), so the wrapper reads from those attributes.
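
A minimal sketch of how the input summary could be read off the builder instance, assuming only the `_query` / `_limit` attributes named above. The helper name `build_input_summary`, the constant `MAX_QUERY_CHARS`, and the `SimpleNamespace` stand-in for a real LanceDB query builder are hypothetical and not taken from `_wrappers.py`:

```python
import json
from types import SimpleNamespace

MAX_QUERY_CHARS = 500  # cap for text queries, as stated in this PR


def build_input_summary(builder, output_format):
    """Summarise a query-builder instance for the span's input.value.

    Hypothetical helper: reads the private `_query` / `_limit` attributes
    defensively, since they are not part of a public API.
    """
    summary = {
        "limit": getattr(builder, "_limit", None),
        "output_format": output_format,
    }
    query = getattr(builder, "_query", None)
    if isinstance(query, str):
        # Text query: record the (truncated) query string.
        summary["query"] = query[:MAX_QUERY_CHARS]
    elif query is not None and hasattr(query, "__len__"):
        # Vector query: record only the dimensionality, not the values.
        summary["vector_dim"] = len(query)
    return summary


# Stand-in for a real builder instance (assumption for illustration only).
mock_builder = SimpleNamespace(_query=[0.1, 0.2, 0.3], _limit=2)
summary = build_input_summary(mock_builder, "to_list")
input_value = json.dumps(summary)
print(input_value)
```

Using `getattr` with a default keeps the wrapper from raising if a future LanceDB release renames these private attributes; the span would then simply carry a sparser summary.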

What changes

All in traceai_lancedb/_wrappers.py:

  • Optional fi_instrumentation.fi_types import with raw-string fallback.
  • SearchWrapper.__call__:
    • gen_ai.span.kind = "RETRIEVER"
    • Input summary {limit, output_format} extended with either query (capped at 500 chars, for text queries) or vector_dim (for vector queries) based on what's on the builder instance.
    • input.value = JSON of the summary; input.mime_type = application/json.
    • After the wrapped call, output.value based on output_format:
      • to_list → JSON of the first 50 rows
      • PyArrow Table (has num_rows) → JSON via to_pylist() or to_pydict() of the first 50 rows
    • output.mime_type = application/json in both cases.
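
The output handling above might look roughly like the following sketch. The names `serialize_output`, `MAX_ROWS`, and `FakeTable` are hypothetical; `FakeTable` only imitates the `num_rows`, `slice()`, and `to_pylist()` members of a PyArrow Table so the example runs without pyarrow installed, and the actual code in `_wrappers.py` may differ:

```python
import json

MAX_ROWS = 50  # row cap stated in this PR


def serialize_output(result):
    """Return a JSON string for output.value, or None for unknown shapes."""
    if isinstance(result, list):
        # to_list() result: keep only the first MAX_ROWS rows.
        rows = result[:MAX_ROWS]
    elif hasattr(result, "num_rows"):
        # Table-like result: trim first, then convert to Python objects.
        head = result.slice(0, MAX_ROWS) if hasattr(result, "slice") else result
        rows = head.to_pylist() if hasattr(head, "to_pylist") else head.to_pydict()
    else:
        return None
    # default=str keeps serialization from failing on non-JSON values.
    return json.dumps(rows, default=str)


class FakeTable:
    """Tiny stand-in for a PyArrow Table (illustration only)."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def num_rows(self):
        return len(self._rows)

    def slice(self, offset=0, length=None):
        end = None if length is None else offset + length
        return FakeTable(self._rows[offset:end])

    def to_pylist(self):
        return list(self._rows)


list_json = serialize_output([{"id": i} for i in range(80)])
table_json = serialize_output(FakeTable([{"id": i} for i in range(60)]))
print(len(json.loads(list_json)), len(json.loads(table_json)))
```

Trimming before conversion avoids materializing a large result set as Python objects just to discard most of it.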

Add/Update/Delete/CreateTable/DropTable/OpenTable are untouched.

Verified

  • In-process attribute check using a mock query-builder instance with _query=[0.1,0.2,0.3] and _limit=2.
  • Real end-to-end ingest to Future AGI → confirmed via Future AGI MCP that the lancedb search span shows:
    • Type: Retriever
    • Input: {"limit": 2, "output_format": "to_list", "vector_dim": 3}
    • Output: [{"id": "doc1", "text": "Eiffel Tower"}, {"id": "doc2", "text": "Statue of Liberty"}]

Out of scope

Per-document retrieval.documents.N.* attrs (Tier 3) are not covered here. The other 6 vector DBs get their own PRs.

The LanceDB instrumentor emits a span for table.search().to_list() /
to_pyarrow() but never sets the FI canonical retriever keys. Future
AGI dashboard shows Type=unknown with empty Input/Output panels.

LanceDB uses a query-builder pattern where the actual query lives on
the builder instance as `_query` / `_limit` rather than in args/kwargs.
The wrapper now reads from those attributes.

Changes (all in traceai_lancedb/_wrappers.py)

- Optional `fi_instrumentation.fi_types` import with raw-string fallback.
- In `SearchWrapper.__call__`:
  - Set `gen_ai.span.kind = "RETRIEVER"`.
  - Build an input summary {limit, output_format} extended with either
    `query` (capped at 500 chars, for text queries) or `vector_dim`
    (for vector queries) based on what's on the builder instance.
  - Set `input.value` as the JSON summary with `input.mime_type =
    application/json`.
  - After the wrapped call, set `output.value` based on output_format:
    - `to_list` → JSON of the first 50 rows
    - PyArrow table (`num_rows` attribute) → JSON via `to_pylist()` or
      `to_pydict()` of the first 50 rows
  - Set `output.mime_type = application/json` in both cases.

Add/Update/Delete/CreateTable/DropTable/OpenTable are untouched.

Verified end-to-end via Future AGI MCP. `lancedb search` span now
shows Type=Retriever in the dashboard with populated Input/Output.
@SuhaniNagpal7 SuhaniNagpal7 self-assigned this May 13, 2026