From 66f0e27873cb2d161156844f331f990ea93296f2 Mon Sep 17 00:00:00 2001 From: Vinicius Mello Date: Fri, 3 Jul 2026 13:08:34 -0300 Subject: [PATCH 1/5] docs: add design spec for OPEN-11569 additional columns support Spec for extending GoogleConversationalSearchTracer to accept customer-supplied additional columns (static per-client and per-call), so data like a custom trace ID can be attached to imported rows. --- ...google-tracer-additional-columns-design.md | 79 +++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md diff --git a/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md b/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md new file mode 100644 index 0000000..829b716 --- /dev/null +++ b/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md @@ -0,0 +1,79 @@ +# Support additional columns in GoogleConversationalSearchTracer + +- Linear: [OPEN-11569](https://linear.app/openlayer/issue/OPEN-11569/support-additional-columns-in-google-conversational-search-tracer) +- Customer: gen (per Vikas Nair, Slack thread linked on the issue) + +## Problem + +`Openlayer::Integrations::GoogleConversationalSearchTracer` (`lib/openlayer/integrations/google_conversational_search_tracer.rb`) auto-extracts a fixed set of fields from each `answer_query` call and response (query, answer, latency, citations, references, session, user, etc.) but gives the caller no way to attach their own custom data — e.g. a trace ID generated by their own application — to the row sent to Openlayer. + +The Openlayer stream API already accepts arbitrary extra keys per row (`DataStreamParams#rows` is typed `Array<Hash{Symbol=>Object}>`, `lib/openlayer/models/inference_pipelines/data_stream_params.rb:23-28`) with no schema restriction, and no `config` registration is required for non-special columns. 
The gap is entirely in the tracer's Ruby API, not the wire format. + +## Precedent + +- `session_id`/`user_id` were added to this same tracer in commit `9157c52` (OPEN-8574) as static kwargs on `trace_client`, applied to every trace sent through that client instance. +- The Python SDK's tracers (`trace_openai`, `trace_anthropic`, etc.) let callers attach arbitrary data via `update_current_trace(metadata={...})`; in `post_process_trace()` every key in that dict is flattened directly onto the row as a top-level column (`src/openlayer/lib/tracing/tracer.py` in openlayer-python), not nested under a `metadata` key. Only truly special columns get a `config[...ColumnName]` entry. +- This tracer's existing `metadata:` row key already has an established, different meaning: it's the tracer's own auto-extracted Google-specific data (citations, references, provider, service, etc.). The new feature must not collide with or overload that name. + +## Design + +### API + +`trace_client` gains a new optional `additional_columns:` kwarg — a `Hash` of column name => value, applied as a static default to every trace sent through that client: + +```ruby +GoogleConversationalSearchTracer.trace_client( + google_client, + openlayer_client: openlayer, + inference_pipeline_id: pipeline_id, + additional_columns: { environment: "production", app_version: "2.4.1" } +) +``` + +The wrapped `answer_query` method also recognizes `additional_columns:` as a reserved keyword argument, for per-call values: + +```ruby +google_client.answer_query( + serving_config: serving_config, + query: query, + additional_columns: { trace_id: my_app.generate_trace_id } +) +``` + +The wrapper (`client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| ... end`) pops `:additional_columns` out of `kwargs` before forwarding the rest to the real Google client via `original_answer_query.call(*args, **kwargs, &block)`. 
Google's client never sees this key, so it cannot collide with any real Discovery Engine request field (confirmed against `rbi/google_discovery_engine.rbi`, which already types `answer_query`'s extra kwargs as `T.untyped`). + +### Merge & precedence + +1. Keys in both the static and per-call `additional_columns` hashes are normalized to Symbols (`key.to_sym`) before anything else. This is required for the reserved-key filter below to work regardless of whether the caller used String or Symbol keys — without normalization, a String `"answer"` wouldn't match a Symbol `:answer` in the reserved list and could slip through and clobber the real answer column. +2. Per-call `additional_columns` are shallow-merged over the client-level static `additional_columns` (call-level wins on key conflict). +3. From that combined hash, any key matching a reserved/built-in row key is dropped: `:query`, `:answer`, `:latency_ms`, `:timestamp`, `:metadata`, `:steps`, `:context`, `:session_id`, `:user_id`. A dropped key logs via `warn_if_debug` (only visible when `OPENLAYER_DEBUG` is set), so this is silent by default but debuggable. +4. The remaining keys are merged onto `trace_data[:rows][0]` as top-level columns (sibling of `query`/`answer`/etc.) — no `config` registration needed, matching the "arbitrary row keys pass through" contract confirmed in `data_stream_params.rb`. + +Built-ins are computed first; the filtered additional columns are merged in afterward but with reserved keys already stripped, so they can never overwrite core trace data — this is what "built-ins win" means in practice, without needing a specific merge order to enforce it. + +### Robustness + +If `additional_columns` (static or per-call) is not a `Hash` (nil, String, etc.), treat it as `{}` and log via `warn_if_debug` — do not raise. 
This matches the file's existing philosophy that a tracing mistake must never break the customer's actual application (every `send_trace` call site is already wrapped in `rescue StandardError`). + +### Docs + +- Update the class-level YARD `@example` on `GoogleConversationalSearchTracer`. +- Update `examples/google_tracer.rb` to demonstrate both static and per-call `additional_columns`. +- Update `@param` docs on `trace_client` and `send_trace`. +- Update `rbi/openlayer/integrations.rbi` (the OPEN-8574 commit updated this same file when it added `session_id`/`user_id`). + +### Testing + +No test coverage exists today for this tracer. Add `test/openlayer/integrations/google_conversational_search_tracer_test.rb` as a plain `Minitest::Test` (no network — a fake `openlayer_client` double capturing the `.inference_pipelines.data.stream(...)` call args), covering: + +- Static `additional_columns` (set via `trace_client`) appear as top-level row columns. +- Per-call `additional_columns` (passed to `answer_query`) appear as top-level row columns. +- Per-call values override static values on key conflict. +- Keys colliding with reserved row columns are dropped, not merged — including when passed as a String key (e.g. `"answer"`) rather than a Symbol. +- A non-`Hash` `additional_columns` value doesn't raise. + +## Out of scope + +- No change to `lib/openlayer/models`/`lib/openlayer/resources` (Stainless-generated; the wire format already supports arbitrary row keys, confirmed via `.stats.yml` codegen fingerprint — these directories are regenerated from an OpenAPI spec and must not be hand-edited). +- No thread-local/context-manager mechanism (rejected approach B — unnecessary complexity for this need). +- No new public method alongside `answer_query` (rejected approach C — breaks the tracer's transparent-instrumentation model). 
From 95279d1484ee8dc4f25160851035191c40c8e00f Mon Sep 17 00:00:00 2001 From: Vinicius Mello Date: Fri, 3 Jul 2026 13:46:27 -0300 Subject: [PATCH 2/5] feat(closes OPEN-11569): support additional columns in ConversationalSearchService tracer --- .../google_conversational_search_tracer.rb | 81 +++++++++- ...oogle_conversational_search_tracer_test.rb | 144 ++++++++++++++++++ 2 files changed, 219 insertions(+), 6 deletions(-) create mode 100644 test/openlayer/integrations/google_conversational_search_tracer_test.rb diff --git a/lib/openlayer/integrations/google_conversational_search_tracer.rb b/lib/openlayer/integrations/google_conversational_search_tracer.rb index aef47a6..23ed398 100644 --- a/lib/openlayer/integrations/google_conversational_search_tracer.rb +++ b/lib/openlayer/integrations/google_conversational_search_tracer.rb @@ -21,15 +21,25 @@ module Integrations # Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( # google_client, # openlayer_client: openlayer, - # inference_pipeline_id: 'your-pipeline-id' + # inference_pipeline_id: 'your-pipeline-id', + # additional_columns: { environment: 'production' } # ) # - # # Now all answer_query calls are automatically traced + # # Now all answer_query calls are automatically traced! Pass + # # additional_columns on an individual call to attach data (like your + # # own trace ID) to just that row; it takes precedence over the + # # static defaults above on a key conflict. # response = google_client.answer_query( # serving_config: "projects/.../servingConfigs/default", - # query: { text: "What is the meaning of life?" } + # query: { text: "What is the meaning of life?" }, + # additional_columns: { trace_id: "abc-123" } # ) class GoogleConversationalSearchTracer + # Row keys computed by this tracer. Any key in a caller-supplied + # additional_columns hash matching one of these is dropped, so custom + # data can never overwrite core trace fields. 
+ RESERVED_ROW_KEYS = [:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze + # Enable tracing on a Google ConversationalSearchService client # # @param client [Google::Cloud::DiscoveryEngine::V1::ConversationalSearchService::Client] @@ -42,8 +52,12 @@ class GoogleConversationalSearchTracer # Optional session ID to use for all traces. Takes precedence over auto-extracted sessions. # @param user_id [String, nil] # Optional user ID to use for all traces. + # @param additional_columns [Hash, nil] + # Optional static column values merged into every trace sent through this client (e.g. `{ environment: 'production' }`). + # A value passed to an individual answer_query call takes precedence over these on a key conflict. Keys colliding + # with a reserved row column (query, answer, latency_ms, timestamp, metadata, steps, context, session_id, user_id) are dropped. # @return [void] - def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) + def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}) # Store original method reference original_answer_query = client.method(:answer_query) @@ -52,6 +66,10 @@ def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session # Capture start time start_time = Time.now + # Extract per-call additional columns before forwarding to the + # real client; Google's client never sees this key + call_additional_columns = kwargs.delete(:additional_columns) + # Execute the original method response = original_answer_query.call(*args, **kwargs, &block) @@ -69,7 +87,9 @@ def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session openlayer_client: openlayer_client, inference_pipeline_id: inference_pipeline_id, session_id: session_id, - user_id: user_id + user_id: user_id, + additional_columns: additional_columns, + call_additional_columns: 
call_additional_columns ) rescue StandardError => e # Never break the user's application due to tracing errors @@ -95,8 +115,10 @@ def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session # @param inference_pipeline_id [String] Pipeline ID # @param session_id [String, nil] Optional session ID (takes precedence over auto-extracted) # @param user_id [String, nil] Optional user ID + # @param additional_columns [Hash, nil] Optional static column values (see {.trace_client}) + # @param call_additional_columns [Hash, nil] Optional per-call column values; takes precedence over additional_columns # @return [void] - def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) + def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}, call_additional_columns: {}) # Calculate latency latency_ms = ((end_time - start_time) * 1000).round(2) @@ -199,6 +221,12 @@ def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer trace_data[:config][:userIdColumnName] = "user_id" end + # Merge additional columns (per-call values take precedence over + # static defaults; keys colliding with reserved row columns are + # dropped so custom data can never corrupt core trace fields) + extra_columns = resolve_additional_columns(additional_columns, call_additional_columns) + trace_data[:rows][0].merge!(extra_columns) unless extra_columns.empty? + # Send to Openlayer openlayer_client .inference_pipelines @@ -594,6 +622,45 @@ def self.extract_query_understanding_info(answer) nil end + # Merge static and per-call additional columns into a single Hash of + # extra row columns. Call-level values take precedence over static + # ones on key conflict, and any key colliding with a reserved row + # column is dropped. 
+ # + # @param static_columns [Object] Value passed to trace_client (expected Hash) + # @param call_columns [Object] Value passed to an individual answer_query call (expected Hash) + # @return [Hash] Extra columns safe to merge onto a trace row + def self.resolve_additional_columns(static_columns, call_columns) + merged = normalize_additional_columns(static_columns).merge(normalize_additional_columns(call_columns)) + + merged.each_with_object({}) do |(key, value), result| + if RESERVED_ROW_KEYS.include?(key) + warn_if_debug("[Openlayer] additional_columns key :#{key} collides with a reserved column and was ignored") + else + result[key] = value + end + end + end + + # Normalize an additional_columns value into a Hash with Symbol keys. + # Non-Hash input (or a key that can't be a Symbol) is dropped rather + # than raising, so a caller mistake can never break tracing. + # + # @param columns [Object] Expected to be a Hash of column name => value + # @return [Hash] + def self.normalize_additional_columns(columns) + return {} unless columns.is_a?(Hash) + + columns.each_with_object({}) do |(key, value), result| + next unless key.respond_to?(:to_sym) + + result[key.to_sym] = value + end + rescue StandardError => e + warn_if_debug("[Openlayer] Failed to normalize additional columns: #{e.message}") + {} + end + # Safely extract a field from an object # # @param obj [Object] Object to extract from @@ -659,6 +726,8 @@ def self.warn_if_debug(message) :extract_session, :extract_user_pseudo_id, :extract_query_understanding_info, + :resolve_additional_columns, + :normalize_additional_columns, :safe_extract, :safe_count, :extract_timestamp diff --git a/test/openlayer/integrations/google_conversational_search_tracer_test.rb b/test/openlayer/integrations/google_conversational_search_tracer_test.rb new file mode 100644 index 0000000..05620c0 --- /dev/null +++ b/test/openlayer/integrations/google_conversational_search_tracer_test.rb @@ -0,0 +1,144 @@ +# frozen_string_literal: true + 
+require_relative "../test_helper" +require_relative "../../../lib/openlayer/integrations/google_conversational_search_tracer" + +module Openlayer + module Test + module Integrations + end + end +end + +class Openlayer::Test::Integrations::GoogleConversationalSearchTracerTest < Minitest::Test + Tracer = Openlayer::Integrations::GoogleConversationalSearchTracer + + class FakeAnswer + attr_reader :answer_text + + def initialize(answer_text) + @answer_text = answer_text + end + end + + class FakeResponse + attr_reader :answer + + def initialize(answer_text) + @answer = FakeAnswer.new(answer_text) + end + end + + class FakeGoogleClient + def answer_query(serving_config:, query:) # rubocop:disable Lint/UnusedMethodArgument + FakeResponse.new("hi") + end + end + + class FakeDataResource + attr_reader :calls + + def initialize + @calls = [] + end + + def stream(inference_pipeline_id, **trace_data) + @calls << {inference_pipeline_id: inference_pipeline_id}.merge(trace_data) + end + end + + class FakeInferencePipelines + attr_reader :data + + def initialize(data) + @data = data + end + end + + class FakeOpenlayerClient + attr_reader :inference_pipelines + + def initialize + @data = FakeDataResource.new + @inference_pipelines = FakeInferencePipelines.new(@data) + end + + def last_row + @data.calls.last[:rows][0] + end + end + + def setup + @openlayer_client = FakeOpenlayerClient.new + @start_time = Time.now + @end_time = @start_time + 1 + end + + def trace_row(**overrides) + defaults = { + args: [], + kwargs: {query: "hello"}, + response: FakeResponse.new("hi"), + start_time: @start_time, + end_time: @end_time, + openlayer_client: @openlayer_client, + inference_pipeline_id: "pipeline-id" + } + + Tracer.send_trace(**defaults, **overrides) + @openlayer_client.last_row + end + + def test_static_additional_columns_appear_on_the_row + row = trace_row(additional_columns: {environment: "production"}) + + assert_equal("production", row[:environment]) + end + + def 
test_per_call_additional_columns_appear_on_the_row + row = trace_row(call_additional_columns: {trace_id: "abc-123"}) + + assert_equal("abc-123", row[:trace_id]) + end + + def test_per_call_additional_columns_override_static_on_conflict + row = trace_row( + additional_columns: {trace_id: "static-value"}, + call_additional_columns: {trace_id: "call-value"} + ) + + assert_equal("call-value", row[:trace_id]) + end + + def test_reserved_keys_are_dropped_even_as_string_keys + row = trace_row(additional_columns: {"answer" => "hijacked", trace_id: "abc-123"}) + + assert_equal("hi", row[:answer]) + assert_equal("abc-123", row[:trace_id]) + end + + def test_non_hash_additional_columns_does_not_raise + row = trace_row(additional_columns: "not-a-hash", call_additional_columns: nil) + + assert_equal("hi", row[:answer]) + end + + def test_trace_client_strips_additional_columns_before_forwarding_to_google_client + google_client = FakeGoogleClient.new + + Tracer.trace_client( + google_client, + openlayer_client: @openlayer_client, + inference_pipeline_id: "pipeline-id" + ) + + response = google_client.answer_query( + serving_config: "config", + query: "hello", + additional_columns: {trace_id: "abc-123"} + ) + + assert_equal("hi", response.answer.answer_text) + assert_equal("abc-123", @openlayer_client.last_row[:trace_id]) + end +end From 3674bb8539e5e9a3a630f6dd9477bb573a7dc475 Mon Sep 17 00:00:00 2001 From: Vinicius Mello Date: Fri, 3 Jul 2026 13:54:13 -0300 Subject: [PATCH 3/5] docs(closes OPEN-11569): document additional columns support in ConversationalSearchService tracer example and RBI sig --- examples/google_tracer.rb | 13 ++++++++++--- rbi/openlayer/integrations.rbi | 6 ++++-- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/examples/google_tracer.rb b/examples/google_tracer.rb index 0f378eb..3c6ac3d 100755 --- a/examples/google_tracer.rb +++ b/examples/google_tracer.rb @@ -7,6 +7,7 @@ # Add lib directory to load path 
$LOAD_PATH.unshift(File.expand_path("../lib", __dir__)) +require "securerandom" require "openlayer" require "openlayer/integrations/google_conversational_search_tracer" require "google/cloud/discovery_engine/v1" @@ -19,19 +20,25 @@ api_key: ENV["OPENLAYER_API_KEY"] ) -# Enable tracing - this patches the client to send all queries to Openlayer +# Enable tracing - this patches the client to send all queries to Openlayer. +# additional_columns here is a static default applied to every trace sent +# through this client. Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( google_client, openlayer_client: openlayer, - inference_pipeline_id: ENV["OPENLAYER_INFERENCE_PIPELINE_ID"] + inference_pipeline_id: ENV["OPENLAYER_INFERENCE_PIPELINE_ID"], + additional_columns: {environment: "production"} ) # Use the client normally - all answer_query calls are now automatically traced! +# additional_columns here is per-call; it takes precedence over the static +# default above on a key conflict. response = google_client.answer_query( serving_config: ENV["GOOGLE_SERVING_CONFIG"], query: Google::Cloud::DiscoveryEngine::V1::Query.new( text: "What is the meaning of life?" 
- ) + ), + additional_columns: {trace_id: SecureRandom.uuid} ) puts "Answer: #{response.answer.answer_text}" diff --git a/rbi/openlayer/integrations.rbi b/rbi/openlayer/integrations.rbi index 421bce5..2cf90e8 100644 --- a/rbi/openlayer/integrations.rbi +++ b/rbi/openlayer/integrations.rbi @@ -10,7 +10,8 @@ module Openlayer openlayer_client: Openlayer::Client, inference_pipeline_id: String, session_id: T.nilable(String), - user_id: T.nilable(String) + user_id: T.nilable(String), + additional_columns: T::Hash[Symbol, T.untyped] ).void end def self.trace_client( @@ -18,7 +19,8 @@ module Openlayer openlayer_client:, inference_pipeline_id:, session_id: nil, - user_id: nil + user_id: nil, + additional_columns: {} ) end end From 4202708d27c5a88ba34542965e45b20be5e151e1 Mon Sep 17 00:00:00 2001 From: Vinicius Mello Date: Fri, 3 Jul 2026 16:33:50 -0300 Subject: [PATCH 4/5] docs: add implementation plan for OPEN-11569 additional columns support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Companion to the design spec — the task-by-task plan followed by the subagent-driven implementation of the two feature commits. --- ...-07-03-google-tracer-additional-columns.md | 665 ++++++++++++++++++ 1 file changed, 665 insertions(+) create mode 100644 docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md diff --git a/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md b/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md new file mode 100644 index 0000000..bfd27f1 --- /dev/null +++ b/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md @@ -0,0 +1,665 @@ +# Additional Columns in GoogleConversationalSearchTracer Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
+ +**Goal:** Let callers of `Openlayer::Integrations::GoogleConversationalSearchTracer` attach their own custom columns (e.g. a trace ID) to traced rows, both as a static default per client and as a per-call override. + +**Architecture:** Add an `additional_columns:` kwarg to `trace_client` (static defaults) and intercept a same-named reserved kwarg on the wrapped `answer_query` call (per-call values) before it reaches the real Google client. `send_trace` normalizes both hashes to Symbol keys, merges per-call over static, drops any key colliding with a reserved row column, and merges what's left onto the row as top-level columns. + +**Tech Stack:** Ruby 3.2.0, Minitest (no network — tests use fake doubles, not the Prism mock server), RuboCop, Sorbet (`rbi/`). + +Spec: `docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md` + +## Global Constraints + +- Ruby 3.2.0 pinned via `.ruby-version`. This worktree already has rbenv + Ruby 3.2.0 installed and `bundle install` run — `bundle exec ` works as-is. +- String literals are double-quoted (RuboCop `Style/StringLiterals`). +- Symbol arrays use bracket literals (`[:a, :b]`), not `%i[]` (RuboCop `Style/SymbolArray: EnforcedStyle: brackets`). +- Lint command: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength ` (matches `rake lint:rubocop`; excludes are pre-existing repo policy, not specific to this change). +- Test command: `bundle exec rake test TEST=`. +- Typecheck command (only relevant once the RBI/examples change in Task 2): `bundle exec rake typecheck:sorbet`. +- Commit messages follow this repo's existing convention: `type(closes OPEN-11569): description` (e.g. see commit `9157c52`, which added the `session_id`/`user_id` precedent this plan follows). 
+ +--- + +### Task 1: Core additional_columns support in the tracer + +**Files:** +- Modify: `lib/openlayer/integrations/google_conversational_search_tracer.rb` +- Test: `test/openlayer/integrations/google_conversational_search_tracer_test.rb` + +**Interfaces:** +- Consumes: nothing new — this task only touches the tracer file itself. +- Produces (used by Task 2's doc updates, and by any future caller): + - `GoogleConversationalSearchTracer.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {})` — new `additional_columns:` kwarg. + - The wrapped `client.answer_query(...)` accepts an extra `additional_columns:` kwarg, stripped before forwarding to the real Google client. + - `GoogleConversationalSearchTracer.send_trace(..., additional_columns: {}, call_additional_columns: {})` — two new kwargs (both public class method, per existing convention). + - `GoogleConversationalSearchTracer::RESERVED_ROW_KEYS` — `[:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze`. 
+ +- [ ] **Step 1: Write the failing tests** + +Create `test/openlayer/integrations/google_conversational_search_tracer_test.rb`: + +```ruby +# frozen_string_literal: true + +require_relative "../test_helper" + +module Openlayer + module Test + module Integrations + end + end +end + +class Openlayer::Test::Integrations::GoogleConversationalSearchTracerTest < Minitest::Test + Tracer = Openlayer::Integrations::GoogleConversationalSearchTracer + + class FakeAnswer + attr_reader :answer_text + + def initialize(answer_text) + @answer_text = answer_text + end + end + + class FakeResponse + attr_reader :answer + + def initialize(answer_text) + @answer = FakeAnswer.new(answer_text) + end + end + + class FakeGoogleClient + def answer_query(serving_config:, query:) + FakeResponse.new("hi") + end + end + + class FakeDataResource + attr_reader :calls + + def initialize + @calls = [] + end + + def stream(inference_pipeline_id, **trace_data) + @calls << {inference_pipeline_id: inference_pipeline_id}.merge(trace_data) + end + end + + class FakeInferencePipelines + attr_reader :data + + def initialize(data) + @data = data + end + end + + class FakeOpenlayerClient + attr_reader :inference_pipelines + + def initialize + @data = FakeDataResource.new + @inference_pipelines = FakeInferencePipelines.new(@data) + end + + def last_row + @data.calls.last[:rows][0] + end + end + + def setup + @openlayer_client = FakeOpenlayerClient.new + @start_time = Time.now + @end_time = @start_time + 1 + end + + def trace_row(**overrides) + defaults = { + args: [], + kwargs: {query: "hello"}, + response: FakeResponse.new("hi"), + start_time: @start_time, + end_time: @end_time, + openlayer_client: @openlayer_client, + inference_pipeline_id: "pipeline-id" + } + + Tracer.send_trace(**defaults.merge(overrides)) + @openlayer_client.last_row + end + + def test_static_additional_columns_appear_on_the_row + row = trace_row(additional_columns: {environment: "production"}) + + assert_equal "production", 
row[:environment] + end + + def test_per_call_additional_columns_appear_on_the_row + row = trace_row(call_additional_columns: {trace_id: "abc-123"}) + + assert_equal "abc-123", row[:trace_id] + end + + def test_per_call_additional_columns_override_static_on_conflict + row = trace_row( + additional_columns: {trace_id: "static-value"}, + call_additional_columns: {trace_id: "call-value"} + ) + + assert_equal "call-value", row[:trace_id] + end + + def test_reserved_keys_are_dropped_even_as_string_keys + row = trace_row(additional_columns: {"answer" => "hijacked", trace_id: "abc-123"}) + + assert_equal "hi", row[:answer] + assert_equal "abc-123", row[:trace_id] + end + + def test_non_hash_additional_columns_does_not_raise + row = trace_row(additional_columns: "not-a-hash", call_additional_columns: nil) + + assert_equal "hi", row[:answer] + end + + def test_trace_client_strips_additional_columns_before_forwarding_to_google_client + google_client = FakeGoogleClient.new + + Tracer.trace_client( + google_client, + openlayer_client: @openlayer_client, + inference_pipeline_id: "pipeline-id" + ) + + response = google_client.answer_query( + serving_config: "config", + query: "hello", + additional_columns: {trace_id: "abc-123"} + ) + + assert_equal "hi", response.answer.answer_text + assert_equal "abc-123", @openlayer_client.last_row[:trace_id] + end +end +``` + +- [ ] **Step 2: Run the tests to verify they fail** + +Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` + +Expected: Errors (not just assertion failures) — `send_trace`/`trace_client` don't yet accept `additional_columns:`/`call_additional_columns:`, so Ruby raises `ArgumentError: unknown keyword: :additional_columns` (or similar) for every test. + +- [ ] **Step 3: Implement `additional_columns` support** + +In `lib/openlayer/integrations/google_conversational_search_tracer.rb`: + +3a. Add the reserved-keys constant right after the class declaration. 
Replace: + +```ruby + class GoogleConversationalSearchTracer + # Enable tracing on a Google ConversationalSearchService client +``` + +with: + +```ruby + class GoogleConversationalSearchTracer + # Row keys computed by this tracer. Any key in a caller-supplied + # additional_columns hash matching one of these is dropped, so custom + # data can never overwrite core trace fields. + RESERVED_ROW_KEYS = [:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze + + # Enable tracing on a Google ConversationalSearchService client +``` + +3b. Update the class-level YARD `@example` to show `additional_columns`. Replace: + +```ruby + # Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( + # google_client, + # openlayer_client: openlayer, + # inference_pipeline_id: 'your-pipeline-id' + # ) + # + # # Now all answer_query calls are automatically traced! + # response = google_client.answer_query( + # serving_config: "projects/.../servingConfigs/default", + # query: { text: "What is the meaning of life?" } + # ) +``` + +with: + +```ruby + # Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( + # google_client, + # openlayer_client: openlayer, + # inference_pipeline_id: 'your-pipeline-id', + # additional_columns: { environment: 'production' } + # ) + # + # # Now all answer_query calls are automatically traced! Pass + # # additional_columns on an individual call to attach data (like your + # # own trace ID) to just that row; it takes precedence over the + # # static defaults above on a key conflict. + # response = google_client.answer_query( + # serving_config: "projects/.../servingConfigs/default", + # query: { text: "What is the meaning of life?" }, + # additional_columns: { trace_id: "abc-123" } + # ) +``` + +3c. Update `trace_client`'s YARD doc + signature. Replace: + +```ruby + # @param session_id [String, nil] + # Optional session ID to use for all traces. 
Takes precedence over auto-extracted sessions. + # @param user_id [String, nil] + # Optional user ID to use for all traces. + # @return [void] + def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) +``` + +with: + +```ruby + # @param session_id [String, nil] + # Optional session ID to use for all traces. Takes precedence over auto-extracted sessions. + # @param user_id [String, nil] + # Optional user ID to use for all traces. + # @param additional_columns [Hash, nil] + # Optional static column values merged into every trace sent through this client (e.g. `{ environment: 'production' }`). + # A value passed to an individual answer_query call takes precedence over these on a key conflict. Keys colliding + # with a reserved row column (query, answer, latency_ms, timestamp, metadata, steps, context, session_id, user_id) are dropped. + # @return [void] + def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}) +``` + +3d. Update the wrapper block to intercept the per-call kwarg and pass both hashes through. 
Replace: + +```ruby + client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| + # Capture start time + start_time = Time.now + + # Execute the original method + response = original_answer_query.call(*args, **kwargs, &block) + + # Capture end time + end_time = Time.now + + # Send trace to Openlayer (with error handling) + begin + GoogleConversationalSearchTracer.send_trace( + args: args, + kwargs: kwargs, + response: response, + start_time: start_time, + end_time: end_time, + openlayer_client: openlayer_client, + inference_pipeline_id: inference_pipeline_id, + session_id: session_id, + user_id: user_id + ) + rescue StandardError => e + # Never break the user's application due to tracing errors + GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] Failed to send trace: #{e.message}") + GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] #{e.backtrace.first(3).join("\n")}") if e.backtrace + end + + # Always return the original response + response + end +``` + +with: + +```ruby + client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| + # Capture start time + start_time = Time.now + + # Extract per-call additional columns before forwarding to the + # real client; Google's client never sees this key + call_additional_columns = kwargs.delete(:additional_columns) + + # Execute the original method + response = original_answer_query.call(*args, **kwargs, &block) + + # Capture end time + end_time = Time.now + + # Send trace to Openlayer (with error handling) + begin + GoogleConversationalSearchTracer.send_trace( + args: args, + kwargs: kwargs, + response: response, + start_time: start_time, + end_time: end_time, + openlayer_client: openlayer_client, + inference_pipeline_id: inference_pipeline_id, + session_id: session_id, + user_id: user_id, + additional_columns: additional_columns, + call_additional_columns: call_additional_columns + ) + rescue StandardError => e + # Never break the user's application due to tracing 
errors + GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] Failed to send trace: #{e.message}") + GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] #{e.backtrace.first(3).join("\n")}") if e.backtrace + end + + # Always return the original response + response + end +``` + +3e. Update `send_trace`'s YARD doc + signature. Replace: + +```ruby + # @param session_id [String, nil] Optional session ID (takes precedence over auto-extracted) + # @param user_id [String, nil] Optional user ID + # @return [void] + def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) +``` + +with: + +```ruby + # @param session_id [String, nil] Optional session ID (takes precedence over auto-extracted) + # @param user_id [String, nil] Optional user ID + # @param additional_columns [Hash, nil] Optional static column values (see {.trace_client}) + # @param call_additional_columns [Hash, nil] Optional per-call column values; takes precedence over additional_columns + # @return [void] + def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}, call_additional_columns: {}) +``` + +3f. Insert the merge step right before the row is sent. Replace: + +```ruby + # Send to Openlayer + openlayer_client + .inference_pipelines + .data + .stream( + inference_pipeline_id, + **trace_data + ) + end +``` + +with: + +```ruby + # Merge additional columns (per-call values take precedence over + # static defaults; keys colliding with reserved row columns are + # dropped so custom data can never corrupt core trace fields) + extra_columns = resolve_additional_columns(additional_columns, call_additional_columns) + trace_data[:rows][0].merge!(extra_columns) unless extra_columns.empty? 
+ + # Send to Openlayer + openlayer_client + .inference_pipelines + .data + .stream( + inference_pipeline_id, + **trace_data + ) + end +``` + +3g. Add the two new private helper methods. Insert them right after `extract_query_understanding_info` and before the `# Safely extract a field from an object` comment. Replace: + +```ruby + result.empty? ? nil : result + rescue StandardError => e + warn_if_debug("[Openlayer] Failed to extract query understanding info: #{e.message}") + nil + end + + # Safely extract a field from an object +``` + +with: + +```ruby + result.empty? ? nil : result + rescue StandardError => e + warn_if_debug("[Openlayer] Failed to extract query understanding info: #{e.message}") + nil + end + + # Merge static and per-call additional columns into a single Hash of + # extra row columns. Call-level values take precedence over static + # ones on key conflict, and any key colliding with a reserved row + # column is dropped. + # + # @param static_columns [Object] Value passed to trace_client (expected Hash) + # @param call_columns [Object] Value passed to an individual answer_query call (expected Hash) + # @return [Hash] Extra columns safe to merge onto a trace row + def self.resolve_additional_columns(static_columns, call_columns) + merged = normalize_additional_columns(static_columns).merge(normalize_additional_columns(call_columns)) + + merged.each_with_object({}) do |(key, value), result| + if RESERVED_ROW_KEYS.include?(key) + warn_if_debug("[Openlayer] additional_columns key :#{key} collides with a reserved column and was ignored") + else + result[key] = value + end + end + end + + # Normalize an additional_columns value into a Hash with Symbol keys. + # Non-Hash input (or a key that can't be a Symbol) is dropped rather + # than raising, so a caller mistake can never break tracing. 
+ # + # @param columns [Object] Expected to be a Hash of column name => value + # @return [Hash] + def self.normalize_additional_columns(columns) + return {} unless columns.is_a?(Hash) + + columns.each_with_object({}) do |(key, value), result| + next unless key.respond_to?(:to_sym) + + result[key.to_sym] = value + end + rescue StandardError => e + warn_if_debug("[Openlayer] Failed to normalize additional columns: #{e.message}") + {} + end + + # Safely extract a field from an object +``` + +3h. Add the two new methods to the `private_class_method` list. Replace: + +```ruby + private_class_method :extract_query, + :extract_answer_data, + :extract_citations, + :extract_citation_sources, + :extract_references, + :extract_related_questions, + :extract_steps, + :extract_step_data, + :extract_search_results, + :extract_metadata, + :extract_serving_config, + :extract_session, + :extract_user_pseudo_id, + :extract_query_understanding_info, + :safe_extract, + :safe_count, + :extract_timestamp +``` + +with: + +```ruby + private_class_method :extract_query, + :extract_answer_data, + :extract_citations, + :extract_citation_sources, + :extract_references, + :extract_related_questions, + :extract_steps, + :extract_step_data, + :extract_search_results, + :extract_metadata, + :extract_serving_config, + :extract_session, + :extract_user_pseudo_id, + :extract_query_understanding_info, + :resolve_additional_columns, + :normalize_additional_columns, + :safe_extract, + :safe_count, + :extract_timestamp +``` + +- [ ] **Step 4: Run the tests to verify they pass** + +Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` + +Expected: `6 runs, ... 
assertions, 0 failures, 0 errors, 0 skips` + +- [ ] **Step 5: Lint the changed files** + +Run: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength lib/openlayer/integrations/google_conversational_search_tracer.rb test/openlayer/integrations/google_conversational_search_tracer_test.rb` + +Expected: `2 files inspected, no offenses detected` + +- [ ] **Step 6: Commit** + +```bash +git add lib/openlayer/integrations/google_conversational_search_tracer.rb test/openlayer/integrations/google_conversational_search_tracer_test.rb +git commit -m "feat(closes OPEN-11569): support additional columns in ConversationalSearchService tracer" +``` + +--- + +### Task 2: RBI signature and example script + +**Files:** +- Modify: `rbi/openlayer/integrations.rbi` +- Modify: `examples/google_tracer.rb` + +**Interfaces:** +- Consumes: `GoogleConversationalSearchTracer.trace_client(..., additional_columns: {})` and the per-call `additional_columns:` kwarg on `answer_query`, both from Task 1. +- Produces: nothing further downstream — this is the last task. 
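The column semantics that Task 2 documents (per-call values win over static defaults, reserved keys are dropped, bad input is ignored) can be checked in isolation with a minimal sketch like the following. The names `RESERVED_KEYS`, `normalize_columns` and `resolve_columns` are hypothetical stand-ins chosen for this illustration; they mirror the behavior described for Task 1's private helpers but are not the tracer's actual API.

```ruby
# Hypothetical stand-in for the tracer's RESERVED_ROW_KEYS constant.
RESERVED_KEYS = [:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze

# Hypothetical helper: coerce the input into a Hash with Symbol keys.
# Non-Hash input yields an empty Hash instead of raising, and keys that
# cannot become Symbols are skipped.
def normalize_columns(columns)
  return {} unless columns.is_a?(Hash)

  columns.each_with_object({}) do |(key, value), result|
    next unless key.respond_to?(:to_sym)

    result[key.to_sym] = value
  end
end

# Hypothetical helper: merge per-call values over static defaults, then
# drop any key that collides with a reserved row column.
def resolve_columns(static_columns, call_columns)
  merged = normalize_columns(static_columns).merge(normalize_columns(call_columns))
  merged.reject { |key, _value| RESERVED_KEYS.include?(key) }
end

# Illustrative row; the real tracer builds this from the Google response.
row = {query: "hello", answer: "hi"}
row.merge!(
  resolve_columns(
    {environment: "production", trace_id: "static"},
    {"trace_id" => "per-call", "answer" => "hijacked"}
  )
)
p row
```

Normalizing to Symbol keys before the reserved-key check is what keeps a String key such as `"answer"` from slipping past the filter and overwriting the core field.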
+ +- [ ] **Step 1: Update the RBI signature** + +Replace the full contents of `rbi/openlayer/integrations.rbi`: + +```ruby +# typed: strong +# frozen_string_literal: true + +module Openlayer + module Integrations + class GoogleConversationalSearchTracer + sig do + params( + client: T.untyped, + openlayer_client: Openlayer::Client, + inference_pipeline_id: String, + session_id: T.nilable(String), + user_id: T.nilable(String), + additional_columns: T::Hash[Symbol, T.untyped] + ).void + end + def self.trace_client( + client, + openlayer_client:, + inference_pipeline_id:, + session_id: nil, + user_id: nil, + additional_columns: {} + ) + end + end + end +end +``` + +- [ ] **Step 2: Update the example script** + +Replace the full contents of `examples/google_tracer.rb`: + +```ruby +#!/usr/bin/env ruby +# frozen_string_literal: true +# typed: false + +# Simple example: Tracing Google ConversationalSearchService with Openlayer + +# Add lib directory to load path +$LOAD_PATH.unshift(File.expand_path("../lib", __dir__)) + +require "securerandom" +require "openlayer" +require "openlayer/integrations/google_conversational_search_tracer" +require "google/cloud/discovery_engine/v1" + +# Initialize Google ConversationalSearchService client +google_client = Google::Cloud::DiscoveryEngine::V1::ConversationalSearchService::Client.new + +# Initialize Openlayer client +openlayer = Openlayer::Client.new( + api_key: ENV["OPENLAYER_API_KEY"] +) + +# Enable tracing - this patches the client to send all queries to Openlayer. +# additional_columns here is a static default applied to every trace sent +# through this client. +Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( + google_client, + openlayer_client: openlayer, + inference_pipeline_id: ENV["OPENLAYER_INFERENCE_PIPELINE_ID"], + additional_columns: {environment: "production"} +) + +# Use the client normally - all answer_query calls are now automatically traced! 
+# additional_columns here is per-call; it takes precedence over the static +# default above on a key conflict. +response = google_client.answer_query( + serving_config: ENV["GOOGLE_SERVING_CONFIG"], + query: Google::Cloud::DiscoveryEngine::V1::Query.new( + text: "What is the meaning of life?" + ), + additional_columns: {trace_id: SecureRandom.uuid} +) + +puts "Answer: #{response.answer.answer_text}" +puts "\n✓ Query traced to Openlayer successfully!" +``` + +- [ ] **Step 3: Typecheck the examples directory** + +Run: `bundle exec rake typecheck:sorbet` + +Expected: `No errors! Great job.` + +- [ ] **Step 4: Lint the changed files** + +Run: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength rbi/openlayer/integrations.rbi examples/google_tracer.rb` + +Expected: `2 files inspected, no offenses detected` + +- [ ] **Step 5: Run the full test suite as a final regression check** + +Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` + +Expected: `6 runs, ... assertions, 0 failures, 0 errors, 0 skips` (unchanged from Task 1 — this task touches no runtime logic, only types/docs/examples). + +- [ ] **Step 6: Commit** + +```bash +git add rbi/openlayer/integrations.rbi examples/google_tracer.rb +git commit -m "docs(closes OPEN-11569): document additional columns support in ConversationalSearchService tracer example and RBI sig" +``` From 0e1ff58cf992c1fa7c5b640fe0af65a6d7a66032 Mon Sep 17 00:00:00 2001 From: Vinicius Mello Date: Fri, 3 Jul 2026 17:39:02 -0300 Subject: [PATCH 5/5] chore: remove planning docs from the branch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the spec and implementation-plan working docs — not intended to live in the repo's docs/ tree. 
--- ...-07-03-google-tracer-additional-columns.md | 665 ------------------ ...google-tracer-additional-columns-design.md | 79 --- 2 files changed, 744 deletions(-) delete mode 100644 docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md delete mode 100644 docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md diff --git a/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md b/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md deleted file mode 100644 index bfd27f1..0000000 --- a/docs/superpowers/plans/2026-07-03-google-tracer-additional-columns.md +++ /dev/null @@ -1,665 +0,0 @@ -# Additional Columns in GoogleConversationalSearchTracer Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Let callers of `Openlayer::Integrations::GoogleConversationalSearchTracer` attach their own custom columns (e.g. a trace ID) to traced rows, both as a static default per client and as a per-call override. - -**Architecture:** Add an `additional_columns:` kwarg to `trace_client` (static defaults) and intercept a same-named reserved kwarg on the wrapped `answer_query` call (per-call values) before it reaches the real Google client. `send_trace` normalizes both hashes to Symbol keys, merges per-call over static, drops any key colliding with a reserved row column, and merges what's left onto the row as top-level columns. - -**Tech Stack:** Ruby 3.2.0, Minitest (no network — tests use fake doubles, not the Prism mock server), RuboCop, Sorbet (`rbi/`). - -Spec: `docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md` - -## Global Constraints - -- Ruby 3.2.0 pinned via `.ruby-version`. 
This worktree already has rbenv + Ruby 3.2.0 installed and `bundle install` run — `bundle exec ` works as-is. -- String literals are double-quoted (RuboCop `Style/StringLiterals`). -- Symbol arrays use bracket literals (`[:a, :b]`), not `%i[]` (RuboCop `Style/SymbolArray: EnforcedStyle: brackets`). -- Lint command: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength ` (matches `rake lint:rubocop`; excludes are pre-existing repo policy, not specific to this change). -- Test command: `bundle exec rake test TEST=`. -- Typecheck command (only relevant once the RBI/examples change in Task 2): `bundle exec rake typecheck:sorbet`. -- Commit messages follow this repo's existing convention: `type(closes OPEN-11569): description` (e.g. see commit `9157c52`, which added the `session_id`/`user_id` precedent this plan follows). - ---- - -### Task 1: Core additional_columns support in the tracer - -**Files:** -- Modify: `lib/openlayer/integrations/google_conversational_search_tracer.rb` -- Test: `test/openlayer/integrations/google_conversational_search_tracer_test.rb` - -**Interfaces:** -- Consumes: nothing new — this task only touches the tracer file itself. -- Produces (used by Task 2's doc updates, and by any future caller): - - `GoogleConversationalSearchTracer.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {})` — new `additional_columns:` kwarg. - - The wrapped `client.answer_query(...)` accepts an extra `additional_columns:` kwarg, stripped before forwarding to the real Google client. - - `GoogleConversationalSearchTracer.send_trace(..., additional_columns: {}, call_additional_columns: {})` — two new kwargs (both public class method, per existing convention). - - `GoogleConversationalSearchTracer::RESERVED_ROW_KEYS` — `[:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze`. 
- -- [ ] **Step 1: Write the failing tests** - -Create `test/openlayer/integrations/google_conversational_search_tracer_test.rb`: - -```ruby -# frozen_string_literal: true - -require_relative "../test_helper" - -module Openlayer - module Test - module Integrations - end - end -end - -class Openlayer::Test::Integrations::GoogleConversationalSearchTracerTest < Minitest::Test - Tracer = Openlayer::Integrations::GoogleConversationalSearchTracer - - class FakeAnswer - attr_reader :answer_text - - def initialize(answer_text) - @answer_text = answer_text - end - end - - class FakeResponse - attr_reader :answer - - def initialize(answer_text) - @answer = FakeAnswer.new(answer_text) - end - end - - class FakeGoogleClient - def answer_query(serving_config:, query:) - FakeResponse.new("hi") - end - end - - class FakeDataResource - attr_reader :calls - - def initialize - @calls = [] - end - - def stream(inference_pipeline_id, **trace_data) - @calls << {inference_pipeline_id: inference_pipeline_id}.merge(trace_data) - end - end - - class FakeInferencePipelines - attr_reader :data - - def initialize(data) - @data = data - end - end - - class FakeOpenlayerClient - attr_reader :inference_pipelines - - def initialize - @data = FakeDataResource.new - @inference_pipelines = FakeInferencePipelines.new(@data) - end - - def last_row - @data.calls.last[:rows][0] - end - end - - def setup - @openlayer_client = FakeOpenlayerClient.new - @start_time = Time.now - @end_time = @start_time + 1 - end - - def trace_row(**overrides) - defaults = { - args: [], - kwargs: {query: "hello"}, - response: FakeResponse.new("hi"), - start_time: @start_time, - end_time: @end_time, - openlayer_client: @openlayer_client, - inference_pipeline_id: "pipeline-id" - } - - Tracer.send_trace(**defaults.merge(overrides)) - @openlayer_client.last_row - end - - def test_static_additional_columns_appear_on_the_row - row = trace_row(additional_columns: {environment: "production"}) - - assert_equal "production", 
row[:environment] - end - - def test_per_call_additional_columns_appear_on_the_row - row = trace_row(call_additional_columns: {trace_id: "abc-123"}) - - assert_equal "abc-123", row[:trace_id] - end - - def test_per_call_additional_columns_override_static_on_conflict - row = trace_row( - additional_columns: {trace_id: "static-value"}, - call_additional_columns: {trace_id: "call-value"} - ) - - assert_equal "call-value", row[:trace_id] - end - - def test_reserved_keys_are_dropped_even_as_string_keys - row = trace_row(additional_columns: {"answer" => "hijacked", trace_id: "abc-123"}) - - assert_equal "hi", row[:answer] - assert_equal "abc-123", row[:trace_id] - end - - def test_non_hash_additional_columns_does_not_raise - row = trace_row(additional_columns: "not-a-hash", call_additional_columns: nil) - - assert_equal "hi", row[:answer] - end - - def test_trace_client_strips_additional_columns_before_forwarding_to_google_client - google_client = FakeGoogleClient.new - - Tracer.trace_client( - google_client, - openlayer_client: @openlayer_client, - inference_pipeline_id: "pipeline-id" - ) - - response = google_client.answer_query( - serving_config: "config", - query: "hello", - additional_columns: {trace_id: "abc-123"} - ) - - assert_equal "hi", response.answer.answer_text - assert_equal "abc-123", @openlayer_client.last_row[:trace_id] - end -end -``` - -- [ ] **Step 2: Run the tests to verify they fail** - -Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` - -Expected: Errors (not just assertion failures) — `send_trace`/`trace_client` don't yet accept `additional_columns:`/`call_additional_columns:`, so Ruby raises `ArgumentError: unknown keyword: :additional_columns` (or similar) for every test. - -- [ ] **Step 3: Implement `additional_columns` support** - -In `lib/openlayer/integrations/google_conversational_search_tracer.rb`: - -3a. Add the reserved-keys constant right after the class declaration. 
Replace: - -```ruby - class GoogleConversationalSearchTracer - # Enable tracing on a Google ConversationalSearchService client -``` - -with: - -```ruby - class GoogleConversationalSearchTracer - # Row keys computed by this tracer. Any key in a caller-supplied - # additional_columns hash matching one of these is dropped, so custom - # data can never overwrite core trace fields. - RESERVED_ROW_KEYS = [:query, :answer, :latency_ms, :timestamp, :metadata, :steps, :context, :session_id, :user_id].freeze - - # Enable tracing on a Google ConversationalSearchService client -``` - -3b. Update the class-level YARD `@example` to show `additional_columns`. Replace: - -```ruby - # Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( - # google_client, - # openlayer_client: openlayer, - # inference_pipeline_id: 'your-pipeline-id' - # ) - # - # # Now all answer_query calls are automatically traced! - # response = google_client.answer_query( - # serving_config: "projects/.../servingConfigs/default", - # query: { text: "What is the meaning of life?" } - # ) -``` - -with: - -```ruby - # Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( - # google_client, - # openlayer_client: openlayer, - # inference_pipeline_id: 'your-pipeline-id', - # additional_columns: { environment: 'production' } - # ) - # - # # Now all answer_query calls are automatically traced! Pass - # # additional_columns on an individual call to attach data (like your - # # own trace ID) to just that row; it takes precedence over the - # # static defaults above on a key conflict. - # response = google_client.answer_query( - # serving_config: "projects/.../servingConfigs/default", - # query: { text: "What is the meaning of life?" }, - # additional_columns: { trace_id: "abc-123" } - # ) -``` - -3c. Update `trace_client`'s YARD doc + signature. Replace: - -```ruby - # @param session_id [String, nil] - # Optional session ID to use for all traces. 
Takes precedence over auto-extracted sessions. - # @param user_id [String, nil] - # Optional user ID to use for all traces. - # @return [void] - def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) -``` - -with: - -```ruby - # @param session_id [String, nil] - # Optional session ID to use for all traces. Takes precedence over auto-extracted sessions. - # @param user_id [String, nil] - # Optional user ID to use for all traces. - # @param additional_columns [Hash, nil] - # Optional static column values merged into every trace sent through this client (e.g. `{ environment: 'production' }`). - # A value passed to an individual answer_query call takes precedence over these on a key conflict. Keys colliding - # with a reserved row column (query, answer, latency_ms, timestamp, metadata, steps, context, session_id, user_id) are dropped. - # @return [void] - def self.trace_client(client, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}) -``` - -3d. Update the wrapper block to intercept the per-call kwarg and pass both hashes through. 
Replace: - -```ruby - client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| - # Capture start time - start_time = Time.now - - # Execute the original method - response = original_answer_query.call(*args, **kwargs, &block) - - # Capture end time - end_time = Time.now - - # Send trace to Openlayer (with error handling) - begin - GoogleConversationalSearchTracer.send_trace( - args: args, - kwargs: kwargs, - response: response, - start_time: start_time, - end_time: end_time, - openlayer_client: openlayer_client, - inference_pipeline_id: inference_pipeline_id, - session_id: session_id, - user_id: user_id - ) - rescue StandardError => e - # Never break the user's application due to tracing errors - GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] Failed to send trace: #{e.message}") - GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] #{e.backtrace.first(3).join("\n")}") if e.backtrace - end - - # Always return the original response - response - end -``` - -with: - -```ruby - client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| - # Capture start time - start_time = Time.now - - # Extract per-call additional columns before forwarding to the - # real client; Google's client never sees this key - call_additional_columns = kwargs.delete(:additional_columns) - - # Execute the original method - response = original_answer_query.call(*args, **kwargs, &block) - - # Capture end time - end_time = Time.now - - # Send trace to Openlayer (with error handling) - begin - GoogleConversationalSearchTracer.send_trace( - args: args, - kwargs: kwargs, - response: response, - start_time: start_time, - end_time: end_time, - openlayer_client: openlayer_client, - inference_pipeline_id: inference_pipeline_id, - session_id: session_id, - user_id: user_id, - additional_columns: additional_columns, - call_additional_columns: call_additional_columns - ) - rescue StandardError => e - # Never break the user's application due to tracing 
errors - GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] Failed to send trace: #{e.message}") - GoogleConversationalSearchTracer.warn_if_debug("[Openlayer] #{e.backtrace.first(3).join("\n")}") if e.backtrace - end - - # Always return the original response - response - end -``` - -3e. Update `send_trace`'s YARD doc + signature. Replace: - -```ruby - # @param session_id [String, nil] Optional session ID (takes precedence over auto-extracted) - # @param user_id [String, nil] Optional user ID - # @return [void] - def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil) -``` - -with: - -```ruby - # @param session_id [String, nil] Optional session ID (takes precedence over auto-extracted) - # @param user_id [String, nil] Optional user ID - # @param additional_columns [Hash, nil] Optional static column values (see {.trace_client}) - # @param call_additional_columns [Hash, nil] Optional per-call column values; takes precedence over additional_columns - # @return [void] - def self.send_trace(args:, kwargs:, response:, start_time:, end_time:, openlayer_client:, inference_pipeline_id:, session_id: nil, user_id: nil, additional_columns: {}, call_additional_columns: {}) -``` - -3f. Insert the merge step right before the row is sent. Replace: - -```ruby - # Send to Openlayer - openlayer_client - .inference_pipelines - .data - .stream( - inference_pipeline_id, - **trace_data - ) - end -``` - -with: - -```ruby - # Merge additional columns (per-call values take precedence over - # static defaults; keys colliding with reserved row columns are - # dropped so custom data can never corrupt core trace fields) - extra_columns = resolve_additional_columns(additional_columns, call_additional_columns) - trace_data[:rows][0].merge!(extra_columns) unless extra_columns.empty? 
- - # Send to Openlayer - openlayer_client - .inference_pipelines - .data - .stream( - inference_pipeline_id, - **trace_data - ) - end -``` - -3g. Add the two new private helper methods. Insert them right after `extract_query_understanding_info` and before the `# Safely extract a field from an object` comment. Replace: - -```ruby - result.empty? ? nil : result - rescue StandardError => e - warn_if_debug("[Openlayer] Failed to extract query understanding info: #{e.message}") - nil - end - - # Safely extract a field from an object -``` - -with: - -```ruby - result.empty? ? nil : result - rescue StandardError => e - warn_if_debug("[Openlayer] Failed to extract query understanding info: #{e.message}") - nil - end - - # Merge static and per-call additional columns into a single Hash of - # extra row columns. Call-level values take precedence over static - # ones on key conflict, and any key colliding with a reserved row - # column is dropped. - # - # @param static_columns [Object] Value passed to trace_client (expected Hash) - # @param call_columns [Object] Value passed to an individual answer_query call (expected Hash) - # @return [Hash] Extra columns safe to merge onto a trace row - def self.resolve_additional_columns(static_columns, call_columns) - merged = normalize_additional_columns(static_columns).merge(normalize_additional_columns(call_columns)) - - merged.each_with_object({}) do |(key, value), result| - if RESERVED_ROW_KEYS.include?(key) - warn_if_debug("[Openlayer] additional_columns key :#{key} collides with a reserved column and was ignored") - else - result[key] = value - end - end - end - - # Normalize an additional_columns value into a Hash with Symbol keys. - # Non-Hash input (or a key that can't be a Symbol) is dropped rather - # than raising, so a caller mistake can never break tracing. 
- # - # @param columns [Object] Expected to be a Hash of column name => value - # @return [Hash] - def self.normalize_additional_columns(columns) - return {} unless columns.is_a?(Hash) - - columns.each_with_object({}) do |(key, value), result| - next unless key.respond_to?(:to_sym) - - result[key.to_sym] = value - end - rescue StandardError => e - warn_if_debug("[Openlayer] Failed to normalize additional columns: #{e.message}") - {} - end - - # Safely extract a field from an object -``` - -3h. Add the two new methods to the `private_class_method` list. Replace: - -```ruby - private_class_method :extract_query, - :extract_answer_data, - :extract_citations, - :extract_citation_sources, - :extract_references, - :extract_related_questions, - :extract_steps, - :extract_step_data, - :extract_search_results, - :extract_metadata, - :extract_serving_config, - :extract_session, - :extract_user_pseudo_id, - :extract_query_understanding_info, - :safe_extract, - :safe_count, - :extract_timestamp -``` - -with: - -```ruby - private_class_method :extract_query, - :extract_answer_data, - :extract_citations, - :extract_citation_sources, - :extract_references, - :extract_related_questions, - :extract_steps, - :extract_step_data, - :extract_search_results, - :extract_metadata, - :extract_serving_config, - :extract_session, - :extract_user_pseudo_id, - :extract_query_understanding_info, - :resolve_additional_columns, - :normalize_additional_columns, - :safe_extract, - :safe_count, - :extract_timestamp -``` - -- [ ] **Step 4: Run the tests to verify they pass** - -Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` - -Expected: `6 runs, ... 
assertions, 0 failures, 0 errors, 0 skips` - -- [ ] **Step 5: Lint the changed files** - -Run: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength lib/openlayer/integrations/google_conversational_search_tracer.rb test/openlayer/integrations/google_conversational_search_tracer_test.rb` - -Expected: `2 files inspected, no offenses detected` - -- [ ] **Step 6: Commit** - -```bash -git add lib/openlayer/integrations/google_conversational_search_tracer.rb test/openlayer/integrations/google_conversational_search_tracer_test.rb -git commit -m "feat(closes OPEN-11569): support additional columns in ConversationalSearchService tracer" -``` - ---- - -### Task 2: RBI signature and example script - -**Files:** -- Modify: `rbi/openlayer/integrations.rbi` -- Modify: `examples/google_tracer.rb` - -**Interfaces:** -- Consumes: `GoogleConversationalSearchTracer.trace_client(..., additional_columns: {})` and the per-call `additional_columns:` kwarg on `answer_query`, both from Task 1. -- Produces: nothing further downstream — this is the last task. 
- -- [ ] **Step 1: Update the RBI signature** - -Replace the full contents of `rbi/openlayer/integrations.rbi`: - -```ruby -# typed: strong -# frozen_string_literal: true - -module Openlayer - module Integrations - class GoogleConversationalSearchTracer - sig do - params( - client: T.untyped, - openlayer_client: Openlayer::Client, - inference_pipeline_id: String, - session_id: T.nilable(String), - user_id: T.nilable(String), - additional_columns: T::Hash[Symbol, T.untyped] - ).void - end - def self.trace_client( - client, - openlayer_client:, - inference_pipeline_id:, - session_id: nil, - user_id: nil, - additional_columns: {} - ) - end - end - end -end -``` - -- [ ] **Step 2: Update the example script** - -Replace the full contents of `examples/google_tracer.rb`: - -```ruby -#!/usr/bin/env ruby -# frozen_string_literal: true -# typed: false - -# Simple example: Tracing Google ConversationalSearchService with Openlayer - -# Add lib directory to load path -$LOAD_PATH.unshift(File.expand_path("../lib", __dir__)) - -require "securerandom" -require "openlayer" -require "openlayer/integrations/google_conversational_search_tracer" -require "google/cloud/discovery_engine/v1" - -# Initialize Google ConversationalSearchService client -google_client = Google::Cloud::DiscoveryEngine::V1::ConversationalSearchService::Client.new - -# Initialize Openlayer client -openlayer = Openlayer::Client.new( - api_key: ENV["OPENLAYER_API_KEY"] -) - -# Enable tracing - this patches the client to send all queries to Openlayer. -# additional_columns here is a static default applied to every trace sent -# through this client. -Openlayer::Integrations::GoogleConversationalSearchTracer.trace_client( - google_client, - openlayer_client: openlayer, - inference_pipeline_id: ENV["OPENLAYER_INFERENCE_PIPELINE_ID"], - additional_columns: {environment: "production"} -) - -# Use the client normally - all answer_query calls are now automatically traced! 
-# additional_columns here is per-call; it takes precedence over the static -# default above on a key conflict. -response = google_client.answer_query( - serving_config: ENV["GOOGLE_SERVING_CONFIG"], - query: Google::Cloud::DiscoveryEngine::V1::Query.new( - text: "What is the meaning of life?" - ), - additional_columns: {trace_id: SecureRandom.uuid} -) - -puts "Answer: #{response.answer.answer_text}" -puts "\n✓ Query traced to Openlayer successfully!" -``` - -- [ ] **Step 3: Typecheck the examples directory** - -Run: `bundle exec rake typecheck:sorbet` - -Expected: `No errors! Great job.` - -- [ ] **Step 4: Lint the changed files** - -Run: `bundle exec rubocop --except Lint/RedundantCopDisableDirective,Layout/LineLength rbi/openlayer/integrations.rbi examples/google_tracer.rb` - -Expected: `2 files inspected, no offenses detected` - -- [ ] **Step 5: Run the full test suite as a final regression check** - -Run: `bundle exec rake test TEST=./test/openlayer/integrations/google_conversational_search_tracer_test.rb` - -Expected: `6 runs, ... assertions, 0 failures, 0 errors, 0 skips` (unchanged from Task 1 — this task touches no runtime logic, only types/docs/examples). 
- -- [ ] **Step 6: Commit** - -```bash -git add rbi/openlayer/integrations.rbi examples/google_tracer.rb -git commit -m "docs(closes OPEN-11569): document additional columns support in ConversationalSearchService tracer example and RBI sig" -``` diff --git a/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md b/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md deleted file mode 100644 index 829b716..0000000 --- a/docs/superpowers/specs/2026-07-03-google-tracer-additional-columns-design.md +++ /dev/null @@ -1,79 +0,0 @@ -# Support additional columns in GoogleConversationalSearchTracer - -- Linear: [OPEN-11569](https://linear.app/openlayer/issue/OPEN-11569/support-additional-columns-in-google-conversational-search-tracer) -- Customer: gen (per Vikas Nair, Slack thread linked on the issue) - -## Problem - -`Openlayer::Integrations::GoogleConversationalSearchTracer` (`lib/openlayer/integrations/google_conversational_search_tracer.rb`) auto-extracts a fixed set of fields from each `answer_query` call and response (query, answer, latency, citations, references, session, user, etc.) but gives the caller no way to attach their own custom data — e.g. a trace ID generated by their own application — to the row sent to Openlayer. - -The Openlayer stream API already accepts arbitrary extra keys per row (`DataStreamParams#rows` is typed `Array<Hash{Symbol=>Object}>`, `lib/openlayer/models/inference_pipelines/data_stream_params.rb:23-28`) with no schema restriction, and no `config` registration is required for non-special columns. The gap is entirely in the tracer's Ruby API, not the wire format. - -## Precedent - -- `session_id`/`user_id` were added to this same tracer in commit `9157c52` (OPEN-8574) as static kwargs on `trace_client`, applied to every trace sent through that client instance. -- The Python SDK's tracers (`trace_openai`, `trace_anthropic`, etc.)
let callers attach arbitrary data via `update_current_trace(metadata={...})`; in `post_process_trace()` every key in that dict is flattened directly onto the row as a top-level column (`src/openlayer/lib/tracing/tracer.py` in openlayer-python), not nested under a `metadata` key. Only truly special columns get a `config[...ColumnName]` entry. -- This tracer's existing `metadata:` row key already has an established, different meaning: it's the tracer's own auto-extracted Google-specific data (citations, references, provider, service, etc.). The new feature must not collide with or overload that name. - -## Design - -### API - -`trace_client` gains a new optional `additional_columns:` kwarg — a `Hash` of column name => value, applied as a static default to every trace sent through that client: - -```ruby -GoogleConversationalSearchTracer.trace_client( - google_client, - openlayer_client: openlayer, - inference_pipeline_id: pipeline_id, - additional_columns: { environment: "production", app_version: "2.4.1" } -) -``` - -The wrapped `answer_query` method also recognizes `additional_columns:` as a reserved keyword argument, for per-call values: - -```ruby -google_client.answer_query( - serving_config: serving_config, - query: query, - additional_columns: { trace_id: my_app.generate_trace_id } -) -``` - -The wrapper (`client.define_singleton_method(:answer_query) do |*args, **kwargs, &block| ... end`) pops `:additional_columns` out of `kwargs` before forwarding the rest to the real Google client via `original_answer_query.call(*args, **kwargs, &block)`. Google's client never sees this key, so it cannot collide with any real Discovery Engine request field (confirmed against `rbi/google_discovery_engine.rbi`, which already types `answer_query`'s extra kwargs as `T.untyped`). - -### Merge & precedence - -1. Keys in both the static and per-call `additional_columns` hashes are normalized to Symbols (`key.to_sym`) before anything else. 
This is required for the reserved-key filter below to work regardless of whether the caller used String or Symbol keys — without normalization, a String `"answer"` wouldn't match a Symbol `:answer` in the reserved list and could slip through and clobber the real answer column. -2. Per-call `additional_columns` are shallow-merged over the client-level static `additional_columns` (call-level wins on key conflict). -3. From that combined hash, any key matching a reserved/built-in row key is dropped: `:query`, `:answer`, `:latency_ms`, `:timestamp`, `:metadata`, `:steps`, `:context`, `:session_id`, `:user_id`. A dropped key logs via `warn_if_debug` (only visible when `OPENLAYER_DEBUG` is set), so this is silent by default but debuggable. -4. The remaining keys are merged onto `trace_data[:rows][0]` as top-level columns (sibling of `query`/`answer`/etc.) — no `config` registration needed, matching the "arbitrary row keys pass through" contract confirmed in `data_stream_params.rb`. - -Built-ins are computed first; the filtered additional columns are merged in afterward but with reserved keys already stripped, so they can never overwrite core trace data — this is what "built-ins win" means in practice, without needing a specific merge order to enforce it. - -### Robustness - -If `additional_columns` (static or per-call) is not a `Hash` (nil, String, etc.), treat it as `{}` and log via `warn_if_debug` — do not raise. This matches the file's existing philosophy that a tracing mistake must never break the customer's actual application (every `send_trace` call site is already wrapped in `rescue StandardError`). - -### Docs - -- Update the class-level YARD `@example` on `GoogleConversationalSearchTracer`. -- Update `examples/google_tracer.rb` to demonstrate both static and per-call `additional_columns`. -- Update `@param` docs on `trace_client` and `send_trace`. 
-- Update `rbi/openlayer/integrations.rbi` (the OPEN-8574 commit updated this same file when it added `session_id`/`user_id`). - -### Testing - -No test coverage exists today for this tracer. Add `test/openlayer/integrations/google_conversational_search_tracer_test.rb` as a plain `Minitest::Test` (no network — a fake `openlayer_client` double capturing the `.inference_pipelines.data.stream(...)` call args), covering: - -- Static `additional_columns` (set via `trace_client`) appear as top-level row columns. -- Per-call `additional_columns` (passed to `answer_query`) appear as top-level row columns. -- Per-call values override static values on key conflict. -- Keys colliding with reserved row columns are dropped, not merged — including when passed as a String key (e.g. `"answer"`) rather than a Symbol. -- A non-`Hash` `additional_columns` value doesn't raise. - -## Out of scope - -- No change to `lib/openlayer/models`/`lib/openlayer/resources` (Stainless-generated; the wire format already supports arbitrary row keys, confirmed via `.stats.yml` codegen fingerprint — these directories are regenerated from an OpenAPI spec and must not be hand-edited). -- No thread-local/context-manager mechanism (rejected approach B — unnecessary complexity for this need). -- No new public method alongside `answer_query` (rejected approach C — breaks the tracer's transparent-instrumentation model).
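
The "Merge & precedence" and "Robustness" rules in the spec above can be sketched in a few lines of plain Ruby. This is a minimal illustration under stated assumptions, not the tracer's actual code: `RESERVED_ROW_KEYS`, `normalize_columns`, and `merge_additional_columns` are hypothetical names, and the `ENV["OPENLAYER_DEBUG"]` check with `warn` is only a stand-in for the tracer's own `warn_if_debug` helper. The reserved-key list is copied from the spec.

```ruby
# Hypothetical constant name; the keys themselves are the reserved list from the spec.
RESERVED_ROW_KEYS = %i[
  query answer latency_ms timestamp metadata steps context session_id user_id
].freeze

# Hypothetical helper: a non-Hash value (nil, String, ...) is treated as {},
# and keys are normalized to Symbols so "answer" and :answer compare equal.
def normalize_columns(value)
  return {} unless value.is_a?(Hash)

  value.transform_keys(&:to_sym)
end

# Hypothetical helper: returns a copy of `row` with the filtered columns added.
def merge_additional_columns(row, static_columns, call_columns)
  # Per-call values are shallow-merged over the static defaults (call-level wins).
  combined = normalize_columns(static_columns).merge(normalize_columns(call_columns))

  # Split reserved keys from the rest; reserved keys are dropped, never merged.
  dropped, kept = combined.partition { |key, _| RESERVED_ROW_KEYS.include?(key) }.map(&:to_h)

  # Stand-in for warn_if_debug: only report dropped keys when debugging is enabled.
  if ENV["OPENLAYER_DEBUG"] && !dropped.empty?
    warn("additional_columns: dropping reserved keys #{dropped.keys.inspect}")
  end

  # Built-in columns cannot be overwritten because reserved keys were already stripped.
  row.merge(kept)
end

row = { query: "q", answer: "real answer", latency_ms: 12 }
merged = merge_additional_columns(
  row,
  { environment: "production", trace_id: "static" },
  { "trace_id" => "per-call", "answer" => "spoofed" }
)
```

Here `merged[:trace_id]` is the per-call value and `merged[:answer]` keeps the built-in answer, since the String key `"answer"` is normalized to `:answer` before the reserved-key filter runs.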