Narrow queried indices based on `_typename` filters by marcdaniels-toast · Pull Request #1229 · block/elasticgraph

marcdaniels-toast · 2026-06-02T03:48:59Z

When a query on an abstract type filters on __typename, we can skip any index that contains none of the requested concrete types. For example, querying DistributionChannel with _typename: PhysicalStore only needs to hit physical_stores, not distribution_channels.

~~TypenameIndexPicker implements this using FilterValueSetExtractor for full set-algebra support across not, any_of, and all_of filter combinators.~~

Update: #1239 extracted EqualityValueSet and created a FilterValueSetExtractor.for_equality factory. Now DatastoreQuery via TypenameFilter determines the subset of types matching the query's filters and narrows to only those types' indices.

One design decision worth noting: DatastoreQuery overrides the generated Data accessor for search_index_definitions so the narrowing is transparent to all callers, using super to get the raw stored value. The alternative, a separate narrowed_search_index_definitions method, would leave callers with two similar methods to choose from.

Update: DatastoreQuery exposes two index definition accessors: initial_search_index_definitions (the set provided at construction time) and narrowed_search_index_definitions. All internal methods that determine what to search use narrowed_search_index_definitions, making the narrowing transparent to callers. initial_search_index_definitions is retained as a named concept for the rare case where the pre-narrowing set is needed.

Doc check

I wanted to be sure the YARD documentation of initial_search_index_definitions worked properly. Here's a screenshot:

myronmarston

Nice work @marcdaniels-toast! Left some suggestions.

When a query includes a `__typename` filter with `equal_to_any_of`, `DatastoreQuery#narrowed_search_index_definitions` now intersects the full set of initial index definitions with only the indices that can contain the filtered type names. This avoids searching indices that cannot possibly contain matching documents. For example, filtering `distributionChannels` to `PhysicalStore` will now search only `physical_stores`, not `distribution_channels` as well. Generated with Claude Code

Generated with Claude Code

marcdaniels-toast · 2026-06-06T03:25:41Z

+
+module ElasticGraph
+  class GraphQL
+    module Filtering


I considered locating this small class in the DatastoreQuery namespace but thought it might better belong here because unlike RoutingPicker it doesn't directly involve any DatastoreQuery concepts I don't think. And it is a Filter class with "Filter" in the name, so this seems like the right home for it.

marcdaniels-toast · 2026-06-06T03:26:36Z

+      # @!attribute [r] initial_search_index_definitions
+      #   The index definitions provided at construction time, before any `__typename`-based narrowing.


See PR description for a screenshot of this.

myronmarston · 2026-06-07T20:58:09Z

-          default_page_size max_page_size schema_element_names
+          logger filter_interpreter routing_picker typename_filter index_definitions_by_type_name
+          index_expression_builder default_page_size max_page_size schema_element_names
        ]


We should only add an attribute to app_level_attributes if it's handled by DatastoreQuery::Builder to ensure that every DatastoreQuery instance has the same value. But you've implemented index_definitions_by_type_name so that it's an argument to new_query which means that two DatastoreQuery instances can have different values! And in fact, they often do, because it defaults to {} which isn't the correct, accurate value for that attribute.

That made me realize an improvement we can make here. Instead of app_level_attributes being something anyone can freely add to, we should handle it automatically by looking at DatastoreQuery::Builder#new_query. All the attributes accepted by new_query are ones which can have different values on DatastoreQuery instances and which need merge handling. All the attributes that are not parameters on new_query (and are managed internally by DatastoreQuery::Builder) do not need #merge handling or test coverage.

One way to improve this test based on that insight is to replace its before(:context), after(:context), and before(:example) hooks with these:

before(:context) do @attributes_needing_merge_test_coverage = DatastoreQuery::Builder.instance_method(:new_query).parameters.map(&:last).to_set @attributes_covered = ::Set.new end before(:example) do |ex| Array(ex.metadata[:covers]).each do |attribute| if @attributes_needing_merge_test_coverage.include?(attribute) @attributes_covered << attribute else raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?" end end end after(:context) do untested_attribute = @attributes_needing_merge_test_coverage - @attributes_covered expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \ "but the following do not appear to have coverage: #{untested_attribute}" end

With that in place, there are two failures:

An error occurred in an `after(:context)` hook. Failure/Error: expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \ "but the following do not appear to have coverage: #{untested_attribute}" `#merge` tests are expected to cover all attributes, but the following do not appear to have coverage: #<Set: {:search_index_definitions, :index_definitions_by_type_name}> # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:37:in 'block (2 levels) in <class:GraphQL>' Failures: 1) ElasticGraph::GraphQL::DatastoreQuery#merge does not allow `initial_search_index_definitions` to be overridden Failure/Error: raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?" RuntimeError: Attribute `initial_search_index_definitions` (from `covers: :initial_search_index_definitions`) does not appear to need coverage. Did you misspell it? # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:30:in 'block (3 levels) in <class:GraphQL>' # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'Array#each' # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'block (2 levels) in <class:GraphQL>' # ./spec_support/lib/elastic_graph/spec_support/logging.rb:124:in 'block (2 levels) in <top (required)>' # ./spec_support/spec_helper.rb:71:in 'block (2 levels) in <top (required)>'

To solve the 2nd one, we should rename the search_index_definitions: parameter on DatastoreBuilder#new_query to initial_search_index_definitions.

To fix the first one, we should remove index_definitions_by_type_name as a new_query parameter and instead make it a parameter passed into DatastoreQuery::Builder.new. I'll post a suggested change on datastore_query.rb showing how to do that.

myronmarston · 2026-06-08T00:52:27Z


        def new_query(
          search_index_definitions:,
+          index_definitions_by_type_name: {},


I don't think we want to allow index_definitions_by_type_name to be provided to new_query, because it's a property of the ElasticGraph::GraphQL instance and not something that should be allowed to vary between different DatastoreQuery instances for the same ElasticGraph::GraphQL application. And defaulting to {} isn't ideal, either--I don't think it's possible to have an EG application with no index definitions!

Here's what I'd do instead:

diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql.rb b/elasticgraph-graphql/lib/elastic_graph/graphql.rb index 3700c250..995aa241 100644 --- a/elasticgraph-graphql/lib/elastic_graph/graphql.rb +++ b/elasticgraph-graphql/lib/elastic_graph/graphql.rb @@ -124,6 +124,7 @@ module ElasticGraph filter_node_interpreter:, runtime_metadata:, logger:, + index_definitions_by_type_name: @datastore_core.index_definitions_by_graphql_type, default_page_size: @config.default_page_size, max_page_size: @config.max_page_size ) diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb index f790092f..51f351c5 100644 --- a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb +++ b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb @@ -366,7 +366,7 @@ module ElasticGraph # Encapsulates dependencies of `Query`, giving us something we can expose off of `application` # to build queries when desired. - class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size) + class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size, :index_definitions_by_type_name) def routing_picker @routing_picker ||= RoutingPicker.new( filter_node_interpreter: filter_node_interpreter, @@ -390,7 +390,6 @@ module ElasticGraph def new_query( search_index_definitions:, - index_definitions_by_type_name: {}, client_filters: [], internal_filters: [], sort: [], diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb index f4ed628a..da446dd2 100644 --- a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb +++ b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb @@ -62,11 +62,9 @@ module ElasticGraph def build_new_query_from(field, args, lookahead, context, monotonic_clock_deadline) unwrapped_type = field.type.unwrap_fully - schema = context.fetch(:elastic_graph_schema) initial_query = @datastore_query_builder.new_query( search_index_definitions: unwrapped_type.search_index_definitions, - index_definitions_by_type_name: schema.index_definitions_by_type_name, monotonic_clock_deadline: monotonic_clock_deadline )

Make it an attribute of DatastoreQuery::Builder and pass it in from ElasticGraph::GraphQL. This is the entire reason DatastoreQuery::Builder exists--to provide an easy, convenient way to "partially apply" the bits of DatastoreQuery state which are really properties of the ElasticGraph::GraphQL instance.

myronmarston · 2026-06-08T00:56:14Z

+        # Returns `nil` if the filters place no constraint on `__typename`, meaning all type names
+        # are potentially matched.
+        def filtered_type_names(filter_hashes, known_type_names)
+          return nil if known_type_names.empty?


I don't think it's possible for known_type_names to be empty so this shouldn't be needed. (I think with your current PR, this only gets hit when unit/integration tests do query_builder.new_query(...) and get the index_definitions_by_type_name: {} default. But I don't think it's possible apart from that and my suggestion to move index_definitions_by_type_name from new_query to a property of the builder should make this impossible).

myronmarston · 2026-06-08T01:12:19Z

        return cluster_name.first if cluster_name.size == 1
        raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
-          "for query targeting indices: #{search_index_definitions}"
+          "for query targeting indices: #{narrowed_search_index_definitions}"


Similar to my prior feedback, I'm concerned about this potentially causing the recursive traversal of FilterValueSetExtractor to be applied more times than we need since we build up the final query via multiple query.merge_with(...) calls. That means that if this occurs:

query = query_builder.new_query(...) logger.info("Building query which targets cluster: #{query.cluster_name}") updated_query = query_builder.merge_with(...) datastore_search_router.msearch([query])

...then the TypenameFilter logic will have gotten applied multiple times instead of once like we want, since datastore_search_router.msearch uses cluster_name internally.

I believe that basing cluster_name on initial_search_index_definitions will yield the exact same value as basing it on narrowed_search_index_definitions...but the former potentially avoids some amount of unnecessary computation.

Really, I think the only place we need to use narrowed_search_index_definitions is in search_index_expression as that's the method which is used in the OpenSearch query request.

For every other spot where you've used narrowed_search_index_definitions, there's the potential for applying the TypenameFilter logic more times tha we need and I don't see a concrete benefit to using it in the other spots.

If there are some concrete situations you can think of where using narrowed_search_index_definitions in more spots is necessary, please let me know!

myronmarston · 2026-06-08T06:10:09Z

+          typename_set = @extractor.extract_filter_value_set(filter_hashes, ["__typename"])
+          return nil unless typename_set
+
+          known_type_names.select { |name| typename_set.include?(name) }


Suggested change

known_type_names.select { |name| typename_set.include?(name) }

if typename_set.inclusive?

typename_set.values

else

known_type_names - typename_set.values

end

I think this will be a bit more efficient. For the inclusive case, you already have the set of values in typename_set.values. The known_type_names really only come into play for an exclusive set.

myronmarston · 2026-06-08T14:23:35Z

+        #
+        # Returns `nil` if the filters place no constraint on `__typename`, meaning all type names
+        # are potentially matched.
+        def filtered_type_names(filter_hashes, known_type_names)


Can we move known_type_names from a filtered_type_names argument to instead being an initialize argument since it's a property of the schema?

marcdaniels-toast requested review from BrianSigafoos-SQ, ayousufi, bsorbo, jwils, jwondrusch and myronmarston as code owners June 2, 2026 03:49

myronmarston requested changes Jun 2, 2026

View reviewed changes

myronmarston mentioned this pull request Jun 5, 2026

Extract EqualityValueSet for filter value set re-use #1239

Merged

marcdaniels-toast added 2 commits June 5, 2026 16:57

Improve documentation and comments around index narrowing.

223ddf6

Generated with Claude Code

marcdaniels-toast force-pushed the mdaniels/typename-index-narrowing branch from b48fcb7 to 223ddf6 Compare June 6, 2026 03:11

marcdaniels-toast commented Jun 6, 2026

View reviewed changes

myronmarston requested changes Jun 8, 2026

View reviewed changes

myronmarston reviewed Jun 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Narrow queried indices based on `_typename` filters#1229

Narrow queried indices based on `_typename` filters#1229
marcdaniels-toast wants to merge 2 commits into
block:mainfrom
marcdaniels-toast:mdaniels/typename-index-narrowing

marcdaniels-toast commented Jun 2, 2026 •

edited

Loading

Uh oh!

myronmarston left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcdaniels-toast Jun 6, 2026 •

edited

Loading

Uh oh!

marcdaniels-toast Jun 6, 2026

Uh oh!

myronmarston Jun 7, 2026

Uh oh!

myronmarston Jun 8, 2026

Uh oh!

myronmarston Jun 8, 2026

Uh oh!

myronmarston Jun 8, 2026

Uh oh!

myronmarston Jun 8, 2026

Uh oh!

myronmarston Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		# @!attribute [r] initial_search_index_definitions
		# The index definitions provided at construction time, before any `__typename`-based narrowing.

-          known_type_names.select { |name| typename_set.include?(name) }
+          if typename_set.inclusive?
+            typename_set.values
+          else
+            known_type_names - typename_set.values
+          end

Conversation

marcdaniels-toast commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Doc check

Uh oh!

myronmarston left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcdaniels-toast Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marcdaniels-toast Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

myronmarston Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

marcdaniels-toast commented Jun 2, 2026 •

edited

Loading

marcdaniels-toast Jun 6, 2026 •

edited

Loading