Skip to content

Narrow queried indices based on _typename filters#1229

Open
marcdaniels-toast wants to merge 2 commits into
block:mainfrom
marcdaniels-toast:mdaniels/typename-index-narrowing
Open

Narrow queried indices based on _typename filters#1229
marcdaniels-toast wants to merge 2 commits into
block:mainfrom
marcdaniels-toast:mdaniels/typename-index-narrowing

Conversation

@marcdaniels-toast

@marcdaniels-toast marcdaniels-toast commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Closes #1179

When a query on an abstract type filters on __typename, we can skip any index that contains none of the requested concrete types. For example, querying DistributionChannel with _typename: PhysicalStore only needs to hit physical_stores, not distribution_channels.

TypenameIndexPicker implements this using FilterValueSetExtractor for full set-algebra support across not, any_of, and all_of filter combinators.

Update: #1239 extracted EqualityValueSet and created a FilterValueSetExtractor.for_equality factory. Now DatastoreQuery via TypenameFilter determines the subset of types matching the query's filters and narrows to only those types' indices.

One design decision worth noting: DatastoreQuery overrides the generated Data accessor for search_index_definitions so the narrowing is transparent to all callers, using super to get the raw stored value. The alternative, a separate narrowed_search_index_definitions method, would leave callers with two similar methods to choose from.

Update: DatastoreQuery exposes two index definition accessors: initial_search_index_definitions (the set provided at construction time) and narrowed_search_index_definitions. All internal methods that determine what to search use narrowed_search_index_definitions, making the narrowing transparent to callers. initial_search_index_definitions is retained as a named concept for the rare case where the pre-narrowing set is needed.

Doc check

I wanted to be sure the YARD documentation of initial_search_index_definitions worked properly. Here's a screenshot:
image

@myronmarston myronmarston left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @marcdaniels-toast! Left some suggestions.

Comment thread elasticgraph-graphql/lib/elastic_graph/graphql/schema/type.rb Outdated
Comment thread elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
When a query includes a `__typename` filter with `equal_to_any_of`,
`DatastoreQuery#narrowed_search_index_definitions` now intersects the
full set of initial index definitions with only the indices that can
contain the filtered type names. This avoids searching indices that
cannot possibly contain matching documents.

For example, filtering `distributionChannels` to `PhysicalStore` will
now search only `physical_stores`, not `distribution_channels` as well.

Generated with Claude Code
@marcdaniels-toast marcdaniels-toast force-pushed the mdaniels/typename-index-narrowing branch from b48fcb7 to 223ddf6 Compare June 6, 2026 03:11

module ElasticGraph
class GraphQL
module Filtering

@marcdaniels-toast marcdaniels-toast Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered locating this small class in the DatastoreQuery namespace but thought it might better belong here because unlike RoutingPicker it doesn't directly involve any DatastoreQuery concepts I don't think. And it is a Filter class with "Filter" in the name, so this seems like the right home for it.

Comment on lines +143 to +144
# @!attribute [r] initial_search_index_definitions
# The index definitions provided at construction time, before any `__typename`-based narrowing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR description for a screenshot of this.

default_page_size max_page_size schema_element_names
logger filter_interpreter routing_picker typename_filter index_definitions_by_type_name
index_expression_builder default_page_size max_page_size schema_element_names
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only add an attribute to app_level_attributes if it's handled by DatastoreQuery::Builder to ensure that every DatastoreQuery instance has the same value. But you've implemented index_definitions_by_type_name so that it's an argument to new_query which means that two DatastoreQuery instances can have different values! And in fact, they often do, because it defaults to {} which isn't the correct, accurate value for that attribute.

That made me realize an improvement we can make here. Instead of app_level_attributes being something anyone can freely add to, we should handle it automatically by looking at DatastoreQuery::Builder#new_query. All the attributes accepted by new_query are ones which can have different values on DatastoreQuery instances and which need merge handling. All the attributes that are not parameters on new_query (and are managed internally by DatastoreQuery::Builder) do not need #merge handling or test coverage.

One way to improve this test based on that insight is to replace its before(:context), after(:context), and before(:example) hooks with these:

before(:context) do
  @attributes_needing_merge_test_coverage = DatastoreQuery::Builder.instance_method(:new_query).parameters.map(&:last).to_set
  @attributes_covered = ::Set.new
end

before(:example) do |ex|
  Array(ex.metadata[:covers]).each do |attribute|
    if @attributes_needing_merge_test_coverage.include?(attribute)
      @attributes_covered << attribute
    else
      raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?"
    end
  end
end

after(:context) do
  untested_attribute = @attributes_needing_merge_test_coverage - @attributes_covered
  expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \
    "but the following do not appear to have coverage: #{untested_attribute}"
end

With that in place, there are two failures:

An error occurred in an `after(:context)` hook.
Failure/Error:
  expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \
    "but the following do not appear to have coverage: #{untested_attribute}"

  `#merge` tests are expected to cover all attributes, but the following do not appear to have coverage: #<Set: {:search_index_definitions, :index_definitions_by_type_name}>
# ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:37:in 'block (2 levels) in <class:GraphQL>'

Failures:

  1) ElasticGraph::GraphQL::DatastoreQuery#merge does not allow `initial_search_index_definitions` to be overridden
     Failure/Error: raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?"
     
     RuntimeError:
       Attribute `initial_search_index_definitions` (from `covers: :initial_search_index_definitions`) does not appear to need coverage. Did you misspell it?
     # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:30:in 'block (3 levels) in <class:GraphQL>'
     # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'Array#each'
     # ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'block (2 levels) in <class:GraphQL>'
     # ./spec_support/lib/elastic_graph/spec_support/logging.rb:124:in 'block (2 levels) in <top (required)>'
     # ./spec_support/spec_helper.rb:71:in 'block (2 levels) in <top (required)>'
  • To solve the 2nd one, we should rename the search_index_definitions: parameter on DatastoreBuilder#new_query to initial_search_index_definitions.
  • To fix the first one, we should remove index_definitions_by_type_name as a new_query parameter and instead make it a parameter passed into DatastoreQuery::Builder.new. I'll post a suggested change on datastore_query.rb showing how to do that.


def new_query(
search_index_definitions:,
index_definitions_by_type_name: {},

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to allow index_definitions_by_type_name to be provided to new_query, because it's a property of the ElasticGraph::GraphQL instance and not something that should be allowed to vary between different DatastoreQuery instances for the same ElasticGraph::GraphQL application. And defaulting to {} isn't ideal, either--I don't think it's possible to have an EG application with no index definitions!

Here's what I'd do instead:

diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql.rb b/elasticgraph-graphql/lib/elastic_graph/graphql.rb
index 3700c250..995aa241 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql.rb
@@ -124,6 +124,7 @@ module ElasticGraph
           filter_node_interpreter:,
           runtime_metadata:,
           logger:,
+          index_definitions_by_type_name: @datastore_core.index_definitions_by_graphql_type,
           default_page_size: @config.default_page_size,
           max_page_size: @config.max_page_size
         )
diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
index f790092f..51f351c5 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
@@ -366,7 +366,7 @@ module ElasticGraph
 
       # Encapsulates dependencies of `Query`, giving us something we can expose off of `application`
       # to build queries when desired.
-      class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size)
+      class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size, :index_definitions_by_type_name)
         def routing_picker
           @routing_picker ||= RoutingPicker.new(
             filter_node_interpreter: filter_node_interpreter,
@@ -390,7 +390,6 @@ module ElasticGraph
 
         def new_query(
           search_index_definitions:,
-          index_definitions_by_type_name: {},
           client_filters: [],
           internal_filters: [],
           sort: [],
diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
index f4ed628a..da446dd2 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
@@ -62,11 +62,9 @@ module ElasticGraph
 
         def build_new_query_from(field, args, lookahead, context, monotonic_clock_deadline)
           unwrapped_type = field.type.unwrap_fully
-          schema = context.fetch(:elastic_graph_schema)
 
           initial_query = @datastore_query_builder.new_query(
             search_index_definitions: unwrapped_type.search_index_definitions,
-            index_definitions_by_type_name: schema.index_definitions_by_type_name,
             monotonic_clock_deadline: monotonic_clock_deadline
           )

Make it an attribute of DatastoreQuery::Builder and pass it in from ElasticGraph::GraphQL. This is the entire reason DatastoreQuery::Builder exists--to provide an easy, convenient way to "partially apply" the bits of DatastoreQuery state which are really properties of the ElasticGraph::GraphQL instance.

# Returns `nil` if the filters place no constraint on `__typename`, meaning all type names
# are potentially matched.
def filtered_type_names(filter_hashes, known_type_names)
return nil if known_type_names.empty?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible for known_type_names to be empty so this shouldn't be needed. (I think with your current PR, this only gets hit when unit/integration tests do query_builder.new_query(...) and get the index_definitions_by_type_name: {} default. But I don't think it's possible apart from that and my suggestion to move index_definitions_by_type_name from new_query to a property of the builder should make this impossible).

return cluster_name.first if cluster_name.size == 1
raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
"for query targeting indices: #{search_index_definitions}"
"for query targeting indices: #{narrowed_search_index_definitions}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my prior feedback, I'm concerned about this potentially causing the recursive traversal of FilterValueSetExtractor to be applied more times than we need since we build up the final query via multiple query.merge_with(...) calls. That means that if this occurs:

query = query_builder.new_query(...)
logger.info("Building query which targets cluster: #{query.cluster_name}")
updated_query = query_builder.merge_with(...)
datastore_search_router.msearch([query])

...then the TypenameFilter logic will have gotten applied multiple times instead of once like we want, since datastore_search_router.msearch uses cluster_name internally.

I believe that basing cluster_name on initial_search_index_definitions will yield the exact same value as basing it on narrowed_search_index_definitions...but the former potentially avoids some amount of unnecessary computation.

Really, I think the only place we need to use narrowed_search_index_definitions is in search_index_expression as that's the method which is used in the OpenSearch query request.

For every other spot where you've used narrowed_search_index_definitions, there's the potential for applying the TypenameFilter logic more times tha we need and I don't see a concrete benefit to using it in the other spots.

If there are some concrete situations you can think of where using narrowed_search_index_definitions in more spots is necessary, please let me know!

typename_set = @extractor.extract_filter_value_set(filter_hashes, ["__typename"])
return nil unless typename_set

known_type_names.select { |name| typename_set.include?(name) }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
known_type_names.select { |name| typename_set.include?(name) }
if typename_set.inclusive?
typename_set.values
else
known_type_names - typename_set.values
end

I think this will be a bit more efficient. For the inclusive case, you already have the set of values in typename_set.values. The known_type_names really only come into play for an exclusive set.

#
# Returns `nil` if the filters place no constraint on `__typename`, meaning all type names
# are potentially matched.
def filtered_type_names(filter_hashes, known_type_names)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move known_type_names from a filtered_type_names argument to instead being an initialize argument since it's a property of the schema?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize __typename filtering: narrow queried indices

2 participants