Narrow queried indices based on _typename filters#1229
Narrow queried indices based on _typename filters#1229marcdaniels-toast wants to merge 2 commits into
_typename filters#1229Conversation
myronmarston
left a comment
There was a problem hiding this comment.
Nice work @marcdaniels-toast! Left some suggestions.
When a query includes a `__typename` filter with `equal_to_any_of`, `DatastoreQuery#narrowed_search_index_definitions` now intersects the full set of initial index definitions with only the indices that can contain the filtered type names. This avoids searching indices that cannot possibly contain matching documents. For example, filtering `distributionChannels` to `PhysicalStore` will now search only `physical_stores`, not `distribution_channels` as well. Generated with Claude Code
Generated with Claude Code
b48fcb7 to
223ddf6
Compare
|
|
||
| module ElasticGraph | ||
| class GraphQL | ||
| module Filtering |
There was a problem hiding this comment.
I considered locating this small class in the DatastoreQuery namespace but thought it might better belong here because unlike RoutingPicker it doesn't directly involve any DatastoreQuery concepts I don't think. And it is a Filter class with "Filter" in the name, so this seems like the right home for it.
| # @!attribute [r] initial_search_index_definitions | ||
| # The index definitions provided at construction time, before any `__typename`-based narrowing. |
There was a problem hiding this comment.
See PR description for a screenshot of this.
| default_page_size max_page_size schema_element_names | ||
| logger filter_interpreter routing_picker typename_filter index_definitions_by_type_name | ||
| index_expression_builder default_page_size max_page_size schema_element_names | ||
| ] |
There was a problem hiding this comment.
We should only add an attribute to app_level_attributes if it's handled by DatastoreQuery::Builder to ensure that every DatastoreQuery instance has the same value. But you've implemented index_definitions_by_type_name so that it's an argument to new_query which means that two DatastoreQuery instances can have different values! And in fact, they often do, because it defaults to {} which isn't the correct, accurate value for that attribute.
That made me realize an improvement we can make here. Instead of app_level_attributes being something anyone can freely add to, we should handle it automatically by looking at DatastoreQuery::Builder#new_query. All the attributes accepted by new_query are ones which can have different values on DatastoreQuery instances and which need merge handling. All the attributes that are not parameters on new_query (and are managed internally by DatastoreQuery::Builder) do not need #merge handling or test coverage.
One way to improve this test based on that insight is to replace its before(:context), after(:context), and before(:example) hooks with these:
before(:context) do
@attributes_needing_merge_test_coverage = DatastoreQuery::Builder.instance_method(:new_query).parameters.map(&:last).to_set
@attributes_covered = ::Set.new
end
before(:example) do |ex|
Array(ex.metadata[:covers]).each do |attribute|
if @attributes_needing_merge_test_coverage.include?(attribute)
@attributes_covered << attribute
else
raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?"
end
end
end
after(:context) do
untested_attribute = @attributes_needing_merge_test_coverage - @attributes_covered
expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \
"but the following do not appear to have coverage: #{untested_attribute}"
endWith that in place, there are two failures:
An error occurred in an `after(:context)` hook.
Failure/Error:
expect(untested_attribute).to be_empty, "`#merge` tests are expected to cover all attributes, " \
"but the following do not appear to have coverage: #{untested_attribute}"
`#merge` tests are expected to cover all attributes, but the following do not appear to have coverage: #<Set: {:search_index_definitions, :index_definitions_by_type_name}>
# ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:37:in 'block (2 levels) in <class:GraphQL>'
Failures:
1) ElasticGraph::GraphQL::DatastoreQuery#merge does not allow `initial_search_index_definitions` to be overridden
Failure/Error: raise "Attribute `#{attribute}` (from `covers: :#{attribute}`) does not appear to need coverage. Did you misspell it?"
RuntimeError:
Attribute `initial_search_index_definitions` (from `covers: :initial_search_index_definitions`) does not appear to need coverage. Did you misspell it?
# ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:30:in 'block (3 levels) in <class:GraphQL>'
# ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'Array#each'
# ./elasticgraph-graphql/spec/unit/elastic_graph/graphql/datastore_query/merge_spec.rb:26:in 'block (2 levels) in <class:GraphQL>'
# ./spec_support/lib/elastic_graph/spec_support/logging.rb:124:in 'block (2 levels) in <top (required)>'
# ./spec_support/spec_helper.rb:71:in 'block (2 levels) in <top (required)>'
- To solve the 2nd one, we should rename the
search_index_definitions:parameter onDatastoreBuilder#new_querytoinitial_search_index_definitions. - To fix the first one, we should remove
index_definitions_by_type_nameas anew_queryparameter and instead make it a parameter passed intoDatastoreQuery::Builder.new. I'll post a suggested change ondatastore_query.rbshowing how to do that.
|
|
||
| def new_query( | ||
| search_index_definitions:, | ||
| index_definitions_by_type_name: {}, |
There was a problem hiding this comment.
I don't think we want to allow index_definitions_by_type_name to be provided to new_query, because it's a property of the ElasticGraph::GraphQL instance and not something that should be allowed to vary between different DatastoreQuery instances for the same ElasticGraph::GraphQL application. And defaulting to {} isn't ideal, either--I don't think it's possible to have an EG application with no index definitions!
Here's what I'd do instead:
diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql.rb b/elasticgraph-graphql/lib/elastic_graph/graphql.rb
index 3700c250..995aa241 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql.rb
@@ -124,6 +124,7 @@ module ElasticGraph
filter_node_interpreter:,
runtime_metadata:,
logger:,
+ index_definitions_by_type_name: @datastore_core.index_definitions_by_graphql_type,
default_page_size: @config.default_page_size,
max_page_size: @config.max_page_size
)
diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
index f790092f..51f351c5 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql/datastore_query.rb
@@ -366,7 +366,7 @@ module ElasticGraph
# Encapsulates dependencies of `Query`, giving us something we can expose off of `application`
# to build queries when desired.
- class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size)
+ class Builder < Support::MemoizableData.define(:runtime_metadata, :logger, :filter_interpreter, :filter_node_interpreter, :default_page_size, :max_page_size, :index_definitions_by_type_name)
def routing_picker
@routing_picker ||= RoutingPicker.new(
filter_node_interpreter: filter_node_interpreter,
@@ -390,7 +390,6 @@ module ElasticGraph
def new_query(
search_index_definitions:,
- index_definitions_by_type_name: {},
client_filters: [],
internal_filters: [],
sort: [],
diff --git a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
index f4ed628a..da446dd2 100644
--- a/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
+++ b/elasticgraph-graphql/lib/elastic_graph/graphql/resolvers/query_adapter.rb
@@ -62,11 +62,9 @@ module ElasticGraph
def build_new_query_from(field, args, lookahead, context, monotonic_clock_deadline)
unwrapped_type = field.type.unwrap_fully
- schema = context.fetch(:elastic_graph_schema)
initial_query = @datastore_query_builder.new_query(
search_index_definitions: unwrapped_type.search_index_definitions,
- index_definitions_by_type_name: schema.index_definitions_by_type_name,
monotonic_clock_deadline: monotonic_clock_deadline
)Make it an attribute of DatastoreQuery::Builder and pass it in from ElasticGraph::GraphQL. This is the entire reason DatastoreQuery::Builder exists--to provide an easy, convenient way to "partially apply" the bits of DatastoreQuery state which are really properties of the ElasticGraph::GraphQL instance.
| # Returns `nil` if the filters place no constraint on `__typename`, meaning all type names | ||
| # are potentially matched. | ||
| def filtered_type_names(filter_hashes, known_type_names) | ||
| return nil if known_type_names.empty? |
There was a problem hiding this comment.
I don't think it's possible for known_type_names to be empty so this shouldn't be needed. (I think with your current PR, this only gets hit when unit/integration tests do query_builder.new_query(...) and get the index_definitions_by_type_name: {} default. But I don't think it's possible apart from that and my suggestion to move index_definitions_by_type_name from new_query to a property of the builder should make this impossible).
| return cluster_name.first if cluster_name.size == 1 | ||
| raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \ | ||
| "for query targeting indices: #{search_index_definitions}" | ||
| "for query targeting indices: #{narrowed_search_index_definitions}" |
There was a problem hiding this comment.
Similar to my prior feedback, I'm concerned about this potentially causing the recursive traversal of FilterValueSetExtractor to be applied more times than we need since we build up the final query via multiple query.merge_with(...) calls. That means that if this occurs:
query = query_builder.new_query(...)
logger.info("Building query which targets cluster: #{query.cluster_name}")
updated_query = query_builder.merge_with(...)
datastore_search_router.msearch([query])...then the TypenameFilter logic will have gotten applied multiple times instead of once like we want, since datastore_search_router.msearch uses cluster_name internally.
I believe that basing cluster_name on initial_search_index_definitions will yield the exact same value as basing it on narrowed_search_index_definitions...but the former potentially avoids some amount of unnecessary computation.
Really, I think the only place we need to use narrowed_search_index_definitions is in search_index_expression as that's the method which is used in the OpenSearch query request.
For every other spot where you've used narrowed_search_index_definitions, there's the potential for applying the TypenameFilter logic more times tha we need and I don't see a concrete benefit to using it in the other spots.
If there are some concrete situations you can think of where using narrowed_search_index_definitions in more spots is necessary, please let me know!
| typename_set = @extractor.extract_filter_value_set(filter_hashes, ["__typename"]) | ||
| return nil unless typename_set | ||
|
|
||
| known_type_names.select { |name| typename_set.include?(name) } |
There was a problem hiding this comment.
| known_type_names.select { |name| typename_set.include?(name) } | |
| if typename_set.inclusive? | |
| typename_set.values | |
| else | |
| known_type_names - typename_set.values | |
| end |
I think this will be a bit more efficient. For the inclusive case, you already have the set of values in typename_set.values. The known_type_names really only come into play for an exclusive set.
| # | ||
| # Returns `nil` if the filters place no constraint on `__typename`, meaning all type names | ||
| # are potentially matched. | ||
| def filtered_type_names(filter_hashes, known_type_names) |
There was a problem hiding this comment.
Can we move known_type_names from a filtered_type_names argument to instead being an initialize argument since it's a property of the schema?
Closes #1179
When a query on an abstract type filters on
__typename, we can skip any index that contains none of the requested concrete types. For example, queryingDistributionChannelwith_typename: PhysicalStoreonly needs to hitphysical_stores, notdistribution_channels.TypenameIndexPickerimplements this usingFilterValueSetExtractorfor full set-algebra support acrossnot,any_of, andall_offilter combinators.Update: #1239 extracted
EqualityValueSetand created aFilterValueSetExtractor.for_equalityfactory. NowDatastoreQueryviaTypenameFilterdetermines the subset of types matching the query's filters and narrows to only those types' indices.One design decision worth noting:DatastoreQueryoverrides the generatedDataaccessor forsearch_index_definitionsso the narrowing is transparent to all callers, usingsuperto get the raw stored value. The alternative, a separatenarrowed_search_index_definitionsmethod, would leave callers with two similar methods to choose from.Update:
DatastoreQueryexposes two index definition accessors:initial_search_index_definitions(the set provided at construction time) andnarrowed_search_index_definitions. All internal methods that determine what to search usenarrowed_search_index_definitions, making the narrowing transparent to callers.initial_search_index_definitionsis retained as a named concept for the rare case where the pre-narrowing set is needed.Doc check
I wanted to be sure the YARD documentation of

initial_search_index_definitionsworked properly. Here's a screenshot: