
Migrate JSON ingestion specs into gem #1224

Open
jwils wants to merge 7 commits into joshuaw/json-ingestion-docs from joshuaw/json-ingestion-test-migration

Conversation

@jwils jwils commented May 31, 2026

Collaborator

Why

The JSON ingestion gem should own the JSON-schema tests that cover its extracted implementation so the gem can enforce full coverage without a SimpleCov skip.

What

  • Move JSON schema generation, metadata merge, pruner, and matcher specs into elasticgraph-json_ingestion.
  • Add JSON-ingestion artifact-manager integration coverage for versioned schema artifacts, version-bump enforcement, metadata refresh, and schema evolution diagnostics.
  • Add focused wrapper/scalar specs for the remaining extracted JSON-ingestion paths.
  • Remove the temporary SimpleCov filter for elasticgraph-json_ingestion.

Verification

  • script/run_gem_specs elasticgraph-json_ingestion
  • script/type_check elasticgraph-json_ingestion
  • BUNDLE_GEMFILE=Gemfile bundle exec rspec --format progress spec/integration/elastic_graph/schema_definition/rake_tasks_spec.rb spec/unit/elastic_graph/schema_definition/runtime_metadata/scalar_types_by_name_spec.rb (run from elasticgraph-schema_definition/)
  • bundle exec standardrb ... on touched specs/support

script/run_gem_specs elasticgraph-schema_definition and script/run_gem_specs elasticgraph-json_ingestion both pass with 100% coverage (with the test datastore booted).

Follow-ups

  • The migrated spec files keep the compact module JSONIngestion::SchemaDefinition form so that git/GitHub detects them as file moves. A follow-up PR (after this stack merges) will convert them to the standard nested module style and re-indent.
  • The JSON-schema-versioning scenarios that the new schema_artifact_manager_extension_spec.rb covers have been removed from the core rake_tasks_spec.rb; the core lines they exercised are now covered by focused unit specs instead.

Stack

Current PR is marked with ->.

@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 9545bf4 to eee835d Compare May 31, 2026 21:14
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from cd78e96 to c34ed76 Compare May 31, 2026 21:45
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from eee835d to c0a2045 Compare May 31, 2026 21:45
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from c34ed76 to d4259d8 Compare May 31, 2026 22:08
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from c0a2045 to c1771af Compare May 31, 2026 22:08
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from c1771af to d6dba0f Compare June 1, 2026 18:28
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from d4259d8 to 5d3b496 Compare June 1, 2026 18:28
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from d6dba0f to ff22ea0 Compare June 1, 2026 18:42
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch 2 times, most recently from 8c038dc to 7371130 Compare June 1, 2026 18:58
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch 2 times, most recently from ef120ae to f3a3547 Compare June 1, 2026 19:01
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 7371130 to 83edb59 Compare June 4, 2026 13:59
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from f3a3547 to 12be6cd Compare June 4, 2026 13:59
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 83edb59 to 7fb3c05 Compare June 4, 2026 14:18
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 12be6cd to 2e2996c Compare June 4, 2026 14:19
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 7fb3c05 to e713ed5 Compare June 5, 2026 18:36
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch 2 times, most recently from 50e8b8c to e243aa7 Compare June 5, 2026 18:45
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch 2 times, most recently from ca4ad1d to ce59542 Compare June 5, 2026 19:01
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 23e1562 to c95e7cb Compare June 9, 2026 15:10
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 460d40d to 56e06cf Compare June 9, 2026 15:10
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from c95e7cb to a92fd04 Compare June 9, 2026 15:30
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 56e06cf to 9149b33 Compare June 9, 2026 15:31
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from a92fd04 to 6b2f76b Compare June 9, 2026 18:40
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 9149b33 to 16ae034 Compare June 9, 2026 18:51
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 6b2f76b to aa68d56 Compare June 9, 2026 19:07
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 16ae034 to 253bda5 Compare June 9, 2026 19:07
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from aa68d56 to 95d92e6 Compare June 9, 2026 19:52
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 253bda5 to 898d197 Compare June 9, 2026 19:52
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 95d92e6 to 18b4bc2 Compare June 9, 2026 20:09
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 898d197 to 9c881ac Compare June 9, 2026 20:09
jwils added 3 commits June 10, 2026 07:46
- Trim the JSON-schema-versioning scenarios from the core `rake_tasks_spec`
  that are now covered by `schema_artifact_manager_extension_spec` in
  `elasticgraph-json_ingestion`, and remove the helpers that only those
  scenarios used.
- Replace the core coverage those scenarios provided with focused unit specs:
  the `deleted_type`/`deleted_field`/`renamed_from` registration DSL,
  `FieldPath#fully_qualified_path_in_index`, and the test-support behavior
  when a schema sets its own JSON schema version.
- Make `doc_comment` load-bearing in the wrapper equality specs by adding
  cases that differ only in `doc_comment`.
- Assert on observable behavior instead of `singleton_class.ancestors` in
  the blockless-element extension spec.
- Explain why `unresolved_type_ref` uses a stand-in.
@jwils jwils force-pushed the joshuaw/json-ingestion-docs branch from 18b4bc2 to e99a7bf Compare June 10, 2026 12:48
@jwils jwils force-pushed the joshuaw/json-ingestion-test-migration branch from 9c881ac to 81f85dc Compare June 10, 2026 12:48
myronmarston

This comment was marked as resolved.

jwils commented Jun 11, 2026

Collaborator Author

Addressed in 3dd2511 on this PR.

jwils added 3 commits June 11, 2026 10:53
The extension loader raises if the same extension constant is loaded from two
different `require_path`s within one process. `scalar_type_extension_spec`
loaded `ExampleScalarCoercionAdapter` via an absolute path while
`elasticgraph-schema_definition`'s `scalar_types_by_name_spec` used a
relative one, so any spec worker that ran both files failed 17 examples with
`InvalidExtensionError`. Use the same relative path (with a local copy of the
example adapter so the gem's own suite can resolve it).
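The same-path guard described in this commit message can be pictured with a small sketch. This is not the real extension loader: `PathCheckingLoaderSketch`, its `load` signature, and `ConflictError` are hypothetical names used only to show why an absolute path and a relative path to the same file are treated as a conflict when the guard compares the path strings.

```ruby
# Hypothetical loader that remembers which require path each constant came from.
class PathCheckingLoaderSketch
  class ConflictError < StandardError; end

  def initialize
    @require_path_by_constant = {}
  end

  # Records the require path for `constant_name`, raising if the same constant
  # was previously registered with a different path string.
  def load(constant_name, from:)
    previous = @require_path_by_constant[constant_name]

    if previous && previous != from
      raise ConflictError,
        "#{constant_name} was already loaded from #{previous.inspect}; " \
        "cannot also load it from #{from.inspect}"
    end

    @require_path_by_constant[constant_name] = from
    constant_name
  end
end

loader = PathCheckingLoaderSketch.new
relative = "support/example_scalar_coercion_adapter"
absolute = "/repo/spec/support/example_scalar_coercion_adapter"

loader.load("ExampleScalarCoercionAdapter", from: relative)
loader.load("ExampleScalarCoercionAdapter", from: relative) # same path: accepted

CONFLICT_RAISED =
  begin
    loader.load("ExampleScalarCoercionAdapter", from: absolute)
    false
  rescue PathCheckingLoaderSketch::ConflictError
    true
  end
```

Under this assumption, using one path spelling in every spec file is enough to satisfy the guard, which is what the commit does.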
As noted on #1224, mentions of JSON schema in
`elasticgraph-schema_definition/spec` should approach zero now that the JSON
schema logic lives in `elasticgraph-json_ingestion`:

- The graphql_schema, datastore_config, and runtime_metadata spec supports
  now run schemas without any extension modules, and the `json_schema` calls
  that existed only to satisfy the extension's scalar validation are gone.
- The scalar `json_schema` requirement test (duplicated by the json_ingestion
  suite) is deleted, and the `long`/`unsigned_long` placeholder-inference
  tests that depend on JSON schema bounds moved to the json_ingestion suite
  (along with the built-in-scalar placeholder map, which differs with the
  extension loaded).
- The reserved-type-name test now exercises the core `reserved_type_names`
  mechanism directly; the `ElasticGraphEventEnvelope` reservation is already
  covered by the json_ingestion suite.
- `rake_tasks_spec` runs its synthetic schemas without the extension and no
  longer asserts on JSON schema artifacts (covered by json_ingestion's
  integration spec). A new short-diff test keeps `truncate_diff` fully
  covered. The tests that evaluate the repo's own `config/schema.rb` still
  load the extension, since that schema is a JSON ingestion application.

The remaining mentions are the `define_schema` test-support seam (which
exists for optional ingestion extensions) and JSON-the-format documentation
text.
The whole-repo CI run failed coverage because two identical copies of
`ExampleScalarCoercionAdapter` (one per gem suite) shared the same
relative require path, so only one of them ever loaded and the other
reported 0% coverage. Moving the adapter into `spec_support/lib` leaves
a single copy that every suite loads from the same require path, which
also preserves the extension loader's same-path guarantee that the
prior commit relied on.
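The "only one of them ever loaded" behavior follows from how `require` resolves a relative path against `$LOAD_PATH`. Here is a minimal sketch that reproduces it with two temporary directories. The directory names, the relative path, and the `LOADED_ADAPTER_FROM` constant are made up for illustration and are not part of the repo.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical relative require path shared by two "suites".
RELATIVE_PATH = "example_extensions/duplicate_adapter_sketch"

def require_duplicated_file
  Dir.mktmpdir do |root|
    # Write an identically named file under two different load path roots.
    suite_dirs = %w[suite_a suite_b].map do |suite|
      dir = File.join(root, suite)
      file = File.join(dir, "#{RELATIVE_PATH}.rb")
      FileUtils.mkdir_p(File.dirname(file))
      File.write(file, "LOADED_ADAPTER_FROM = #{suite.inspect}\n")
      dir
    end

    $LOAD_PATH.unshift(*suite_dirs)
    begin
      # The first call loads the first match on `$LOAD_PATH`; the second call
      # is a no-op because the feature is already loaded.
      [require(RELATIVE_PATH), require(RELATIVE_PATH)]
    ensure
      suite_dirs.each { |dir| $LOAD_PATH.delete(dir) }
    end
  end
end

FIRST_REQUIRE, SECOND_REQUIRE = require_duplicated_file
# FIRST_REQUIRE is true, SECOND_REQUIRE is false, and LOADED_ADAPTER_FROM is "suite_a".
```

The `suite_b` copy is never evaluated, so a line-coverage tool would have nothing to record for it. Keeping a single copy removes the ambiguity.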

@myronmarston myronmarston left a comment

Collaborator

Not done reviewing but want to submit my feedback so far...

Comment thread elasticgraph-json_ingestion/spec/spec_helper.rb
super(
schema_element_name_form: "snake_case",
extension_modules: [JSONIngestion::SchemaDefinition::APIExtension],
extension_modules: [APIExtension],

Collaborator

We should implement something like

extension_modules: [SchemaDefinition::APIExtension],
so that the APIExtension is automatically added by all tests in the json-ingestion gem.

(Please defer this to a follow up PR).

- Port the JSON-schema tests from `rake_tasks_spec` (version bump
  enforcement, versioned metadata maintenance, renamed/deleted
  field/type guidance, conflict and routing/rollover deletion errors)
  into `schema_artifact_manager_extension_spec` in place of the minimal
  from-scratch tests, restoring the original coverage.
- Rewrite `wrappers_spec` to drive the wrappers through the public
  schema definition API instead of exercising the internal classes
  directly.
- Fix the value semantics of the stateless field type wrappers
  (`Scalar`, `Enum`, `Union`): `DelegateClass` defines `==` to unwrap
  only the left operand, so two wrappers around equal field types never
  compared equal even though their `hash` values matched, breaking the
  `eql?`/`hash` contract. A shared `ValueSemantics` module now unwraps
  both sides, keeping `==`/`eql?`/`hash` consistent.
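The `==` asymmetry described in the last bullet can be reproduced outside the gem. The following is a minimal sketch, not the PR's implementation: `FieldTypeSketch`, `PlainWrapper`, `ValueWrapper`, and `ValueSemanticsSketch` are hypothetical names, and the real `ValueSemantics` module may differ in detail.

```ruby
require "delegate"

# Hypothetical stand-in for a field type with value equality.
FieldTypeSketch = Struct.new(:name)

# A wrapper that relies on the `==` inherited from `Delegator`, which
# unwraps only the receiver before comparing.
class PlainWrapper < DelegateClass(FieldTypeSketch)
end

# Hypothetical module that unwraps both operands before comparing.
module ValueSemanticsSketch
  def ==(other)
    other = other.__getobj__ if other.is_a?(::Delegator)
    __getobj__ == other
  end
  alias_method :eql?, :==

  def hash
    __getobj__.hash
  end
end

class ValueWrapper < DelegateClass(FieldTypeSketch)
  include ValueSemanticsSketch
end

plain_a = PlainWrapper.new(FieldTypeSketch.new("ID"))
plain_b = PlainWrapper.new(FieldTypeSketch.new("ID"))
value_a = ValueWrapper.new(FieldTypeSketch.new("ID"))
value_b = ValueWrapper.new(FieldTypeSketch.new("ID"))

RESULTS = {
  plain_equal: plain_a == plain_b,                # struct compared to a wrapper
  plain_hash_match: plain_a.hash == plain_b.hash, # hash is delegated to the struct
  value_equal: value_a == value_b,
  value_eql: value_a.eql?(value_b),
  value_hash_lookup: {value_a => :found}[value_b]
}
```

With the plain wrapper, the hashes match while `==` reports inequality, which is the inconsistency the commit describes. With both sides unwrapped, the wrappers behave as interchangeable values and work as hash keys.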

@myronmarston myronmarston left a comment

Collaborator

Beyond the feedback below:

  • Can we add elasticgraph-json_ingestion to the list of gems that run in parallel in the following condition?
    if [[ "$gem" == "elasticgraph-graphql" || "$gem" == "elasticgraph-indexer" || "$gem" == "elasticgraph-schema_definition" ]]; then
    I measured rspec vs flatware_rspec and found that flatware makes the json-ingestion test suite quite a bit faster.
  • I audited the remaining mentions of json in elasticgraph-schema_definition and here's what I found...

# Allow any JSON for this type. The list of supported types is taken from:
#
# https://github.com/json-schema-org/json-schema-spec/blob/draft-07/schema.json#L23-L29
#
# ...except we are omitting `null` here; it'll be added by the nullability decorator if the field is defined as nullable.
# In the index we store this as a JSON string in a `keyword` field.
should be moved to
"Untyped" => {type: ["array", "boolean", "integer", "number", "object", "string"].freeze},

Some RBS signatures remain to migrate (or just remove if already copied over):

sig/elastic_graph/schema_definition/schema_elements/field.rbs:12:        type jsonSchema = untyped
sig/elastic_graph/schema_definition/schema_elements/type_with_subfields.rbs:58:          # ?json_schema: SchemaElements::Field::jsonSchema,
sig/elastic_graph/schema_definition/test_support.rbs:8:        ?json_schema_version: ::Integer,
sig/elastic_graph/schema_definition/test_support.rbs:20:        ?json_schema_version: ::Integer,
sig/elastic_graph/schema_definition/api.rbs:5:    type jsonSchemaLayer = :nullable | :array
sig/elastic_graph/schema_definition/api.rbs:6:    type jsonSchemaLayersArray = ::Array[jsonSchemaLayer]

(Note: these latter 2 are not really related to the PR as they aren't tests, but now that the tests have moved, this is the first point at which an audit was possible. Feel free to move those in a follow-up PR.)

Beyond that, this LGTM!

RSpec.describe DeprecatedElement do
include_context "SchemaDefinitionHelpers"

it "records `deleted_type`, `deleted_field`, and `renamed_from` calls so that schema artifact tooling can consume them" do

Collaborator

deleted_type, deleted_field, and renamed_from are really only needed for use with the JSON schema definition APIs. They are used to record additional metadata on the versioned JSON schema artifacts. Since they don't have any use beyond that, it probably makes sense to move them into elasticgraph-json_ingestion in a follow up PR.

}.to raise_error Errors::SchemaError, a_string_including("BigInt", "lacks `json_schema`")
end

it "extends schema elements created without customization blocks" do

Collaborator

This test doesn't seem related to ScalarTypeExtension at all. It seems to exist to cover some branches in factory_extension.rb. I'd recommend we move it there and simplify it drastically:

diff --git a/elasticgraph-json_ingestion/lib/elastic_graph/json_ingestion/schema_definition/factory_extension.rb b/elasticgraph-json_ingestion/lib/elastic_graph/json_ingestion/schema_definition/factory_extension.rb
index 93f2ae09..2bff3962 100644
--- a/elasticgraph-json_ingestion/lib/elastic_graph/json_ingestion/schema_definition/factory_extension.rb
+++ b/elasticgraph-json_ingestion/lib/elastic_graph/json_ingestion/schema_definition/factory_extension.rb
@@ -90,7 +90,7 @@ module ElasticGraph
               extended_type.json_schema(**options)
             end
 
-            yield extended_type if block_given?
+            yield extended_type
             extended_type.finalize_json_schema_configuration!
           end
         end
@@ -104,7 +104,7 @@ module ElasticGraph
         def new_type_with_subfields(schema_kind, name, wrapping_type:, field_factory:)
           super(schema_kind, name, wrapping_type: wrapping_type, field_factory: field_factory) do |type|
             extended_type = type.extend(SchemaElements::TypeWithSubfieldsExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::TypeWithSubfields & SchemaElements::TypeWithSubfieldsExtension
-            yield extended_type if block_given?
+            yield extended_type
           end
         end
 
diff --git a/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/factory_extension_spec.rb b/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/factory_extension_spec.rb
new file mode 100644
index 00000000..7080b068
--- /dev/null
+++ b/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/factory_extension_spec.rb
@@ -0,0 +1,29 @@
+# Copyright 2024 - 2026 Block, Inc.
+#
+# Use of this source code is governed by an MIT-style
+# license that can be found in the LICENSE file or at
+# https://opensource.org/licenses/MIT.
+#
+# frozen_string_literal: true
+
+require "elastic_graph/constants"
+require "elastic_graph/errors"
+require "elastic_graph/json_ingestion/schema_definition/factory_extension"
+require "elastic_graph/spec_support/schema_definition_helpers"
+
+module ElasticGraph
+  module JSONIngestion
+    module SchemaDefinition
+      RSpec.describe FactoryExtension do
+        include_context "SchemaDefinitionHelpers"
+
+        it "allows `enum_type` and `interface_type` to be called without a block" do
+          define_schema(schema_element_name_form: "snake_case") do |schema|
+            schema.enum_type "EmptyEnum"
+            schema.interface_type "EmptyInterface"
+          end
+        end
+      end
+    end
+  end
+end
diff --git a/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/schema_elements/scalar_type_extension_spec.rb b/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/schema_elements/scalar_type_extension_spec.rb
index 364dfe2e..15199586 100644
--- a/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/schema_elements/scalar_type_extension_spec.rb
+++ b/elasticgraph-json_ingestion/spec/unit/elastic_graph/json_ingestion/schema_definition/schema_elements/scalar_type_extension_spec.rb
@@ -28,37 +28,6 @@ module ElasticGraph
             }.to raise_error Errors::SchemaError, a_string_including("BigInt", "lacks `json_schema`")
           end
 
-          it "extends schema elements created without customization blocks" do
-            api = build_api
-            api.enum_type "EmptyEnum"
-            api.interface_type "EmptyInterface"
-            direct_type_with_subfields = api.factory.new_type_with_subfields(
-              :object,
-              "DirectObject",
-              wrapping_type: nil,
-              field_factory: api.factory.method(:new_field)
-            )
-
-            # An enum's derived GraphQL types are built from a derived scalar twin, which can only be
-            # built if `EnumTypeExtension` configured the twin's `json_schema`; otherwise building it
-            # raises a "lacks `json_schema`" error.
-            expect {
-              api.state.enum_types_by_name.fetch("EmptyEnum").derived_graphql_types
-            }.not_to raise_error
-
-            # `json_schema` is only available on types extended with `TypeWithSubfieldsExtension`.
-            interface_type = api.state.object_types_by_name.fetch("EmptyInterface")
-            interface_type.json_schema minProperties: 1
-            expect(interface_type.json_schema_options).to eq({minProperties: 1})
-
-            direct_type_with_subfields.json_schema minProperties: 2
-            expect(direct_type_with_subfields.json_schema_options).to eq({minProperties: 2})
-
-            expect {
-              build_api.scalar_type "BigInt"
-            }.to raise_error Errors::SchemaError, a_string_including("BigInt", "lacks `json_schema`")
-          end
-
           it "infers a numeric missing-value placeholder for JSON-safe unsigned_long scalars with custom coercion" do
             grouping_missing_value_placeholder = grouping_missing_value_placeholder_for(
               "unsigned_long",
@@ -215,16 +184,6 @@ module ElasticGraph
             # when one worker runs multiple suites.
             "elastic_graph/spec_support/example_extensions/scalar_coercion_adapter"
           end
-
-          def build_api
-            schema_elements = SchemaArtifacts::RuntimeMetadata::SchemaElementNames.new(form: "snake_case")
-            ::ElasticGraph::SchemaDefinition::API.new(
-              schema_elements,
-              true,
-              extension_modules: [APIExtension],
-              output: log_device
-            )
-          end
         end
       end
     end
diff --git a/elasticgraph-schema_definition/sig/elastic_graph/schema_definition/factory.rbs b/elasticgraph-schema_definition/sig/elastic_graph/schema_definition/factory.rbs
index 0a5c325b..cc55a8a9 100644
--- a/elasticgraph-schema_definition/sig/elastic_graph/schema_definition/factory.rbs
+++ b/elasticgraph-schema_definition/sig/elastic_graph/schema_definition/factory.rbs
@@ -112,7 +112,7 @@ module ElasticGraph
         ::String,
         wrapping_type: SchemaElements::anyObjectType,
         field_factory: ::Method
-      ) ?{ (SchemaElements::TypeWithSubfields) -> void } -> SchemaElements::TypeWithSubfields
+      ) { (SchemaElements::TypeWithSubfields) -> void } -> SchemaElements::TypeWithSubfields
       @@type_with_subfields_new: ::Method
 
       def new_union_type: (::String) { (SchemaElements::UnionType) -> void } -> SchemaElements::UnionType

Note that factory.new_type_with_subfields and factory.new_scalar_type both yield unconditionally w/o the json-ingestion extension:

def new_scalar_type(name)
  @@scalar_type_new.call(@state, name.to_s) do |scalar_type|
    yield scalar_type
  end
end

def new_type_with_subfields(schema_kind, name, wrapping_type:, field_factory:)
  @@type_with_subfields_new.call(schema_kind, @state, name, wrapping_type: wrapping_type, field_factory: field_factory) do |type_with_subfields|
    yield type_with_subfields
  end
end

...so there's no need for the json-ingestion extension to support no block for these. I also noticed the type signature was wrong in factory.rbs so my suggested snippet above fixes that, too.
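To make the point about unconditional `yield` concrete, here is a minimal sketch. `CoreFactorySketch` and `LenientExtensionSketch` are hypothetical stand-ins, not the real factory or extension. They only show that a method which yields unconditionally rejects block-less calls, so a `block_given?` guard in an override supports a call shape the core method never accepts.

```ruby
# Hypothetical core factory: yields unconditionally, like the methods quoted above.
class CoreFactorySketch
  def new_scalar_type(name)
    type = {name: name}
    yield type
    type
  end
end

# Hypothetical extension that guards its own `yield` with `block_given?`.
module LenientExtensionSketch
  def new_scalar_type(name)
    super(name) do |type|
      type[:extended] = true
      yield type if block_given?
    end
  end
end

# Without the extension, a block-less call fails inside the core method.
CORE_REJECTS_BLOCKLESS_CALL =
  begin
    CoreFactorySketch.new.new_scalar_type("BigInt")
    false
  rescue LocalJumpError
    true
  end

# With the extension, the guard makes the block-less call succeed.
EXTENDED_RESULT = CoreFactorySketch.new.extend(LenientExtensionSketch).new_scalar_type("BigInt")
```

Since callers must already pass a block for the call to work without the extension, dropping the guard in the extension keeps both code paths consistent and removes a branch that would otherwise need its own coverage.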

}.to raise_error Errors::SchemaError, a_string_including("BigInt", "lacks `json_schema`")
end

it "infers a numeric missing-value placeholder for JSON-safe unsigned_long scalars with custom coercion" do

Collaborator

It could help to wrap all the ones related to grouping_missing_value_placeholder in a describe for organizational purposes.

2 participants