feat(gfql): infer typed schemas from graph data #1636
Open
lmeyerov wants to merge 1 commit into
Ready for manual review on rebased head.
Validation summary:
PR remains blocked only by required manual review; do not merge until approval.
Closes #1338
Summary
- `graphistry.infer_schema(g)`, `g.infer_schema()`, and `g.bind(infer_schema=True)`.
- `GraphSchema` instances from bound local graph data using the existing `label__*` boolean column convention for node labels and relationship types.
- `SchemaInferenceReport` for presence/nullability detail.
- `GraphSchema.metadata` so inferred schemas remain self-identifying after `bind(schema=...)`, Arrow declaration export, and catalog conversion.

API Decisions
- `bind(...)` calls do not infer.
- `GraphSchema`, not a second schema model or internal `GraphSchemaCatalog`.
- `return_report=True` returns `(GraphSchema, SchemaInferenceReport)` for details not represented directly in `GraphSchema`.
- `required`: observed value on every row for the label/type.
- `optional`: observed value on some rows and null on other rows.
- `maybe_absent`: dataframe column exists, but no observed value for that label/type.
- `unknown`: no rows available for that label/type.
- `infer_schema(..., schema=declared)` uses declared node/edge definitions over inferred definitions with the same names.
- `bind(schema=..., infer_schema=True)` is rejected instead of silently merging contracts.
- `GraphSchema.metadata["source"] == "inferred"`.
- `GraphSchema.metadata["source"] == "mixed"`.

Validation
- `python3 -m pytest -q graphistry/tests/compute/gfql/test_public_schema.py graphistry/tests/compute/gfql/test_schema_inference.py` -> `33 passed, 1 skipped`
- `python3 -m pytest -q docs/test_doc_examples.py -k schema` -> `1 passed, 29 deselected`
- `python3 -m pytest -q graphistry/tests/compute/gfql -k "schema and not cudf"` -> `84 passed, 17 skipped, 2418 deselected`
- `python3 -m pytest -q graphistry/tests/compute/gfql -k "not cudf"` -> `2315 passed, 44 skipped, 144 deselected, 15 xfailed`
- `./bin/ruff.sh graphistry/schema.py graphistry/schema_inference.py graphistry/PlotterBase.py graphistry/pygraphistry.py graphistry/Plottable.py graphistry/__init__.py graphistry/tests/compute/gfql/test_public_schema.py graphistry/tests/compute/gfql/test_schema_inference.py` -> pass
- `./bin/typecheck.sh` -> pass
- `git diff --check` -> pass
- DGX RAPIDS cuDF smoke:
  - `617d0002`: `RAPIDS_VERSION=25.02 PROFILE=gfql TEST_FILES="graphistry/tests/compute/gfql/test_schema_inference.py::test_infer_schema_cudf_matches_pandas_representative_case" ./docker/test-rapids-official-local.sh` -> `1 passed, 1 warning` (cudf 25.02.02)
  - `617d0002`: `RAPIDS_VERSION=26.02 PROFILE=gfql TEST_FILES="graphistry/tests/compute/gfql/test_schema_inference.py::test_infer_schema_cudf_matches_pandas_representative_case" ./docker/test-rapids-official-local.sh` -> `1 passed` (cudf 26.02.01)
- Full PR CI: green on amended head
`617d0002`, including `test-docs`, `tck-gfql`, `test-gfql-core` 3.12/3.14, broad core/full-ai/pandas/polars/graphviz/spark/neo4j tail, Read the Docs, and `changed-line-coverage`.
- Review skill: converged after wave 1 fixed one module-level API guard issue, wave 2 found no blocker/important issues, and wave 3 found no blocker/important metadata/provenance issues on amended head
`617d0002`.

LOC
Compiler-plan surface touched: no. The change returns and binds public schema objects and reuses the existing
`GraphSchema.to_catalog()` path; it does not modify logical/physical planning, route names, dispatch contracts, IR metadata, verifier/pass extension points, schema/type-system hooks beyond the public #1338 inference surface, remote schema hooks, or source-span/diagnostic fidelity.
Design-intent preservation evidence: inferred schemas round-trip through
`g.bind(schema=schema).gfql_validate(...)`, declared schema definitions override inference when explicitly passed, and source/destination topology validation is anchored by `test_bind_infer_schema_is_opt_in_and_round_trips_into_preflight`.
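
For reviewers who want a concrete picture of the presence categories and the declared-over-inferred rule from API Decisions, here is a minimal standalone sketch. It does not call graphistry and is not the code in this PR: `classify_presence`, `merge_definitions`, the sample rows, and the placeholder definition strings are all hypothetical. Rows are plain dicts with `None` standing in for null, and the property column is assumed to exist. The rule for choosing `"inferred"` versus `"mixed"` is also an assumption, since the description only lists the two values.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


def classify_presence(
    rows: Sequence[Mapping[str, Any]], label_column: str, prop: str
) -> str:
    """Classify one property for one label/type into the four categories.

    Hypothetical helper: rows are plain dicts, None stands in for null,
    and the property column is assumed to exist in the table.
    """
    # Keep only rows whose label__* boolean flag is set for this label/type.
    members = [row for row in rows if row.get(label_column) is True]
    if not members:
        return "unknown"  # no rows available for that label/type
    observed = sum(1 for row in members if row.get(prop) is not None)
    if observed == len(members):
        return "required"  # observed value on every row
    if observed > 0:
        return "optional"  # observed on some rows, null on others
    return "maybe_absent"  # column exists, but nothing observed


def merge_definitions(
    inferred: Mapping[str, Any], declared: Mapping[str, Any]
) -> Tuple[Dict[str, Any], str]:
    """Declared definitions win over inferred ones with the same name.

    The source tag rule is an assumption: "mixed" when any declared
    definition is supplied, "inferred" otherwise.
    """
    merged = {**inferred, **declared}
    source = "mixed" if declared else "inferred"
    return merged, source


# Hypothetical node rows using the label__* boolean column convention.
ROWS = [
    {"label__Person": True, "label__Company": False,
     "name": "a", "age": 30, "ticker": None},
    {"label__Person": True, "label__Company": False,
     "name": "b", "age": None, "ticker": None},
]

PRESENCE = {
    prop: classify_presence(ROWS, "label__Person", prop)
    for prop in ("name", "age", "ticker")
}
MERGED, SOURCE = merge_definitions(
    {"Person": "inferred-def"}, {"Person": "declared-def"}
)
```

With these rows, `name` is observed on every `Person` row, `age` on only one, and `ticker` on none, while no row carries `label__Company`, so each of the four categories is exercised once. The merge keeps the declared `Person` definition, matching the stated precedence.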