Summary
The distribution probe-to-RDF reporting in @lde/pipeline is currently dead code: it generates schema:Action / schema:target / schema:error quads and is unit-tested, but is unreachable from the pipeline and not configurable, so no consumer can turn it on. It should become a first-class, opt-in capability (a pipeline plugin), not be quietly left disconnected.
Current state
packages/pipeline/src/distribution/report.ts exports probeResultsToQuads(), which converts probe results into schema:Action quads (schema:target per probed distribution, schema:error on failures). packages/pipeline/src/distribution/resolveDistributions.ts exports resolveDistributions(), a resolver wrapper that exposes those quads as a quads stream.
Neither is reachable from the pipeline:
- Pipeline.processDataset (packages/pipeline/src/pipeline.ts) calls this.distributionResolver.resolve() directly and uses the result only to (a) select a distribution to analyse and (b) feed the progress reporter. It never calls resolveDistributions() or probeResultsToQuads(), so the quads are never handed to a writer and never reach the store.
- A monorepo-wide search confirms the two symbols are referenced only by each other, the distribution/index.ts barrel export, and their tests; pipeline.ts has zero references.
- The resolver already collects probeResults for every distribution; that data is currently discarded after reporting.
This surfaced downstream: the Dataset Knowledge Graph stories page had four charts built on the schema:Action model, all of which broke after the QLever migration (fixed separately by re-sourcing from the Dataset Register).
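For orientation, the schema:Action model described in this section could be sketched roughly as below. This is a minimal illustration, not the code in report.ts: the ProbeResult and Triple shapes, the sketchProbeTriples name, the blank-node labels, and the https://schema.org/ namespace form are all assumptions made for the example.

```typescript
// Illustrative types only; the real probe result and quad types in @lde/pipeline may differ.
interface ProbeResult {
  url: string; // URL of the probed distribution
  error?: string; // set only when the probe failed
}

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const SCHEMA = 'https://schema.org/'; // assumed namespace form
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

// One schema:Action per probe: schema:target always, schema:error only on failure.
function sketchProbeTriples(results: ProbeResult[]): Triple[] {
  const triples: Triple[] = [];
  results.forEach((result, index) => {
    const action = `_:probe${index}`; // blank-node label, an arbitrary choice
    triples.push({ subject: action, predicate: RDF_TYPE, object: `${SCHEMA}Action` });
    triples.push({ subject: action, predicate: `${SCHEMA}target`, object: result.url });
    if (result.error !== undefined) {
      triples.push({ subject: action, predicate: `${SCHEMA}error`, object: result.error });
    }
  });
  return triples;
}

// Two probes, one failing: two triples for the first, three for the second.
const exampleTriples = sketchProbeTriples([
  { url: 'https://example.org/sparql' },
  { url: 'https://example.org/dump.nt', error: 'HTTP 404' },
]);
```

A consumer querying for failed distributions would then look for actions that carry a schema:error, which is the kind of signal the broken charts relied on.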
Why opt-in, not removal
It is tempting to delete this, but that reasoning only holds for consumers like NDE’s Dataset Knowledge Graph, where a separate Dataset Register owns distribution validation authoritatively (per-registration schema:status + SHACL validity). For such a consumer, re-emitting schema:Action from the pipeline is a weaker, duplicate signal.
But @lde is published for other consumers too. A standalone consumer that selects datasets from somewhere other than an NDE Dataset Register, and has no separate validation/registry app, has no other record of which distributions were reachable and why others failed — for them the probe report is the primary distribution-provenance signal, not a duplicate.
So the capability should be available but off by default:
- DKG-style consumers (with a register) leave it out — the register stays authoritative, no duplicate signal.
- Standalone consumers add the plugin and get schema:Action / schema:error distribution provenance in their store.
Proposed change
- Expose the probe report as an opt-in pipeline plugin (alongside provenancePlugin), rather than calling it unconditionally.
- The existing PipelinePlugin hook is beforeStageWrite (a transform of the post-stage quad stream). The probe report instead needs to inject quads derived from the resolver’s probeResults, which is not a stream transform. So add a small lifecycle hook (e.g. afterResolve / an additionalQuads source) that runs after distribution resolution and lets a plugin emit probeResultsToQuads(resolved.probeResults, dataset.iri) into the writer. This also makes the probeResults the resolver already collects actually usable.
- Keep SparqlDistributionResolver / ImportResolver unchanged (still used for distribution selection).
- Document the plugin so consumers can decide whether they need it.
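To make the proposal concrete, here is a minimal sketch of what an afterResolve hook and an opt-in plugin might look like. Everything in it is hypothetical: the hook signature, the probeReportPlugin and collectAdditionalQuads names, and the simplified types are assumptions for discussion, and the probeResultsToQuads below is a stand-in for the existing function rather than its real implementation.

```typescript
// Simplified stand-in types; the real @lde/pipeline types will differ.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
}
interface ProbeResult {
  url: string;
  error?: string;
}
interface ResolvedDistributions {
  probeResults: ProbeResult[];
}
interface Dataset {
  iri: string;
}

interface PipelinePlugin {
  name: string;
  // Proposed hook (hypothetical): runs once after distribution resolution;
  // whatever it returns is handed to the writer.
  afterResolve?(resolved: ResolvedDistributions, dataset: Dataset): Quad[];
}

// Stand-in for the existing probeResultsToQuads(); how the real function uses
// the dataset IRI is not shown in this issue, so minting subjects from it is a guess.
function probeResultsToQuads(probeResults: ProbeResult[], datasetIri: string): Quad[] {
  const quads: Quad[] = [];
  probeResults.forEach((result, index) => {
    const action = `${datasetIri}#probe-${index}`;
    quads.push({ subject: action, predicate: 'rdf:type', object: 'schema:Action' });
    quads.push({ subject: action, predicate: 'schema:target', object: result.url });
    if (result.error !== undefined) {
      quads.push({ subject: action, predicate: 'schema:error', object: result.error });
    }
  });
  return quads;
}

// The opt-in plugin: consumers who want probe provenance add it to their plugin list.
function probeReportPlugin(): PipelinePlugin {
  return {
    name: 'probe-report',
    afterResolve: (resolved, dataset) =>
      probeResultsToQuads(resolved.probeResults, dataset.iri),
  };
}

// What the pipeline could do right after resolving distributions.
function collectAdditionalQuads(
  plugins: PipelinePlugin[],
  resolved: ResolvedDistributions,
  dataset: Dataset,
): Quad[] {
  const quads: Quad[] = [];
  for (const plugin of plugins) {
    if (plugin.afterResolve) {
      quads.push(...plugin.afterResolve(resolved, dataset));
    }
  }
  return quads;
}

const resolved: ResolvedDistributions = {
  probeResults: [{ url: 'https://example.org/dump.nt', error: 'HTTP 404' }],
};
const dataset: Dataset = { iri: 'https://example.org/dataset/1' };

const withoutPlugin = collectAdditionalQuads([], resolved, dataset);
const withPlugin = collectAdditionalQuads([probeReportPlugin()], resolved, dataset);
```

With no plugin registered the pipeline writes nothing extra, which keeps the default behaviour unchanged for register-backed consumers; registering the plugin is the only switch a standalone consumer needs.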
Alternative (YAGNI)
If the maintainers prefer to keep the library lean and no consumer needs this today, remove probeResultsToQuads, resolveDistributions, their barrel exports, and tests, and re-add when a real consumer appears (git history preserves it). Given the code already exists, is tested, and there is a clean plugin seam, the opt-in route is low-cost and better serves the shared-library goal — but either beats the current unreachable-but-tested state.
Caveat to confirm
Verify nothing outside this monorepo still consumes the schema:Action distribution triples. They have not been written since the QLever migration, so any remaining consumer is already broken — but worth a quick check before changing the public surface.