Skip to content

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625

Open
nabinchha wants to merge 3 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider
Open

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625
nabinchha wants to merge 3 commits into
mainfrom
nmulepati/fix-590-remove-implicit-default-provider

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

@nabinchha nabinchha commented May 9, 2026

Summary

Final cleanup after the #589 deprecation cycle. Providers are now a named list referenced explicitly by each ModelConfig: no hidden registry default, no YAML default: flowing into DataDesigner.__init__, and no ambiguity about which provider wins.

The original release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes #594, so users have had a release containing the deprecation warnings before this removal lands.

Related Issue

Closes #590. Depends on #588 (merged as #591) and #589 (merged as #594, released in v0.6.0).

Changes

Removed

  • Removed ModelProviderRegistry.default, default-provider validators, and get_default_provider_name() from the engine registry.
  • Removed the config-layer get_default_provider_name() helper.
  • Removed the CLI default-provider field, set_default / get_default service methods, default-tracking in update/delete, the "Change default provider" menu option, and the "Default" column from data-designer config list.
  • Removed the top-level default_provider and per-item configured_provider / effective_provider fields from agent introspection output in favor of a single explicit per-item provider field.

Changed

  • Made ModelConfig.provider required.
  • Changed ModelProviderRegistry.get_provider(name) to require a concrete provider name.
  • Changed resolve_model_provider_registry(providers) to drop default_provider_name.
  • Simplified DataDesigner.__init__ so it resolves providers without consulting YAML defaults.
  • Split the CLI model-form error for "no providers configured" versus "multiple providers configured but none selected."
  • Updated legacy docs and Fern docs to remove the deprecated default-provider workflow, mark provider as required, and call out the upgrade impact for stale model_configs.yaml entries and agent context consumers.

Compatibility

  • Existing model_providers.yaml files with a legacy default: key still load cleanly because both ModelProviderRegistry classes inherit directly from pydantic.BaseModel, preserving pydantic v2's default extra="ignore" behavior. This invariant is now documented on the classes and covered by tests.
  • Existing model_configs.yaml entries that omit provider or set it to null must be updated with an explicit provider name.

Attention Areas

Reviewers should pay special attention to:

Testing

  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev pytest packages/data-designer/tests/cli/forms/test_model_builder.py packages/data-designer-engine/tests/engine/test_model_provider.py packages/data-designer/tests/cli/repositories/test_provider_repository.py packages/data-designer/tests/interface/test_data_designer.py packages/data-designer-config/tests/config/test_models.py packages/data-designer-config/tests/config/test_default_model_settings.py -q (162 passed before the final docs-wording amend)
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff check ...
  • PYTHONPATH=packages/data-designer-config/src:packages/data-designer-engine/src:packages/data-designer/src uv run --group dev ruff format --check ...
  • Full GitHub Actions on rebased branch: restarted after the final docs-wording amend
  • fern check: not run locally because the fern CLI is not installed in this environment

Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Unit tests added/updated
  • Docs updated

@nabinchha nabinchha requested a review from a team as a code owner May 9, 2026 00:19
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

MkDocs preview: https://aef33b3f.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-625.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Review: PR #625refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

Summary

Final cleanup of the #589 deprecation cycle: drops the implicit "default provider" concept end-to-end. ModelConfig.provider goes from str | None = None to required str; ModelProviderRegistry.default (and both its validators + get_default_provider_name()) is removed; the CLI's set_default/get_default/"Change default provider" workflow is gone; agent introspection drops default_provider/effective_provider. Net −722 lines across 31 files, test suite updated in lockstep (deprecation-warning tests → strict-validation tests). Migration for on-disk YAMLs carrying a legacy default: relies on pydantic v2's extra="ignore" default, locked in by two new tests.

The PR is well-scoped, well-tested, and architecturally clean. The dependency direction (interface → engine → config) is preserved and the two ModelProviderRegistry classes (engine + CLI repo) are updated consistently. CI is green on the Python 3.10–3.12 matrix; 3.13 runs are still in progress at review time.

The primary risk is a hard public-API break, gated on the #594 deprecation cycle having shipped in a release — which the PR author has correctly flagged as a blocker and the release history confirms has not yet happened (last release v0.5.9, 2026-04-28; #594 merged 2026-05-05).

Findings

⛔ Blocker (already called out by author)

⚠️ Breaking changes worth an explicit user-facing note

  • ModelConfig.provider becomes required. Anyone constructing ModelConfig(alias=..., model=...) without provider= now gets ValidationError. Likewise any on-disk ~/.data-designer/model_configs.yaml entry that omitted provider: or serialised it as null will fail to load after upgrade. The PR body notes "Migration is one-line per call site" for code, but on-disk configs from pre-feat(models): deprecate implicit default provider routing #594 users have no migration helper — they will hit ValidationError on load. Consider either (a) adding a CHANGELOG/release-notes entry calling this out, or (b) catching this specific ValidationError in ProviderRepository.load / ModelRepository.load with a targeted error message pointing at the fix. I lean toward (a) since users on stale model configs are a shrinking set and a clear CHANGELOG entry is cheaper than a bespoke migration path.

  • Agent introspection shape change. get_model_aliases_state() drops default_provider from the top level and collapses configured_provider + effective_provider into a single provider field (see packages/data-designer/src/data_designer/cli/utils/agent_introspection.py). This is observable output that agents consuming data-designer agent context parse. The PR description lists this under "Removed" but doesn't flag it as a schema change. If any downstream agent scaffolding keys off those three field names, it will silently produce missing-field errors or empty strings. Worth an explicit callout in release notes alongside the ModelConfig change.

✅ Migration path for legacy YAML default: — correct and well-tested

  • Both ModelProviderRegistry classes (engine model_provider.py and CLI provider_repository.py) inherit from BaseModel directly, not ConfigBase (which overrides extra="forbid"). The PR correctly locks this in:
    • test_registry_ignores_legacy_default_key (engine)
    • test_load_silently_ignores_legacy_default_key (CLI repo)
  • Subtle-but-important invariant: if someone later rebases these classes onto ConfigBase, the migration silently breaks (old YAMLs would start raising ValidationError on the stale default: key). A short comment on the class itself — "inherits BaseModel directly so extra='ignore' lets us drop legacy default: keys" — would protect against this. The docstring on ProviderRepository.load already says this; the class itself doesn't.

🟡 Minor / nits

  • provider_controller.py menu order change. The pre-refactor code appended "exit" after the conditional "change_default" so it always rendered last. The new code inlines "exit" in the initial dict literal, which keeps it last only because there is now nothing else to append. Behaviour is identical today; flagging only because a future "add menu option" patch could accidentally push exit out of its conventional last-position slot. Not worth changing.

  • Unused import cleanup is thoroughwarn_at_caller removed from models.py, default_model_settings.py, model_provider.py, provider_controller.py, provider_repository.py. Good. No orphaned warnings import.

  • scripts/benchmarks/benchmark_engine_v2.py correctly updated to drop default_provider_name=. Nice catch — benchmark scripts are easy to miss.

  • Docs parity: Four docs/concepts/models/*.md files had their deprecation warnings stripped. Consistent. Didn't spot any orphaned #589 references in the docs tree, but a quick grep -r "589" docs/ before merging wouldn't hurt.

  • CLI README.md drops the "default: nvidia" line from the example YAML. Good — keeps user-visible examples aligned with the new schema.

Test coverage observations

  • Previously-added warning-emission tests (test_model_config_provider_none_emits_deprecation_warning, test_explicit_default_emits_deprecation_warning, test_handle_change_default_emits_deprecation_warning, etc.) have been correctly replaced by their post-cycle counterparts (ValidationError-based). No test appears to have been silently dropped.
  • test_init_no_user_providers_loads_from_yaml is the right shape for the remaining DataDesigner.__init__ YAML-fallback path, replacing the older three-test cluster (test_init_user_supplied_providers_ignore_unrelated_yaml_default, test_init_user_supplied_providers_preserve_first_wins_over_yaml_default, test_init_yaml_default_emits_single_deprecation_warning). The first two were bug: default model provider leaks from YAML into DataDesigner constructor #588 regressions — those bugs can't reoccur once the default field is gone, so removal is correct.
  • No new e2e tests, but the PR description's "N/A (no runtime behavior change beyond stricter validation)" is accurate: this is a pure API-surface tightening.

Security / performance

  • No security implications. No credentials handling changed.
  • No performance implications. One fewer model_validator on ModelConfig and three fewer on ModelProviderRegistry — negligible but nominally a micro-win.

Verdict

Approve pending release gate. The code change is clean, the migration strategy for existing YAMLs is sound and tested, and the CI signal is strong. The two items worth addressing before merge are non-blocking:

  1. (Recommended) Add a CHANGELOG / release-notes entry explicitly calling out (a) ModelConfig.provider becoming required, and (b) the agent-introspection schema change. Users upgrading from v0.5.9 → next release with stale on-disk configs or downstream agent tooling will otherwise hit an undocumented break.
  2. (Nit) Drop a one-line comment on both ModelProviderRegistry classes pinning the BaseModel-not-ConfigBase invariant that makes the legacy-default: migration work.

Both are worth doing but neither should block merge once the release gate lifts.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR completes the deprecation cycle started in #589 by fully removing ModelProviderRegistry.default, the config-layer get_default_provider_name(), and all CLI tooling around the default-provider concept. ModelConfig.provider is now a required field.

  • Registry cleanup: ModelProviderRegistry.default and all associated validators, get_default_provider_name(), resolve_model_provider_registry(default_provider_name=...), and get_provider(None) are removed from both the engine and CLI layers; pydantic v2's extra="ignore" silently drops the legacy default: YAML key on load.
  • CLI cleanup: set_default, get_default, the "Change default provider" menu option, and the "Default" column in data-designer config list are removed; model_builder.py now raises distinct errors for the zero-provider vs. multi-provider-no-selection cases.
  • Introspection schema: default_provider, configured_provider, and effective_provider fields are replaced by a single provider field per model-alias item.

Confidence Score: 5/5

Clean removal of a fully-deprecated code path with all call sites updated and backward-compatibility for legacy YAML files preserved via pydantic's extra="ignore" behavior.

Every consumer of the old default-routing API — the engine registry, the config layer, the CLI, and the DataDesigner constructor — has been updated consistently. Legacy on-disk YAML files with a default: key continue to load without errors. All changed test files directly exercise the new required-provider contract and the backward-compat load path. No remaining calls to get_provider(None) or provider=None exist in the production code.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/model_provider.py Removes ModelProviderRegistry.default, all related validators, get_default_provider_name, and the None-accepting overload of get_provider; resolve_model_provider_registry simplified to a one-liner.
packages/data-designer-config/src/data_designer/config/models.py ModelConfig.provider changed from Optional[str] = None to required str; deprecation validator removed.
packages/data-designer/src/data_designer/interface/data_designer.py init simplified: YAML-default consultation, warning suppression block, and library_synthesised_default logic removed; construction now a single resolve_model_provider_registry call.
packages/data-designer/src/data_designer/cli/utils/agent_introspection.py Output schema simplified: default_provider removed, configured_provider/effective_provider replaced by single provider field; usability logic directly reads mc.provider.
packages/data-designer/src/data_designer/cli/forms/model_builder.py build_config now raises distinct ValueError messages for zero-provider vs. multi-provider-no-selection instead of silently setting provider=None.
packages/data-designer/src/data_designer/cli/repositories/provider_repository.py CLI ModelProviderRegistry.default field removed; extra="ignore" behavior documented and tested to silently drop legacy YAML default keys.
packages/data-designer/src/data_designer/cli/services/provider_service.py set_default, get_default removed; update/delete no longer maintain registry.default; registry initialisation uses ModelProviderRegistry(providers=[]) without default.
packages/data-designer/src/data_designer/cli/controllers/provider_controller.py _handle_change_default and its menu entry removed; exit option now always present in the options dict.
packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py default_provider header line and column_labels aliasing removed from format_model_aliases_text and _format_model_aliases_context; _format_table signature simplified.
packages/data-designer-config/src/data_designer/config/default_model_settings.py get_default_provider_name() and its deprecation warning removed; get_default_providers() unchanged.

Sequence Diagram

sequenceDiagram
    participant User
    participant DataDesigner
    participant resolve_model_provider_registry
    participant ModelProviderRegistry
    participant ModelConfig

    User->>DataDesigner: "__init__(model_providers=...)"
    DataDesigner->>resolve_model_provider_registry: (model_providers)
    resolve_model_provider_registry->>ModelProviderRegistry: "ModelProviderRegistry(providers=model_providers)"
    note over ModelProviderRegistry: No default field — extra="ignore" drops legacy key
    ModelProviderRegistry-->>DataDesigner: registry

    User->>DataDesigner: run / get_model(alias)
    DataDesigner->>ModelConfig: model_config.provider (required str)
    DataDesigner->>ModelProviderRegistry: get_provider(model_config.provider)
    ModelProviderRegistry-->>DataDesigner: ModelProvider
Loading

Reviews (5): Last reviewed commit: "Merge branch 'main' into nmulepati/fix-5..." | Re-trigger Greptile

Comment on lines 294 to +299
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in form_data. When len(self.available_providers) > 1 and no "provider" key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

Suggested change
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
else:
if self.available_providers:
raise ValueError(
"Cannot build ModelConfig: multiple providers are configured but none "
"was specified in form data. Specify provider= explicitly on each ModelConfig."
)
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/forms/model_builder.py
Line: 294-299

Comment:
The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in `form_data`. When `len(self.available_providers) > 1` and no `"provider"` key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

```suggestion
        else:
            if self.available_providers:
                raise ValueError(
                    "Cannot build ModelConfig: multiple providers are configured but none "
                    "was specified in form data. Specify provider= explicitly on each ModelConfig."
                )
            raise ValueError(
                "Cannot build ModelConfig: no provider specified in form data and no "
                "available providers configured. Add a provider first via "
                "`data-designer config providers`."
            )
```

How can I resolve this? If you propose a fix, please make it concise.

model: str
inference_parameters: InferenceParamsT = Field(default_factory=ChatCompletionInferenceParams)
provider: str | None = None
provider: str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude and Codex both caught a migration edge here: once provider is required, an existing model_configs.yaml with one provider-less entry makes ModelRepository.load() return None, so CLI and agent flows can treat the whole model file as missing. Could we add a migration/error path in ModelRepository.load() so users see "add provider to alias X" instead of silently losing the registry?

@andreatgretel
Copy link
Copy Markdown
Contributor

docs/concepts/models/model-configs.md:18

provider | str | No | Reference to the name of the Provider to use ... If not specified, one set as the default provider ...

model-configs.md still says provider is not required and falls back to a default. Since this PR removes that path, this doc should say Required: Yes and drop the default-provider fallback text.

@andreatgretel
Copy link
Copy Markdown
Contributor

Looks like the release gate in the PR body is still active: latest GitHub release is v0.5.9, published before #594 merged, so users have not yet had a released version with the deprecation warning. Assuming that gate still applies, this should wait for the next release before merging.

Copy link
Copy Markdown
Contributor

@andreatgretel andreatgretel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes based on the migration issue and stale docs noted above. The release gate in the PR body also still looks active.

@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch 2 times, most recently from f97431c to eab4026 Compare May 20, 2026 15:52
…g.provider (#590)

Final cleanup after the #589 deprecation cycle. Providers are now a
named list referenced by name end-to-end — no hidden registry default,
no YAML default leaking into DataDesigner construction (#588), and no
ambiguity about which provider wins.

ModelConfig.provider becomes a required str. ModelProviderRegistry.default
and the associated validators, get_default_provider_name() helper, CLI
"Change default provider" workflow, and the "Default" column in
`data-designer config list` are removed. resolve_model_provider_registry()
drops its default_provider_name kwarg, and DataDesigner.__init__ no
longer threads a YAML default through registry construction.

Pydantic v2's extra="ignore" default lets existing on-disk YAMLs with
default: keys load cleanly, so users on the deprecation cycle migrate
without breakage.
@nabinchha nabinchha force-pushed the nmulepati/fix-590-remove-implicit-default-provider branch from eab4026 to 09eeae7 Compare May 20, 2026 15:54
@nabinchha
Copy link
Copy Markdown
Contributor Author

Thanks for the careful review. I pushed an updated branch and believe the comments are now addressed:

  • docs/concepts/models/model-configs.md now marks provider as required and removes the default-provider fallback text. I also made the same update in the Fern docs page.
  • The release gate is now satisfied: v0.6.0 shipped on 2026-05-13 and includes feat(models): deprecate implicit default provider routing #594, so users have had a release containing the deprecation warnings before this removal lands. I updated the PR body accordingly.
  • The breaking-change notes are now explicit in both the PR body and docs: stale model_configs.yaml entries that omit provider or set it to null need an explicit provider name.
  • The agent-context schema change is called out more precisely: consumers should read each model alias item's provider; the top-level default_provider and per-item configured_provider / effective_provider fields are no longer emitted.
  • I added class docstrings on both ModelProviderRegistry classes documenting the BaseModel / extra="ignore" invariant for legacy default: YAML keys.
  • I addressed Greptile's CLI diagnostic note by splitting the ModelFormBuilder.build_config error messages for "no providers configured" versus "multiple providers configured but none selected", with a focused test.

The branch is rebased onto current main. GitHub Actions, docs preview, DCO, and Greptile are all green on the latest commit. Could you re-review when you have a chance?

@nabinchha nabinchha requested a review from andreatgretel May 20, 2026 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

2 participants