feat(check-stream): support configurable stream names#1030
feat(check-stream): support configurable stream names#1030devin-ai-integration[bot] wants to merge 8 commits into
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1779315328-configurable-check-streams#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1779315328-configurable-check-streamsPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
|
CI note: the only current failed check is optional Failure detailsThe failed job is https://github.com/airbytehq/airbyte-python-cdk/actions/runs/26193981254/job/77069032809. It failed in an existing JWT test, not in the changed check-stream tests: The same full pytest matrix passed on Targeted local verification still passes for this PR's changed area: The failing JWT assertion appears unrelated to this PR's two touched files ( |
|
Updated the local variable in Re-ran targeted checks successfully: |
|
Description updated to include the mypy check that was run after the rename. |
There was a problem hiding this comment.
Note
This review was AI-authored (Claude Code).
Reviewed the change — implementation is clean and well-tested for the static-streams override path. A few things worth addressing before flipping out of draft:
Blockers / clarifications
- Confirm
airbyte_check_stream_namesis the agreed-upon key with the platform/ADP/Sonar consumers. This is the firstairbyte_*-prefixed reserved config key inairbyte_cdk/sources/declarative/(grep returns no precedent), so it sets a convention. - Verify the key is preserved through spec validation / config migration end-to-end. If the connector's
spec.jsondoesn't declare it, JSON-schema validation upstream may strip or reject it before it reachescheck_connection. An integration-level smoke test would help here.
Suggestions
- Validate the override before calling
source.streams(config=config)(check_stream.py:74vs.108-111). Right now a malformed override still triggers stream discovery before erroring out, which is wasteful and surfaces the validation error late. - Empty-list semantics are non-obvious:
[]silently skips all static stream checks but still requiressource.streams()to return ≥1 stream. If this is intentional (per the PR description), a one-line comment in_get_stream_nameswould help future maintainers. - Include the offending value/type in the validation error to aid debugging:
raise ValueError( f"{CHECK_STREAM_NAMES_CONFIG_KEY} must be a list of strings, " f"got {type(configured_stream_names).__name__}: {configured_stream_names!r}" )
- Add a test for
airbyte_check_stream_names: []combined withdynamic_streams_check_configsset, to lock in the "skip static, still run dynamic" behavior called out in the PR description.
Nit
- New test mocks don't set
stream.availability_strategy = Nonelike the existingtest_check_stream_with_slices_as_listdoes. Works because of MagicMock, but consistency helps.
No concerns on backward compatibility or security — the new path is fully gated on the new key being present, and stream names are dict-looked-up (no injection vector).
|
Thanks for the review — I’m addressing the actionable CDK-side items now: pre-discovery override validation, empty-list documentation, dynamic-stream coverage for empty static overrides, and mock consistency. I’ll leave the platform/Sonar key-contract questions called out separately since those depend on the follow-up Sonar integration path. |
|
I’m also going to keep the validation error deterministic rather than echoing the offending config value, because this value comes from connector config and could contain unexpected sensitive content if misused. I can include the value type only if you still want extra debugging context. |
|
I’ll preserve the value/type omission for now to avoid echoing connector config values into errors, but I’ll keep the error specific to the reserved key. If type-only context is desired, I can add |
|
On the platform contract: I’ll leave this as an explicit PR checklist item unless/until the Sonar follow-up PR exists. The CDK-side behavior remains gated on the key being present, so plain Airbyte behavior falls back to manifest |
|
For the spec validation/config migration question: I’ll add a local CDK smoke-style test around the |
|
Pushed follow-up commit
I intentionally did not echo the configured value in the validation error to avoid including arbitrary connector config content in error messages. The remaining key-name/platform-preservation questions are still captured in the PR checklist for Sonar/platform confirmation. Verified locally:
|
CDK runtime contract smoke testRan a shell-only low-code CDK runtime smoke test against this branch using
Targeted local checksThe temporary smoke-test file was removed after execution, then I reran the PR's targeted checks: CaveatThis validates the CDK-side runtime contract independently of Sonar. It does not validate that the Sonar platform injects Session: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b |
There was a problem hiding this comment.
Pull request overview
Adds a configuration-based override for declarative CheckStream static stream_names, allowing the platform (or user config) to select which streams are checked via the top-level airbyte_check_stream_names key while preserving existing behavior when the key is absent.
Changes:
- Add
airbyte_check_stream_namesoverride support and validation inCheckStream.check_connection(). - Treat an empty override list as “skip static stream availability checks” while still running dynamic stream checks when configured.
- Add unit tests covering override behavior (valid override, empty list, invalid types, unknown streams, and validation ordering).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
airbyte_cdk/sources/declarative/checks/check_stream.py |
Introduces a config-driven override for static stream names with type validation and empty-list semantics. |
unit_tests/sources/declarative/checks/test_check_stream.py |
Adds tests validating the override behavior and interaction with dynamic stream checks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Runtime contract retest after internal key renameSession: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b I reran the CDK runtime contract smoke test with the renamed internal config key
Targeted local verification
The warnings are from existing experimental/deprecated declarative test fixtures ( CI snapshot after the rename: required checks are passing; one optional |
|
Can you test it out locally by pointing a connector at the local dev CDK and updating the following scenarios to make sure works/fails as expected?
for example: {
"api_key": ...,
"__airbyte_check_stream_names": [ "sites" ]
} |
Local connector validation for requested stream-name scenariosI tested this locally against the PR branch/dev CDK by running a declarative connector through Results:
This confirms the override accepts valid selected stream names, rejects names outside the catalog through the existing catalog-membership error path, and treats an empty list as “skip static stream checks.” |
|
Update after real-connector validation and empty-list correction: Sophie pointed out that an empty list should fall back instead of skipping all static checks. I validated the behavior against a real manifest connector ( Real connector command pattern used from the connector manifest directory: PYTHONPATH=/home/ubuntu/repos/airbyte-python-cdk \
poetry --directory /home/ubuntu/repos/airbyte-python-cdk run source-declarative-manifest check \
--config /home/ubuntu/cdk-connector-validation/source_declarative_manifest/<scenario>.jsonReal connector scenarios after the fix: Local checks after the fix: Pushed commit: |
|
Patrick Nilan (@pnilan) reran the five requested scenarios against a real manifest connector ( Command: cd /home/ubuntu/repos/airbyte-python-cdk
poetry run python /home/ubuntu/cdk-connector-validation/prove_real_connector_stream_checks.pyManifest details: Results: [
{
"scenario": "one valid stream name",
"override": ["per-article"],
"result": "PASS",
"observed": "Status.SUCCEEDED",
"checked_streams": ["per-article"],
"expected_checked_streams": ["per-article"]
},
{
"scenario": "list of valid stream names",
"override": ["per-article", "top"],
"result": "PASS",
"observed": "Status.SUCCEEDED",
"checked_streams": ["per-article", "top"],
"expected_checked_streams": ["per-article", "top"]
},
{
"scenario": "one invalid stream name",
"override": ["missing"],
"result": "PASS",
"observed": "missing is not part of the catalog. Expected one of ['per-article', 'top'].",
"checked_streams": [],
"expected_checked_streams": []
},
{
"scenario": "list of valid stream names with at least one invalid stream name",
"override": ["per-article", "missing"],
"result": "PASS",
"observed": "missing is not part of the catalog. Expected one of ['per-article', 'top'].",
"checked_streams": ["per-article"],
"expected_checked_streams": ["per-article"]
},
{
"scenario": "empty list of names",
"override": [],
"result": "PASS",
"observed": "Status.SUCCEEDED",
"checked_streams": ["per-article"],
"expected_checked_streams": ["per-article"]
}
]This confirms the first two scenarios checked exactly the specified streams. It also confirms empty list now falls back to manifest |
Summary
Adds support for overriding declarative
CheckStreamstream names from connector config using the internal top-level__airbyte_check_stream_nameskey. When the key is absent or explicitly an empty list, existing manifeststream_namesbehavior is preserved; when present with one or more names, the value must be a list of strings and the selected names still go through the existing catalog membership and availability checks.Follow-up review updates now validate the override before stream discovery, reject explicit
nulloverrides instead of treating them as absent, use the existing double-underscore convention for platform-injected/internal config fields, and make empty-list override behavior fall back to manifeststream_namesbased on real connector validation.Review & Testing Checklist for Human
__airbyte_check_stream_namesis the intended internal platform-injected config key for ADP/Sonar integration.stream_namesbehavior, matching absent-key behavior.stream_namesincludes an unselected stream and whose source config provides selected stream names through__airbyte_check_stream_names.Notes
Local checks run:
poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.pypoetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.pypoetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.pypoetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -qAdditional CDK runtime contract smoke tests run locally:
__airbyte_check_stream_names: ["selected_stream"]checked the selected stream and skipped the manifest static stream; absent override key fell back to the manifest static stream and failed on the expected mocked 403.source-wikipedia-pageviewsthrough localsource-declarative-manifestusing this branch's CDK with absent, empty-list, single-valid, multi-valid, single-invalid, and mixed-invalid overrides.missing is not part of the catalog. Expected one of ['per-article', 'top'].Link to Devin session: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b