Skip to content

feat(check-stream): support configurable stream names#1030

Open
devin-ai-integration[bot] wants to merge 8 commits into
mainfrom
devin/1779315328-configurable-check-streams
Open

feat(check-stream): support configurable stream names#1030
devin-ai-integration[bot] wants to merge 8 commits into
mainfrom
devin/1779315328-configurable-check-streams

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented May 20, 2026

Summary

Adds support for overriding declarative CheckStream stream names from connector config using the internal top-level __airbyte_check_stream_names key. When the key is absent or explicitly an empty list, existing manifest stream_names behavior is preserved; when present with one or more names, the value must be a list of strings and the selected names still go through the existing catalog membership and availability checks.

Follow-up review updates now validate the override before stream discovery, reject explicit null overrides instead of treating them as absent, use the existing double-underscore convention for platform-injected/internal config fields, and make empty-list override behavior fall back to manifest stream_names based on real connector validation.

Review & Testing Checklist for Human

  • Verify __airbyte_check_stream_names is the intended internal platform-injected config key for ADP/Sonar integration.
  • Confirm empty override list should fall back to manifest stream_names behavior, matching absent-key behavior.
  • Validate with a low-code connector whose manifest stream_names includes an unselected stream and whose source config provides selected stream names through __airbyte_check_stream_names.

Notes

Local checks run:

  • poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
  • poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q

Additional CDK runtime contract smoke tests run locally:

  • Synthetic contract test: override config __airbyte_check_stream_names: ["selected_stream"] checked the selected stream and skipped the manifest static stream; absent override key fell back to the manifest static stream and failed on the expected mocked 403.
  • Real connector test: ran source-wikipedia-pageviews through local source-declarative-manifest using this branch's CDK with absent, empty-list, single-valid, multi-valid, single-invalid, and mixed-invalid overrides.
  • Real connector result: absent key and empty list both succeeded via manifest fallback; valid overrides succeeded; invalid overrides failed with missing is not part of the catalog. Expected one of ['per-article', 'top'].

Link to Devin session: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1779315328-configurable-check-streams#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1779315328-configurable-check-streams

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

PyTest Results (Fast)

4 079 tests  +10   4 068 ✅ +10   7m 49s ⏱️ +5s
    1 suites ± 0      11 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 7e52e79. ± Comparison against base commit f67a9d9.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

PyTest Results (Full)

4 082 tests  +10   4 070 ✅ +10   11m 25s ⏱️ +28s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 7e52e79. ± Comparison against base commit f67a9d9.

♻️ This comment has been updated with latest results.

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

CI note: the only current failed check is optional Pytest (All, Python 3.13, Ubuntu).

Failure details

The failed job is https://github.com/airbytehq/airbyte-python-cdk/actions/runs/26193981254/job/77069032809.

It failed in an existing JWT test, not in the changed check-stream tests:

FAILED unit_tests/sources/declarative/auth/test_jwt.py::TestJwtAuthenticator::test_get_signed_token - AssertionError: assert '***' == '***'

The same full pytest matrix passed on main yesterday: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/26100771442.

Targeted local verification still passes for this PR's changed area:

poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q
poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py

The failing JWT assertion appears unrelated to this PR's two touched files (airbyte_cdk/sources/declarative/checks/check_stream.py and unit_tests/sources/declarative/checks/test_check_stream.py).


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Updated the local variable in _get_stream_names from stream_names to configured_stream_names for clarity and pushed commit 344d0db6.

Re-ran targeted checks successfully:

poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q

Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Description updated to include the mypy check that was run after the rename.


Devin session

Copy link
Copy Markdown
Contributor

@sophiecuiy sophiecuiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

This review was AI-authored (Claude Code).

Reviewed the change — implementation is clean and well-tested for the static-streams override path. A few things worth addressing before flipping out of draft:

Blockers / clarifications

  • Confirm airbyte_check_stream_names is the agreed-upon key with the platform/ADP/Sonar consumers. This is the first airbyte_*-prefixed reserved config key in airbyte_cdk/sources/declarative/ (grep returns no precedent), so it sets a convention.
  • Verify the key is preserved through spec validation / config migration end-to-end. If the connector's spec.json doesn't declare it, JSON-schema validation upstream may strip or reject it before it reaches check_connection. An integration-level smoke test would help here.

Suggestions

  • Validate the override before calling source.streams(config=config) (check_stream.py:74 vs. 108-111). Right now a malformed override still triggers stream discovery before erroring out, which is wasteful and surfaces the validation error late.
  • Empty-list semantics are non-obvious: [] silently skips all static stream checks but still requires source.streams() to return ≥1 stream. If this is intentional (per the PR description), a one-line comment in _get_stream_names would help future maintainers.
  • Include the offending value/type in the validation error to aid debugging:
    raise ValueError(
        f"{CHECK_STREAM_NAMES_CONFIG_KEY} must be a list of strings, "
        f"got {type(configured_stream_names).__name__}: {configured_stream_names!r}"
    )
  • Add a test for airbyte_check_stream_names: [] combined with dynamic_streams_check_configs set, to lock in the "skip static, still run dynamic" behavior called out in the PR description.

Nit

  • New test mocks don't set stream.availability_strategy = None like the existing test_check_stream_with_slices_as_list does. Works because of MagicMock, but consistency helps.

No concerns on backward compatibility or security — the new path is fully gated on the new key being present, and stream names are dict-looked-up (no injection vector).

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Thanks for the review — I’m addressing the actionable CDK-side items now: pre-discovery override validation, empty-list documentation, dynamic-stream coverage for empty static overrides, and mock consistency. I’ll leave the platform/Sonar key-contract questions called out separately since those depend on the follow-up Sonar integration path.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

I’m also going to keep the validation error deterministic rather than echoing the offending config value, because this value comes from connector config and could contain unexpected sensitive content if misused. I can include the value type only if you still want extra debugging context.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

I’ll preserve the value/type omission for now to avoid echoing connector config values into errors, but I’ll keep the error specific to the reserved key. If type-only context is desired, I can add got <type> without including repr(value).


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

On the platform contract: I’ll leave this as an explicit PR checklist item unless/until the Sonar follow-up PR exists. The CDK-side behavior remains gated on the key being present, so plain Airbyte behavior falls back to manifest stream_names.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

For the spec validation/config migration question: I’ll add a local CDK smoke-style test around the check_connection contract in this PR. The real platform preservation path still needs to be verified in the Sonar PR that injects the key, because Airbyte platform does not populate it today.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Pushed follow-up commit 54db4058 addressing the CDK-side review items:

  • validates airbyte_check_stream_names before source.streams(config=config)
  • documents empty-list static-check semantics inline
  • adds coverage that empty static overrides still run configured dynamic stream checks
  • adds coverage that malformed overrides do not trigger stream discovery
  • aligns new stream mocks with availability_strategy = None

I intentionally did not echo the configured value in the validation error to avoid including arbitrary connector config content in error messages. The remaining key-name/platform-preservation questions are still captured in the PR checklist for Sonar/platform confirmation.

Verified locally:

  • poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
  • poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q

Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

CDK runtime contract smoke test

Ran a shell-only low-code CDK runtime smoke test against this branch using ConcurrentDeclarativeSource.check() with mocked HTTP declarative streams.

  • Passed: with config airbyte_check_stream_names: ["selected_stream"], check returned Status.SUCCEEDED.
  • Passed: override path called https://api.test.com/selected exactly once.
  • Passed: override path called manifest-declared https://api.test.com/manifest exactly zero times.
  • Passed: with airbyte_check_stream_names absent, check returned Status.FAILED.
  • Passed: fallback path message contained Forbidden. You don't have permission to access this resource.
  • Passed: fallback path called https://api.test.com/manifest exactly once.
  • Passed: fallback path called https://api.test.com/selected exactly zero times.
$ poetry run pytest unit_tests/sources/declarative/checks/test_check_stream_contract_e2e_tmp.py -q

test_configured_check_stream_names_override_manifest_stream_names_at_runtime PASSED [ 50%]
test_check_stream_names_fall_back_to_manifest_when_override_is_absent PASSED [100%]

======================== 2 passed, 11 warnings in 1.01s ========================
Targeted local checks

The temporary smoke-test file was removed after execution, then I reran the PR's targeted checks:

$ poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
2 files already formatted

$ poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
All checks passed!

$ poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
Success: no issues found in 1 source file

$ poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q
====================== 42 passed, 257 warnings in 11.19s ======================
Caveat

This validates the CDK-side runtime contract independently of Sonar. It does not validate that the Sonar platform injects airbyte_check_stream_names; that remains a Sonar-side integration test once the Sonar PR exists.

Session: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b

@sophiecuiy sophiecuiy marked this pull request as ready for review May 21, 2026 18:32
Copilot AI review requested due to automatic review settings May 21, 2026 18:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a configuration-based override for declarative CheckStream static stream_names, allowing the platform (or user config) to select which streams are checked via the top-level airbyte_check_stream_names key while preserving existing behavior when the key is absent.

Changes:

  • Add airbyte_check_stream_names override support and validation in CheckStream.check_connection().
  • Treat an empty override list as “skip static stream availability checks” while still running dynamic stream checks when configured.
  • Add unit tests covering override behavior (valid override, empty list, invalid types, unknown streams, and validation ordering).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
airbyte_cdk/sources/declarative/checks/check_stream.py Introduces a config-driven override for static stream names with type validation and empty-list semantics.
unit_tests/sources/declarative/checks/test_check_stream.py Adds tests validating the override behavior and interaction with dynamic stream checks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread airbyte_cdk/sources/declarative/checks/check_stream.py Outdated
Comment thread airbyte_cdk/sources/declarative/checks/check_stream.py Outdated
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Runtime contract retest after internal key rename

Session: https://app.devin.ai/sessions/2fe0f6c5174b40a1842828a49e32f69b

I reran the CDK runtime contract smoke test with the renamed internal config key __airbyte_check_stream_names using a static-only declarative manifest with two mocked streams:

  • Passed: override config __airbyte_check_stream_names: ["selected_stream"] returned Status.SUCCEEDED, called /selected exactly once, and called /static zero times.
  • Passed: absent override key returned Status.FAILED, message contained HTTP Status Code: 403, called /static exactly once, and called /selected zero times.
  • Passed: explicit __airbyte_check_stream_names: null raised ValueError: __airbyte_check_stream_names must be a list of strings. before source.streams() was called.
Targeted local verification
  • Passed: poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • Passed: poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
  • Passed: poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
  • Passed: poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q (43 passed, 257 warnings)

The warnings are from existing experimental/deprecated declarative test fixtures (HttpRequester.url_base / path), not from the override assertion failures.

CI snapshot after the rename: required checks are passing; one optional source-shopify connector check is still pending.

@pnilan
Copy link
Copy Markdown
Contributor

Can you test it out locally by pointing a connector at the local dev CDK and updating the following scenarios to make sure works/fails as expected?

  • one valid stream name
  • list of valid stream names
  • one invalid stream name
  • list of valid stream names w/ at least on invalid stream name
  • empty list of names

for example:

{
    "api_key":  ...,
    "__airbyte_check_stream_names": [ "sites" ]
}

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Local connector validation for requested stream-name scenarios

I tested this locally against the PR branch/dev CDK by running a declarative connector through ConcurrentDeclarativeSource.check() with three available streams (sites, users, accounts) and config-provided __airbyte_check_stream_names overrides.

Results:

  • Passed: one valid stream name, {"__airbyte_check_stream_names": ["sites"]}Status.SUCCEEDED; checked streams: sites.
  • Passed: list of valid stream names, {"__airbyte_check_stream_names": ["sites", "users"]}Status.SUCCEEDED; checked streams: sites, users.
  • Passed: one invalid stream name, {"__airbyte_check_stream_names": ["missing"]}ValueError: missing is not part of the catalog. Expected one of ['sites', 'users', 'accounts'].; no stream availability checks ran.
  • Passed: list of valid names with at least one invalid, {"__airbyte_check_stream_names": ["sites", "missing"]}ValueError: missing is not part of the catalog. Expected one of ['sites', 'users', 'accounts'].; sites was checked before the invalid missing entry was rejected.
  • Passed: empty list, {"__airbyte_check_stream_names": []}Status.SUCCEEDED; no static stream availability checks ran.

This confirms the override accepts valid selected stream names, rejects names outside the catalog through the existing catalog-membership error path, and treats an empty list as “skip static stream checks.”


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Update after real-connector validation and empty-list correction:

Sophie pointed out that an empty list should fall back instead of skipping all static checks. I validated the behavior against a real manifest connector (source-wikipedia-pageviews) through local source-declarative-manifest, then updated this PR so __airbyte_check_stream_names: [] now falls back to manifest check.stream_names, matching the absent-key behavior.

Real connector command pattern used from the connector manifest directory:

PYTHONPATH=/home/ubuntu/repos/airbyte-python-cdk \
  poetry --directory /home/ubuntu/repos/airbyte-python-cdk run source-declarative-manifest check \
  --config /home/ubuntu/cdk-connector-validation/source_declarative_manifest/<scenario>.json

Real connector scenarios after the fix:

source-wikipedia-pageviews manifest check.stream_names: [per-article]
catalog streams: [per-article, top]

absent key: SUCCEEDED
empty list (__airbyte_check_stream_names: []): SUCCEEDED via manifest fallback
one valid (["per-article"]): SUCCEEDED
list valid (["per-article", "top"]): SUCCEEDED
one invalid (["missing"]): failed with "missing is not part of the catalog. Expected one of ['per-article', 'top']."
mixed invalid (["per-article", "missing"]): failed with "missing is not part of the catalog. Expected one of ['per-article', 'top']."

Local checks after the fix:

poetry run ruff format --check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
2 files already formatted

poetry run ruff check airbyte_cdk/sources/declarative/checks/check_stream.py unit_tests/sources/declarative/checks/test_check_stream.py
All checks passed!

poetry run mypy --config-file mypy.ini airbyte_cdk/sources/declarative/checks/check_stream.py
Success: no issues found in 1 source file

poetry run pytest unit_tests/sources/declarative/checks/test_check_stream.py -q
43 passed

Pushed commit: 7e52e794 fix(check-stream): fall back on empty stream override

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Patrick Nilan (@pnilan) reran the five requested scenarios against a real manifest connector (source-wikipedia-pageviews) using this PR branch's local CDK, and added stream-specific instrumentation so the output proves which stream availability checks were invoked.

Command:

cd /home/ubuntu/repos/airbyte-python-cdk
poetry run python /home/ubuntu/cdk-connector-validation/prove_real_connector_stream_checks.py

Manifest details:

connector: source-wikipedia-pageviews
manifest check.stream_names: [per-article]
catalog streams: [per-article, top]

Results:

[
  {
    "scenario": "one valid stream name",
    "override": ["per-article"],
    "result": "PASS",
    "observed": "Status.SUCCEEDED",
    "checked_streams": ["per-article"],
    "expected_checked_streams": ["per-article"]
  },
  {
    "scenario": "list of valid stream names",
    "override": ["per-article", "top"],
    "result": "PASS",
    "observed": "Status.SUCCEEDED",
    "checked_streams": ["per-article", "top"],
    "expected_checked_streams": ["per-article", "top"]
  },
  {
    "scenario": "one invalid stream name",
    "override": ["missing"],
    "result": "PASS",
    "observed": "missing is not part of the catalog. Expected one of ['per-article', 'top'].",
    "checked_streams": [],
    "expected_checked_streams": []
  },
  {
    "scenario": "list of valid stream names with at least one invalid stream name",
    "override": ["per-article", "missing"],
    "result": "PASS",
    "observed": "missing is not part of the catalog. Expected one of ['per-article', 'top'].",
    "checked_streams": ["per-article"],
    "expected_checked_streams": ["per-article"]
  },
  {
    "scenario": "empty list of names",
    "override": [],
    "result": "PASS",
    "observed": "Status.SUCCEEDED",
    "checked_streams": ["per-article"],
    "expected_checked_streams": ["per-article"]
  }
]

This confirms the first two scenarios checked exactly the specified streams. It also confirms empty list now falls back to manifest check.stream_names and checks per-article.


Devin session

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants