
Replace sed with ruamel.yaml for packages.yml version bumps #2148

Merged
haritamar merged 2 commits into master from devin/1773346628-fix-bump-version-yaml on Jun 23, 2026

@devin-ai-integration (bot, Contributor) commented Mar 12, 2026

Summary

The sed-based version replacement in bump-version.yml only matched version: X.Y.Z patterns, so when master had a git hash reference (e.g. - git: ... dbt-data-reliability.git + revision: <hash>), the version bump silently did nothing. It also matched commented-out lines.
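To make that failure mode concrete, here is a minimal Python sketch. The exact sed expression is not quoted in this PR, so SED_LIKE below is a hypothetical stand-in for a line regex that rewrites any "version: X.Y.Z"; the sample file, the <repo-url> placeholder, and the revision abc1234 are illustrative, not the repository's real packages.yml.

```python
import re

# Hypothetical stand-in for the old sed pattern: rewrite any "version: X.Y.Z",
# with no awareness of YAML structure or comments.
SED_LIKE = re.compile(r"version: \d+\.\d+\.\d+")

# Illustrative packages.yml (not the real file): the active entry is a git
# reference, and an older versioned entry is commented out above it.
GIT_HASH_FILE = """\
packages:
  # - package: elementary-data/elementary
  #   version: 0.25.0
  - git: <repo-url>/dbt-data-reliability.git
    revision: abc1234
"""


def sed_like_bump(text: str, new_version: str) -> str:
    """Replace every 'version: X.Y.Z' occurrence, like a global sed s///."""
    return SED_LIKE.sub(f"version: {new_version}", text)


bumped = sed_like_bump(GIT_HASH_FILE, "1.2.3")

# Show which lines changed: only the comment, never the active git entry.
for before, after in zip(GIT_HASH_FILE.splitlines(), bumped.splitlines()):
    if before != after:
        print(f"- {before}")
        print(f"+ {after}")
```

Because the pattern only looks for version strings, it edits the commented-out line and leaves the active git entry and its revision unchanged, so the release still points at the old hash.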

This PR replaces the sed command for packages.yml with a Python script using ruamel.yaml (round-trip YAML parser). Unlike the previous PyYAML-based approach, ruamel.yaml preserves comments, quotes, flow-style sequences, and indentation — producing a true one-line diff on a normal release:

-    version: 0.25.0
+    version: 1.2.3
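The one-line property is easy to check mechanically. A minimal sketch using the standard library's difflib, assuming a check like this would compare packages.yml before and after the bump (no such check is part of this PR); changed_lines is a hypothetical helper, and AFTER simulates what a round-trip bump should produce from the illustrative BEFORE content.

```python
import difflib

# Illustrative packages.yml before a normal release (sample content, not the
# repository's real file). The flow-style list must survive untouched.
BEFORE = """\
packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<0.9.0"]
  - package: elementary-data/elementary
    version: 0.25.0
"""
# Simulated output of a round-trip bump: only the elementary version changes.
AFTER = BEFORE.replace("version: 0.25.0", "version: 1.2.3")


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the '-'/'+' lines of a unified diff, skipping file headers."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
    ]


print("\n".join(changed_lines(BEFORE, AFTER)))
```

A PyYAML yaml.dump() round trip would fail such a check, since it rewrites the dbt_utils list into block style and drops comments.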

The script handles both paths:

  • Normal release: updates version in-place on the existing elementary-data/elementary entry
  • Git-hash → version swap: replaces a - git: .../dbt-data-reliability.git + revision: ... entry with a - package: elementary-data/elementary + version: ... entry, carrying over trailing comments
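The decision logic for those two paths can be sketched on plain Python lists and dicts, ignoring the comment and formatting preservation that ruamel.yaml's round-trip types add. The function name bump_entries is hypothetical; its matching rules mirror the ruamel.yaml suggestion in the review further down this thread, and the git URL and revision in the usage lines are placeholders.

```python
def bump_entries(packages: list[dict], version: str) -> list[dict]:
    """Return a copy of packages with the elementary entry set to version.

    Path 1 (normal release): an existing elementary-data/elementary entry
    gets its version updated in place.
    Path 2 (git-hash swap): a git entry pointing at dbt-data-reliability is
    replaced by a versioned elementary-data/elementary entry.
    Raises ValueError if neither kind of entry is present.
    """
    result = [dict(pkg) for pkg in packages]  # shallow copies; input untouched
    for i, pkg in enumerate(result):
        if "dbt-data-reliability" in pkg.get("git", ""):
            result[i] = {"package": "elementary-data/elementary", "version": version}
            return result
        if pkg.get("package") == "elementary-data/elementary":
            pkg["version"] = version
            return result
    raise ValueError("no elementary-data/elementary or dbt-data-reliability entry")


# Usage with illustrative entries (placeholders, not the real packages.yml).
normal = [{"package": "elementary-data/elementary", "version": "0.25.0"}]
git_ref = [{"git": "<repo-url>/dbt-data-reliability.git", "revision": "abc1234"}]
print(bump_entries(normal, "1.2.3"))
print(bump_entries(git_ref, "1.2.3"))
```

Returning on the first match reflects that packages.yml holds a single elementary entry; failing loudly when none is found avoids the silent no-op the sed version had.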

ruamel.yaml is pinned to ==0.18.10 for deterministic builds. The dbt deps --lock step from master and the existing sed for pyproject.toml are preserved as-is.

Link to Devin session: https://app.devin.ai/sessions/0375709f72a94fdda7ce6654be1bb8b1
Requested by: @haritamar

@devin-ai-integration (bot, Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions (bot, Contributor)

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

@coderabbitai (bot) commented Mar 12, 2026

Warning

Review limit reached

@devin-ai-integration[bot], we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 11 minutes and 38 seconds.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee9e9f81-d171-48e3-ae79-df25062d27bd

📥 Commits

Reviewing files that changed from the base of the PR and between e69aaab and 7d9ca1d.

📒 Files selected for processing (2)
  • .github/scripts/bump_packages_version.py
  • .github/workflows/bump-version.yml
📝 Walkthrough

Adds a Python script to update the elementary package version in packages.yml (replacing legacy dbt-data-reliability entries if present) and refactors the GitHub Actions workflow to run that script, using a computed package version passed via PKG_VERSION.

Changes

Cohort / File(s) | Summary
--- | ---
Version Management Script (.github/scripts/bump_packages_version.py) | New Python script exposing bump_packages_version(version: str) that reads ./elementary/monitor/dbt_project/packages.yml, updates or replaces the elementary-data/elementary package entry (or errors if absent), preserves YAML structure, appends helper comments with the given version, and requires PKG_VERSION.
CI Workflow (.github/workflows/bump-version.yml) | Workflow refactored to add setup-python and PyYAML install, consolidate version selection into a "Determine package version" step, emit the chosen version to GITHUB_OUTPUT, and run the Python script with PKG_VERSION to update packages.yml. Existing pyproject.toml bump and commit/push steps remain.
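For readers unfamiliar with step outputs: a GitHub Actions step publishes an output by appending a name=value line to the file whose path the runner puts in the GITHUB_OUTPUT environment variable, and a later step can receive it through env:, here as PKG_VERSION. A minimal Python sketch of both halves, simulated locally with a temp file; the helper names and the output name "version" are hypothetical (the workflow's actual output name is not shown here).

```python
import os
import tempfile


def set_step_output(name: str, value: str) -> None:
    """Append name=value to the file GitHub Actions exposes as $GITHUB_OUTPUT."""
    if "\n" in value:
        # Multi-line values need GitHub's delimiter (name<<EOF ... EOF) syntax instead.
        raise ValueError("multi-line output values are not handled in this sketch")
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def read_pkg_version() -> str:
    """Mirror the script's contract: PKG_VERSION must be set and non-empty."""
    version = os.environ.get("PKG_VERSION", "")
    if not version:
        # SystemExit with a string message prints it to stderr and exits with status 1.
        raise SystemExit("::error::PKG_VERSION environment variable is not set")
    return version


# Local simulation: point GITHUB_OUTPUT at a temp file instead of the runner's file.
fd, output_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
os.environ["GITHUB_OUTPUT"] = output_path

set_step_output("version", "1.2.3")  # hypothetical output name
with open(output_path, encoding="utf-8") as f:
    print(f.read().strip())

# In the workflow, a later step would map the output into PKG_VERSION via env:.
os.environ["PKG_VERSION"] = "1.2.3"
print(read_pkg_version())
```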

Sequence Diagram(s)

sequenceDiagram
  participant GH as "GitHub Actions"
  participant Runner as "Action runner (env)"
  participant Python as "bump_packages_version.py"
  participant YAML as "packages.yml (repo)"
  participant Git as "git commit/push"

  GH->>Runner: start workflow
  Runner->>Runner: Determine package version\n(write to GITHUB_OUTPUT)
  Runner->>Python: export PKG_VERSION\nrun script
  Python->>YAML: read `packages.yml`
  Python->>YAML: update/replace elementary entry\nappend helper comments
  Python-->>Runner: exit 0 / success message
  Runner->>Git: commit & push updated `packages.yml`
  Git-->>GH: push confirmed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through YAML lines tonight,
Replaced an old tag with version bright,
The workflow hums, the repo sings,
Commits roll in on nimble wings,
Little script, big cheer — hop on, delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
--- | --- | --- | ---
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Replace sed with ruamel.yaml for packages.yml version bumps' directly reflects the main change: switching from sed-based regex replacement to a Python YAML parsing approach using ruamel.yaml for updating packages.yml versions.


@coderabbitai (bot) left a review

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/bump_packages_version.py:
- Around lines 19-20: The script currently uses PyYAML's yaml.safe_load() and
yaml.dump(), which strip comments from PACKAGES_FILE during the
read/modify/write round-trip; replace this with a round-trip YAML API that
preserves comments (e.g., ruamel.yaml's YAML() loader/dumper) so comments and
formatting in PACKAGES_FILE are retained when reading into the variable
currently named data and when writing back (the code paths that call yaml.dump
at the end of the script). Update imports and replace yaml.safe_load(...) /
yaml.dump(...) usage with the ruamel YAML().load(...) and YAML().dump(...)
workflow (or an equivalent round-trip-preserving YAML library) while ensuring
the same data modifications still target the in-memory variable (data) and are
written back to the same file.

In @.github/workflows/bump-version.yml:
- Around lines 70-77: Update the "Determine package version" step (id:
pkg-version) to fail when a user-supplied dbt-package-version override is
present but validation produced an empty validated-dbt-package-version: check
whether the override (the dbt-package-version
input/needs.validate-version.outputs.requested-dbt-package-version or
equivalent) is non-empty and the
needs.validate-version.outputs.validated-dbt-package-version is empty, and if so
echo an error and exit non‑zero instead of falling back to
default-dbt-package-version; otherwise continue to set version to the validated
value or default as before.
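The requested guard amounts to one extra condition before the fallback. A minimal sketch of that selection logic as a pure Python function; resolve_pkg_version and its parameter names are hypothetical, the version strings are illustrative, and in the workflow the same check would live in the shell step (later in this thread the guard was dropped as redundant with master's validate-version job).

```python
def resolve_pkg_version(requested: str, validated: str, default: str) -> str:
    """Pick the package version, refusing to fall back when an override was invalid.

    requested: the raw dbt-package-version input (may be empty)
    validated: the validated override, empty if validation rejected it
    default:   the default package version computed elsewhere in the workflow
    """
    if requested and not validated:
        raise ValueError(f"dbt-package-version {requested!r} failed validation")
    return validated or default


for requested, validated in [("1.2.3", "1.2.3"), ("", ""), ("not-a-version", "")]:
    try:
        print(resolve_pkg_version(requested, validated, default="1.2.0"))
    except ValueError as err:
        # In the workflow step, this is where it would echo ::error:: and exit 1.
        print(f"::error::{err}")
```

The point of the guard is that a typo in the override now stops the release instead of silently shipping the default version.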

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 122bfa05-a351-4733-a09d-c2486fa51890

📥 Commits

Reviewing files that changed from the base of the PR and between 2f98ef7 and 8229635.

📒 Files selected for processing (2)
  • .github/scripts/bump_packages_version.py
  • .github/workflows/bump-version.yml

Comment thread on .github/scripts/bump_packages_version.py (outdated)
Comment thread on .github/workflows/bump-version.yml (outdated)

@coderabbitai (bot) left a review

🧹 Nitpick comments (1)
.github/workflows/bump-version.yml (1)

68-69: Pin PyYAML to a vetted version for deterministic releases.

Installing pyyaml without a version can make this release workflow non-reproducible and brittle over time.

Proposed change
-      - name: Install PyYAML
-        run: pip install pyyaml
+      - name: Install PyYAML
+        run: python -m pip install "PyYAML==<vetted-version>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/bump-version.yml around lines 68 - 69, Update the "Install
PyYAML" workflow step to install a specific, vetted PyYAML release instead of
the unpinned package; change the pip command in the Install PyYAML step to pin
to a known good version (for example pyyaml==6.0) so the release workflow is
deterministic and reproducible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97eb3bf2-b98f-4c4b-b605-4cf25c71eb0a

📥 Commits

Reviewing files that changed from the base of the PR and between 8229635 and e69aaab.

📒 Files selected for processing (1)
  • .github/workflows/bump-version.yml

@themavik (Contributor) left a review

Reviewed the changes — the implementation looks correct and addresses the reported issue well.

@github-actions (bot, Contributor)

This pull request is stale because it has been open for too long with no activity.
If you would like the pull request to remain open, please remove the stale label or leave a comment.

@github-actions github-actions Bot added the Stale label May 24, 2026
@haritamar (Collaborator)

Suggestion: use ruamel.yaml (round-trip) instead of PyYAML for the surgical edit

The YAML-parsing direction is right, but yaml.dump() is lossy — it re-serializes the whole file, so it reformats the unrelated dbt_utils constraint [">=0.8.0", "<0.9.0"] into a block list, re-indents everything, and drops all comments (which is why the script has to re-append a hardcoded HELPER_COMMENTS block). ruamel.yaml parses the YAML properly and preserves quotes, flow-style, indentation, and comments, giving a true one-line diff on a normal release and a clean 2-line swap on the git-hash path — with no comment block to maintain in the script.

Swap pip install pyyaml for pip install ruamel.yaml, and use:

import os
import sys

from ruamel.yaml import YAML
from ruamel.yaml.comments import CommentedMap

PACKAGES_FILE = "./elementary/monitor/dbt_project/packages.yml"


def bump_packages_version(version: str) -> None:
    yaml = YAML()
    yaml.preserve_quotes = True
    yaml.indent(mapping=2, sequence=4, offset=2)  # match the file's existing style

    with open(PACKAGES_FILE) as f:
        data = yaml.load(f)

    packages = data.get("packages") or []
    elementary_found = False
    for i, pkg in enumerate(packages):
        if "git" in pkg and "dbt-data-reliability" in pkg["git"]:
            # Convert the active git-hash entry into a versioned package entry,
            # carrying over any comment ruamel anchored to the old node's last key.
            new_pkg = CommentedMap()
            new_pkg["package"] = "elementary-data/elementary"
            new_pkg["version"] = version
            for _key, comment in pkg.ca.items.items():
                new_pkg.ca.items["version"] = comment
            packages[i] = new_pkg
            elementary_found = True
            break
        if pkg.get("package") == "elementary-data/elementary":
            pkg["version"] = version
            elementary_found = True
            break

    if not elementary_found:
        print(
            "::error::Could not find elementary-data/elementary or "
            "dbt-data-reliability entry in packages.yml"
        )
        sys.exit(1)

    with open(PACKAGES_FILE, "w") as f:
        yaml.dump(data, f)

    print(f"Updated packages.yml to version {version}")


if __name__ == "__main__":
    version = os.environ.get("PKG_VERSION", "")
    if not version:
        print("::error::PKG_VERSION environment variable is not set")
        sys.exit(1)
    bump_packages_version(version)

A few other things to fold in while rebasing (the branch is currently conflicting with master):

  1. Rebase onto current master and keep dbt deps --lock. master's package step now runs dbt deps --lock --project-dir elementary/monitor/dbt_project after the version edit to regenerate the committed package-lock.yml. This branch predates that, so make sure the new Python step is followed by dbt deps --lock — otherwise the lock file goes stale against the bumped version. master has also moved to a peter-evans/create-pull-request flow, so the Commit/Push steps here will need reconciling.
  2. Avoid shell injection in the "Determine package version" step. ${{ inputs.dbt-package-version }} (and the validated/default outputs) are interpolated straight into the run: block; the rest of the workflow routes inputs through env: and reads $VAR to avoid exactly this. Worth matching that pattern here.
  3. Minor: break once the elementary entry is found (there's only one), and note that master's validate-version job already has a fail on invalid input step — so confirm the new in-step guard isn't redundant after the rebase.
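The injection concern in item 2 can be seen without running anything: with ${{ }} interpolation the input value becomes part of the script text before the shell parses it, whereas a value passed through env: is only expanded as data. A minimal Python sketch that just builds the two script strings; the malicious input and the variable name DBT_PACKAGE_VERSION are hypothetical.

```python
# A crafted workflow input (hypothetical) that closes the quote and appends a command.
malicious = '1.2.3"; echo INJECTED; echo "'

# Pattern 1: Actions substitutes ${{ inputs.dbt-package-version }} into the run:
# text before the shell parses it, so the value becomes part of the script.
interpolated_script = f'version="{malicious}"\necho "$version"'

# Pattern 2: the value is passed through env: and the script text is constant;
# the shell expands "$DBT_PACKAGE_VERSION" as a single quoted word, never as code.
env_script = 'version="$DBT_PACKAGE_VERSION"\necho "$version"'
step_env = {"DBT_PACKAGE_VERSION": malicious}  # what the step's env: would provide

print(interpolated_script)  # contains a second command: echo INJECTED
print(env_script)  # identical for every input value
```

Routing inputs through env: keeps the script text fixed regardless of what a caller passes, which is why the rest of the workflow already uses that pattern.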

@devin-ai-integration

Copy link
Copy Markdown
Contributor Author

❌ Cannot revive Devin session - the session is too old. Please start a new session instead.

The sed-based version replacement in bump-version.yml had two issues:
1. It only matched 'version: X.Y.Z' patterns, missing the case where
   master has a git hash reference for dbt-data-reliability
2. It also matched commented-out version lines

Replace the sed command for packages.yml with a Python script using
ruamel.yaml (round-trip YAML parser) that preserves comments, quotes,
flow-style, and indentation. The script handles both cases:
- Updates an existing package version reference (one-line diff)
- Replaces a git hash reference with a proper package version

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration devin-ai-integration Bot force-pushed the devin/1773346628-fix-bump-version-yaml branch from e69aaab to 933779a on June 23, 2026 11:47
@devin-ai-integration devin-ai-integration Bot changed the title Fix bump-version to use YAML parsing for packages.yml updates Replace sed with ruamel.yaml for packages.yml version bumps Jun 23, 2026
@devin-ai-integration (bot, Contributor, Author)

@haritamar Done — rebased on master and switched to ruamel.yaml as you suggested.

The script now produces a true one-line diff for normal releases and preserves all comments, quotes, flow-style sequences ([">=0.8.0", "<0.9.0"]), and indentation. For the git-hash → version swap, trailing comments are carried over to the new entry.

Other changes from the rebase:

  • Dropped the "invalid version guard" commit, since master already has that in the validate-version job's "fail on invalid input" step
  • Kept the dbt deps --lock step that master added
  • Pinned ruamel.yaml==0.18.10 for deterministic builds

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@haritamar haritamar merged commit ab917b2 into master Jun 23, 2026
26 checks passed
@haritamar haritamar deleted the devin/1773346628-fix-bump-version-yaml branch June 23, 2026 15:31

3 participants