Skip to content

feat(sources): generate .git_archival.txt for setuptools-scm builds#962

Open
LalatenduMohanty wants to merge 1 commit intopython-wheel-build:mainfrom
LalatenduMohanty:961_git-archival-for-setuptools-scm
Open

feat(sources): generate .git_archival.txt for setuptools-scm builds#962
LalatenduMohanty wants to merge 1 commit intopython-wheel-build:mainfrom
LalatenduMohanty:961_git-archival-for-setuptools-scm

Conversation

@LalatenduMohanty
Copy link
Copy Markdown
Member

@LalatenduMohanty LalatenduMohanty commented Mar 13, 2026

When building from an sdist that lacks .git metadata, setuptools-scm
cannot determine the package version. This adds a .git_archival.txt
file with the correct version so setuptools-scm can resolve it.

  • Skips packages with .git directories (no fix needed)
  • Replaces existing files that have unprocessed placeholders or
    missing fields
  • Creates a new file only when PKG-INFO is also absent (git clones
    or custom downloads, not PyPI sdists)

Closes: #961

@LalatenduMohanty LalatenduMohanty requested a review from a team as a code owner March 13, 2026 11:06
@mergify mergify bot added the ci label Mar 13, 2026
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch 2 times, most recently from 779259f to 87d777b Compare March 13, 2026 12:37
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch 4 times, most recently from 545b0a5 to 8287e3c Compare March 19, 2026 12:11
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch 2 times, most recently from d7d9adf to 811b9e9 Compare March 24, 2026 02:22
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from 811b9e9 to 23827f7 Compare March 25, 2026 12:30
@LalatenduMohanty
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds ensure_git_archival(version, target_dir) to src/fromager/sources.py and invokes it from default_build_sdist() when creating an sdist and no .git metadata is present (checked in build_dir and, when different, sdist_root_dir). The helper inspects target_dir/.git_archival.txt: if the file is absent it returns None and does not create one; if present but contains unprocessed $Format: markers or is missing/has empty required fields, it overwrites the file with a synthesized archival payload using a zeroed 40-hex node and describe-name derived from the provided version and returns False; if the file is valid it leaves it unchanged and returns True. Tests in tests/test_sources.py cover absence, replacement of templates, preservation of valid files, and repair of truncated/empty-field files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding git_archival.txt generation for setuptools-scm builds.
Linked Issues check ✅ Passed The implementation fully addresses issue #961: ensure_git_archival() generates .git_archival.txt with describe-name field, preserves valid files, replaces unprocessed templates, and integrates with sdist creation logic.
Out of Scope Changes check ✅ Passed All changes are scoped to implementing ensure_git_archival() and its tests; no unrelated modifications to other parts of the codebase.
Description check ✅ Passed The pull request description directly addresses the changeset, explaining the problem (setuptools-scm cannot determine version without .git metadata), the solution (adding .git_archival.txt), and key behaviors (skipping packages with .git, replacing invalid files).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/fromager/sources.py (1)

693-697: ⚠️ Potential issue | 🟠 Major

Check the source root for git metadata before synthesizing archival data.

This guard only looks at build_dir/.git. Packages with a nested build_dir keep git metadata at sdist_root_dir, so this path will still write a synthetic .git_archival.txt for a real checkout.

Proposed fix
-    if not build_dir.joinpath(".git").exists():
+    has_git_metadata = (
+        build_dir.joinpath(".git").exists()
+        or sdist_root_dir.joinpath(".git").exists()
+    )
+    if not has_git_metadata:
         ensure_git_archival(
             version=version,
             target_dir=build_dir,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/sources.py` around lines 693 - 697, The current guard only
checks build_dir/.git before calling ensure_git_archival, which can synthesize
.git_archival.txt for projects where the real git metadata lives at
sdist_root_dir; update the condition to check for git metadata in the source
root as well (e.g., verify sdist_root_dir.joinpath(".git").exists() or similar)
and only call ensure_git_archival(version=version, target_dir=build_dir) when
neither build_dir nor sdist_root_dir contains a .git directory; reference the
build_dir and sdist_root_dir variables and the ensure_git_archival function to
locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/sources.py`:
- Around line 818-823: The code returns True for any existing .git_archival.txt
that lacks the unprocessed marker, but doesn't validate the archival payload, so
empty/truncated files can be treated as valid; update the check in the block
that reads archival_file to validate the content contains the required
setuptools-scm fields (e.g., "node" and/or "describe-name" entries or a
non-empty node value) before returning True: if those keys/values are missing,
log a warning and proceed as if the file is unprocessed (so replacement
happens). Refer to archival_file, _UNPROCESSED_ARCHIVAL_MARKER and target_dir
when locating and updating the logic.

---

Duplicate comments:
In `@src/fromager/sources.py`:
- Around line 693-697: The current guard only checks build_dir/.git before
calling ensure_git_archival, which can synthesize .git_archival.txt for projects
where the real git metadata lives at sdist_root_dir; update the condition to
check for git metadata in the source root as well (e.g., verify
sdist_root_dir.joinpath(".git").exists() or similar) and only call
ensure_git_archival(version=version, target_dir=build_dir) when neither
build_dir nor sdist_root_dir contains a .git directory; reference the build_dir
and sdist_root_dir variables and the ensure_git_archival function to locate the
change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b2969c1-cb4c-4d05-a87a-8882f2a5bd6e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a941fd and 23827f7.

📒 Files selected for processing (2)
  • src/fromager/sources.py
  • tests/test_sources.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/sources.py`:
- Around line 823-831: The current check builds a set "fields" from keys only,
so entries like "node:" or "describe-name:" with empty values still pass; change
the parsing to build a dict (e.g., parse "content" into something like parsed =
{key.strip(): value.strip() for line in content.splitlines() if ":" in line for
key, _, value in [line.partition(':')]}) and then replace the membership check
with a validation that _REQUIRED_ARCHIVAL_FIELDS is a subset of parsed.keys()
and that all parsed[field] are non-empty (truthy) for each required field; keep
the existing check for _UNPROCESSED_ARCHIVAL_MARKER in content and only treat
the file as archival when the marker is absent and all required fields have
non-empty values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c784ffc9-48c4-48a0-af58-90fe2889bb07

📥 Commits

Reviewing files that changed from the base of the PR and between 23827f7 and efe230a.

📒 Files selected for processing (2)
  • src/fromager/sources.py
  • tests/test_sources.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_sources.py

@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch 2 times, most recently from 96ffb9a to a300456 Compare March 30, 2026 17:41
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_sources.py (1)

320-331: Add explicit assertions for node-date in generated/repaired archival files.

Current tests won’t catch regressions where node-date is missing from synthesized output. Add assertions in creation/replacement paths to verify the full archival contract.

Suggested test additions
         content = archival.read_text()
         assert "describe-name: 1.2.3\n" in content
         assert "node: " in content
+        assert "node-date: " in content
         assert "$Format:" not in content
@@
         content = archival.read_text()
         assert "describe-name: 4.5.6\n" in content
+        assert "node-date: " in content
         assert "$Format:" not in content
@@
         content = archival.read_text()
         assert "describe-name: 3.0.0\n" in content
+        assert "node-date: " in content
@@
         content = archival.read_text()
         assert "describe-name: 5.0.0\n" in content
+        assert "node-date: " in content

As per coding guidelines, "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."

Also applies to: 344-347, 371-374, 382-384

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_sources.py` around lines 320 - 331, Update the tests that validate
generated/rewritten .git_archival.txt (e.g., test_creates_file_when_missing and
the similar tests around lines 344-347, 371-374, 382-384) to assert that the
"node-date: " line is present and correctly populated in the file content;
locate the archival file read (variable archival or calls to
sources.ensure_git_archival) and add an assertion like `assert "node-date: " in
content` (and where applicable assert the date format or non-empty value) so
missing node-date regressions are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/sources.py`:
- Around line 793-799: Add "node-date" to the required archival fields set and
include a corresponding node-date line in the synthesized archival template;
update the symbols _REQUIRED_ARCHIVAL_FIELDS and _GIT_ARCHIVAL_CONTENT (and the
duplicate occurrence later in the file) so the archival content emits
"node-date: {node_date}" and the required-fields set contains "node-date" to
match the new .git_archival.txt contract.

---

Nitpick comments:
In `@tests/test_sources.py`:
- Around line 320-331: Update the tests that validate generated/rewritten
.git_archival.txt (e.g., test_creates_file_when_missing and the similar tests
around lines 344-347, 371-374, 382-384) to assert that the "node-date: " line is
present and correctly populated in the file content; locate the archival file
read (variable archival or calls to sources.ensure_git_archival) and add an
assertion like `assert "node-date: " in content` (and where applicable assert
the date format or non-empty value) so missing node-date regressions are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0eff024c-b129-4cc8-9b79-1a7ab4c18088

📥 Commits

Reviewing files that changed from the base of the PR and between 96ffb9a and a300456.

📒 Files selected for processing (2)
  • src/fromager/sources.py
  • tests/test_sources.py

@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from a300456 to 173746e Compare March 30, 2026 20:06
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/sources.py`:
- Around line 798-828: ensure_git_archival currently returns None when
.git_archival.txt is missing; instead, create a synthetic archival file using
the provided version and return False to indicate a new file was written. In
ensure_git_archival(), when archival_file.is_file() is False, write a minimal
setuptools-scm archival payload to archival_file (for example include a
"describe-name: {version}" line using the Version value passed in) with
appropriate file permissions, flush/close it, and return False; keep the
existing behavior for cases where the file exists and only replace if
placeholders are present (preserve the existing logic in ensure_git_archival
that handles existing files).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1a1de89-1ee1-471e-8169-4f4299268a93

📥 Commits

Reviewing files that changed from the base of the PR and between a300456 and 173746e.

📒 Files selected for processing (2)
  • src/fromager/sources.py
  • tests/test_sources.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_sources.py

@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from 173746e to 16a06e4 Compare March 30, 2026 21:05
@LalatenduMohanty
Copy link
Copy Markdown
Member Author

LalatenduMohanty commented Mar 30, 2026

@tiran Looks like my approach had a drawback and it was failing for the packages who needs a custom tag_regex, so we need to rethink about how we want to achieve this. I added a comment #961 (comment) for direction we should take. PTAL

@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch 2 times, most recently from 2bdd4c1 to 02e3d25 Compare March 30, 2026 21:28
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from 02e3d25 to 16e6406 Compare April 1, 2026 12:29
@LalatenduMohanty LalatenduMohanty marked this pull request as ready for review April 1, 2026 12:29
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from 16e6406 to 6254b53 Compare April 1, 2026 12:30
@LalatenduMohanty
Copy link
Copy Markdown
Member Author

LalatenduMohanty commented Apr 1, 2026

Traced through setuptools-scm's source to check the tag_regex risk with the PKG-INFO guard.

setuptools-scm uses a tag_regex setting to extract the version from git tag names. The default regex accepts bare versions like 1.2.3, but some projects define a custom regex that requires a prefix (e.g., setuptools-scm's own tags look like setuptools-scm-8.1.0). When describe-name doesn't match the regex, setuptools-scm raises an AssertionError.

The current code only creates .git_archival.txt when both PKG-INFO and .git are absent:

  • PyPI sdists → have PKG-INFO → skipped
  • Git clones → have .git → skipped
  • Non-PyPI tarballs without PKG-INFO → file created → could break if the project uses a custom tag_regex

That last case is narrow — most sources are either PyPI sdists or git clones. We also can't easily fix it since we'd need to know the project's tag prefix to synthesize a matching describe-name.

When building from an sdist that lacks .git metadata, setuptools-scm
  cannot determine the package version. This adds a .git_archival.txt
  file with the correct version so setuptools-scm can resolve it.

  - Skips packages with .git directories (no fix needed)
  - Replaces existing files that have unprocessed placeholders or
    missing fields
  - Creates a new file only when PKG-INFO is also absent (git clones
    or custom downloads, not PyPI sdists)

Closes: python-wheel-build#961

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
@LalatenduMohanty LalatenduMohanty force-pushed the 961_git-archival-for-setuptools-scm branch from 6254b53 to 4920bd6 Compare April 1, 2026 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Generate .git_archival.txt for setuptools-scm packages built from source archives

3 participants