Skip to content

Fix missing strip_root behavior in Codebase return path#4711

Open
Kaushik-Kumar-CEG wants to merge 1 commit intoaboutcode-org:developfrom
Kaushik-Kumar-CEG:fix-strip-root-api-issue-2985
Open

Fix missing strip_root behavior in Codebase return path#4711
Kaushik-Kumar-CEG wants to merge 1 commit intoaboutcode-org:developfrom
Kaushik-Kumar-CEG:fix-strip-root-api-issue-2985

Conversation

@Kaushik-Kumar-CEG
Copy link
Copy Markdown

@Kaushik-Kumar-CEG Kaushik-Kumar-CEG commented Jan 29, 2026

Fixes #2985

Summary

Fixed strip_root parameter not stripping root directory from Resource paths when using cli.run_scan() programmatically.

Problem

  • CLI --strip-root worked correctly
  • but, API mode with return_codebase=True did not strip root from Resource paths
  • this is the API mode that returns a Codebase object instead of JSON

Solution

  • rebuilt resources_by_path with stripped paths after Codebase creation
  • used strip_first_path_segment() to remove root prefix from each Resource
  • fixed Resource.parent() method to return root Resource (not empty string) for direct children of root by monkey-patching when strip_root is enabled

Changes

  • src/scancode/cli.py: Added path stripping logic and parent() fix for return_codebase branch

Verification

  • all existing tests pass
  • manual testing confirms root directory is now stripped from paths
  • Resource.parent(codebase) correctly returns root Resource for direct children

Screenshots of Fix

BEFORE fix:
Screenshot 2026-01-29 212317

AFTER Fix
Screenshot 2026-01-29 211836

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

@Kaushik-Kumar-CEG
Copy link
Copy Markdown
Author

Kaushik-Kumar-CEG commented Jan 31, 2026

@JonoYang This PR is ready for review

The issue was that strip_root=True had no effect when using return_codebase=True because commoncode's Codebase ignores the parameter. The stripping only worked during JSON serialization, not on the actual Resource objects.

The fix rebuilds resources_by_path with stripped paths after Codebase creation. Additionally, it patches Resource.parent() because commoncode's implementation uses return parent_path and get_resource(parent_path) which returns early on empty strings when a direct child's parent path becomes '' after stripping.

Kindly review my fix and let me know if you have any feedback :)

@Kaushik-Kumar-CEG
Copy link
Copy Markdown
Author

@JonoYang
the issue #2985 you stated, is fixed
please let me know if there are any changes :)

@Kaushik-Kumar-CEG
Copy link
Copy Markdown
Author

@JonoYang
a gentle reminder
would appreciate your feedback on this :)

Copy link
Copy Markdown
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kaushik-Kumar-CEG code is not mergable/maintainable without tests.

Comment thread src/scancode/cli.py
codebase.resource_class.parent = patched_parent

finally:
# remove temporary files
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kaushik-Kumar-CEG I see no tests added at all, how would I know your code is working or not without doing all the research myself?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've added them from the start. Added 4 tests in test_cli.py that cover strip_root with return_codebase — including the parent() fix this comment is on

Comment thread src/scancode/cli.py Outdated
new_resources_by_path[stripped_path] = resource
codebase.resources_by_path = new_resources_by_path

# Fix parent() to handle empty parent_path for direct children of root.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are too descriptive, you can keep a short summary, and have extra comments here in github as PR comments,

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, trimmed them down to one-liners.

…ase mode

When using cli.run_scan() with strip_root=True and return_codebase=True,
the root directory was not stripped from Resource paths. This adds path
stripping logic for the return_codebase branch and patches parent()
for direct children of root. Adds tests for stripped paths, single file
behavior, parent traversal, and regression guard.

Fixes: aboutcode-org#2985
Signed-off-by: Kaushik <kaushikrjpm10@gmail.com>
@Kaushik-Kumar-CEG Kaushik-Kumar-CEG force-pushed the fix-strip-root-api-issue-2985 branch from 956183e to a830f08 Compare April 21, 2026 15:53
@Kaushik-Kumar-CEG
Copy link
Copy Markdown
Author

@AyanSinhaMahapatra updated the PR based on your review

Added 4 tests for strip_root + return_codebase (path stripping, single file, parent traversal, regression guard)
Trimmed the inline comments to short summaries
ready for review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Setting strip_root=True when calling cli.run_scan() does not strip the root from Resource paths

2 participants