fix(python): handle pip options, hashes, and line continuations in requirements.txt #485

Merged
a-oren merged 2 commits into guacsec:main from a-oren:TC-4527
May 21, 2026
Conversation

@a-oren (Contributor) commented May 20, 2026

Summary

  • Adds preprocessRequirementsLines() to PythonControllerBase that properly handles all requirements.txt line types before dependency resolution
  • Fixes parsing errors when requirements.txt contains --extra-index-url, --hash, line continuations (\), and other non-package lines
  • Handles: pip options, inline options (--hash, --config-settings), line continuations, PEP 508 direct references (name @ url), bare URLs, VCS URLs, and local paths
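
As a rough illustration of the line types listed above, the sketch below normalizes one logical line at a time. It is not the code from this PR: `RequirementLineSketch` and `normalize` are hypothetical names, continuation joining is left out, and the checks mirror only the behaviour described in this summary.

```java
import java.util.Optional;

// Hypothetical sketch; not the PR's PythonControllerBase implementation.
final class RequirementLineSketch {

  /** Returns the requirement to resolve, or empty if the line is not a package line. */
  static Optional<String> normalize(String rawLine) {
    String line = rawLine.trim();
    // Blank lines, comment lines, and pip option lines (-r, -c, --index-url, ...) carry no package.
    if (line.isEmpty() || line.startsWith("#") || line.startsWith("-")) {
      return Optional.empty();
    }
    // Local paths (Unix-style only in this sketch).
    if (line.startsWith("./") || line.startsWith("../") || line.startsWith("/")) {
      return Optional.empty();
    }
    // PEP 508 direct reference: "name @ url" becomes "name".
    int atIndex = line.indexOf(" @ ");
    if (atIndex != -1) {
      line = line.substring(0, atIndex).trim();
    }
    // Inline options such as --hash=... or --config-settings=... follow the requirement.
    int optionIndex = line.indexOf(" --");
    if (optionIndex != -1) {
      line = line.substring(0, optionIndex).trim();
    }
    // Anything still containing "://" is treated as a bare URL or VCS URL, not a package name.
    if (line.contains("://")) {
      return Optional.empty();
    }
    return Optional.of(line);
  }
}
```

Under these assumptions, `normalize("requests==2.28.0 --hash=sha256:abc123")` yields `requests==2.28.0`, while option lines, local paths, and bare URLs yield an empty `Optional`.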

Fixes TC-4527
Related: fabric8-analytics/fabric8-analytics-vscode-extension#843

Test plan

  • 13 unit tests covering all requirements.txt line types
  • Existing getIgnoredDependencies_strips_environment_markers test passes
  • Manual verification with requirements.txt files containing --extra-index-url and line continuations

🤖 Generated with Claude Code

Summary by Sourcery

Improve Python requirements.txt preprocessing to robustly handle pip options, hashes, line continuations, and non-package lines before dependency resolution.

Bug Fixes:

  • Fix handling of requirements.txt entries that include pip options, hashes, line continuations, direct references, bare URLs, VCS URLs, and local paths so they no longer break dependency parsing and installation.

Enhancements:

  • Introduce a reusable preprocessing routine for requirements.txt lines and apply it across Python dependency resolution and ignored-dependency detection to ensure consistent parsing behavior.

Tests:

  • Add unit tests covering preprocessing of all supported requirements.txt line variants, including pip options, hashes, line continuations, direct references, URLs, local paths, and mixed scenarios.

…quirements.txt (TC-4527)

Add preprocessRequirementsLines() to PythonControllerBase that properly handles
all requirements.txt line types: pip options (--extra-index-url, -r, -c, etc.),
inline options (--hash, --config-settings), line continuations (\), PEP 508
direct references (name @ url), bare URLs, VCS URLs, and local paths.

Applied in getDependenciesImpl, installingRequirementsOneByOne, and
getIgnoredDependencies to fix parsing errors when requirements.txt contains
non-package lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@sourcery-ai (Bot, Contributor) commented May 20, 2026

Reviewer's Guide

Adds a new preprocessing pipeline for requirements.txt lines in Python dependency handling, applying it consistently across requirements installation, dependency resolution, and ignored-dependency detection so pip options, hashes, URLs, and line continuations no longer break parsing.

File-Level Changes

Change: Introduce preprocessRequirementsLines to normalize and filter raw requirements.txt lines before dependency resolution.
Details:
  • Implement joining of lines with trailing backslashes (handling trailing whitespace) to reconstruct logical requirement entries.
  • Strip inline pip options such as --hash and --config-settings from requirement lines while preserving the base requirement.
  • Filter out non-package lines including pip option lines, comments, empty/whitespace-only lines, bare URLs/VCS URLs, and local path references.
  • Normalize PEP 508 direct references of the form "name @ url" to just the package name before further parsing.
Files:
  • src/main/java/io/github/guacsec/trustifyda/utils/PythonControllerBase.java

Change: Apply requirements preprocessing in existing flows that read requirements.txt to ensure consistent handling.
Details:
  • Wrap Files.readAllLines(...) outputs with preprocessRequirementsLines in installingRequirementsOneByOne instead of doing only simple comment/empty-line filtering.
  • Replace manual filtering and trimming in getDependenciesImpl with a call to preprocessRequirementsLines so all requirements processing paths share the same behavior.
  • Update PythonPipProvider.getIgnoredDependencies to split manifest content on any line break and run it through preprocessRequirementsLines before ignore-pattern matching.
Files:
  • src/main/java/io/github/guacsec/trustifyda/utils/PythonControllerBase.java
  • src/main/java/io/github/guacsec/trustifyda/providers/PythonPipProvider.java

Change: Add unit tests to fully cover the new preprocessing behavior for various requirements.txt patterns.
Details:
  • Add tests for filtering pip options like --extra-index-url, --index-url, -r, and -c lines.
  • Add tests for stripping inline --hash and --config-settings options, including when used across line continuations.
  • Add tests for handling comments, empty lines, line continuations (with and without trailing whitespace), PEP 508 direct references, bare URLs, VCS URLs, and local paths.
  • Add a combined-scenario test that mixes all supported constructs to assert end-to-end preprocessing output.
Files:
  • src/test/java/io/github/guacsec/trustifyda/utils/PythonControllerBaseTest.java
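
The change to split manifest content on any line break, noted for getIgnoredDependencies above, can be illustrated in isolation. The sketch below is a hypothetical helper (`LineSplitSketch.splitLines` is not a name from the PR) showing how the regex linebreak matcher `\R` handles mixed line endings, which matters when a file written with CRLF endings is read on a platform whose separator is LF.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; shows only the splitting step, not the provider's code.
final class LineSplitSketch {

  /** Splits on any line break sequence (LF, CRLF, CR, ...) using the regex linebreak matcher. */
  static List<String> splitLines(String manifestContent) {
    return Arrays.asList(manifestContent.split("\\R"));
  }
}
```

By contrast, splitting `"a\r\nb"` on `"\n"` alone leaves a trailing carriage return on the first element, which can then leak into package names or version strings.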

@sourcery-ai (Bot, Contributor) left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • The local path filtering in preprocessRequirementsLines() only handles Unix-style paths (./, ../, /), so Windows-style paths like C:\foo\bar or c:/foo/bar will currently be treated as normal requirements; consider extending the path detection logic to cover drive-letter paths as well.
  • preprocessRequirementsLines() currently treats any line containing :// as a non-package URL and drops it; if there are valid requirement syntaxes that might include :// in markers or extras, it may be safer to tighten this condition (e.g., only after stripping markers or matching known URL/VCS prefixes).
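
The two points above could be approached along the following lines. This is only a sketch under stated assumptions: `PathAndUrlChecks`, `isLocalPath`, and `requirementPartHasUrl` are hypothetical names, and treating the first `;` as the start of the environment marker is a simplification.

```java
import java.util.regex.Pattern;

// Hypothetical helpers illustrating the two suggestions; names are not from the PR.
final class PathAndUrlChecks {
  // Drive letter, colon, then a slash or backslash, e.g. C:\foo\bar or c:/foo/bar.
  private static final Pattern WINDOWS_DRIVE_PATH = Pattern.compile("^[A-Za-z]:[\\\\/].*");

  static boolean isLocalPath(String line) {
    return line.startsWith("./")
        || line.startsWith("../")
        || line.startsWith("/")
        || WINDOWS_DRIVE_PATH.matcher(line).matches();
  }

  /** Looks for "://" only in the part before the first ';' (assumed marker separator). */
  static boolean requirementPartHasUrl(String line) {
    int markerIndex = line.indexOf(';');
    String requirementPart = markerIndex == -1 ? line : line.substring(0, markerIndex);
    return requirementPart.contains("://");
  }
}
```

With this split, a `://` that appears only inside a marker string no longer causes the whole requirement to be dropped, while bare URLs and VCS URLs are still detected.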

## Individual Comments

### Comment 1
<location path="src/main/java/io/github/guacsec/trustifyda/utils/PythonControllerBase.java" line_range="416-422" />
<code_context>
+      if (atIndex != -1) {
+        line = line.substring(0, atIndex).trim();
+      }
+      // Strip inline pip options (--hash=..., --config-settings=..., etc.)
+      int optionIndex = line.indexOf(" --");
+      if (optionIndex != -1) {
+        line = line.substring(0, optionIndex).trim();
+      }
+      // Filter out bare URLs and VCS URLs (any line containing :// is not a package name)
+      if (line.contains("://")) {
+        continue;
+      }
</code_context>
<issue_to_address>
**suggestion:** Inline option stripping based on " --" is brittle; a regex on whitespace+"--" would be more robust

This relies on `line.indexOf(" --")`, which matches only a literal space before `--` and will miss other whitespace such as tabs. Consider using a regex such as `Pattern.compile("\\s--")` to locate the first whitespace+`--` and cut the string there, making the option stripping robust to different spacing while still avoiding matches inside package names.

Suggested implementation:

```java
      // Strip PEP 508 direct references (name @ url -> name) before URL check
      int atIndex = line.indexOf(" @ ");
      if (atIndex != -1) {
        line = line.substring(0, atIndex).trim();
      }
      // Strip inline pip options (--hash=..., --config-settings=..., etc.) based on whitespace+"--"
      Pattern optionPattern = Pattern.compile("\\s--");
      Matcher optionMatcher = optionPattern.matcher(line);
      if (optionMatcher.find()) {
        line = line.substring(0, optionMatcher.start()).trim();
      }
      // Filter out bare URLs and VCS URLs (any line containing :// is not a package name)
      if (line.contains("://")) {
        continue;
      }

```

1. Add the necessary imports at the top of `PythonControllerBase.java` if they are not already present:
   - `import java.util.regex.Pattern;`
   - `import java.util.regex.Matcher;`
2. If this logic is used in a tight loop and performance is a concern, consider promoting `optionPattern` to a `private static final Pattern` field on the class and reusing it instead of compiling it on each call.
</issue_to_address>
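
To make the difference concrete, here is a small side-by-side sketch of the two approaches with the pattern promoted to a static field, as the second note suggests. The class and method names are hypothetical and exist only for this comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch comparing the two ways of locating inline options.
final class InlineOptionStripping {
  // Compiled once and reused, rather than on every call.
  private static final Pattern INLINE_OPTION = Pattern.compile("\\s--");

  /** Cuts at the first literal space followed by "--". */
  static String stripWithIndexOf(String line) {
    int optionIndex = line.indexOf(" --");
    return optionIndex == -1 ? line : line.substring(0, optionIndex).trim();
  }

  /** Cuts at the first whitespace character (space, tab, ...) followed by "--". */
  static String stripWithPattern(String line) {
    Matcher matcher = INLINE_OPTION.matcher(line);
    return matcher.find() ? line.substring(0, matcher.start()).trim() : line;
  }
}
```

For a tab-separated line such as `"requests==2.28.0\t--hash=sha256:abc123"`, `stripWithIndexOf` returns the line unchanged, whereas `stripWithPattern` returns `requests==2.28.0`. A name containing `--` without preceding whitespace is left alone by both.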

### Comment 2
<location path="src/main/java/io/github/guacsec/trustifyda/providers/PythonPipProvider.java" line_range="98-100" />
<code_context>
   protected Set<PackageURL> getIgnoredDependencies(String manifestContent) {
-    String[] lines = manifestContent.split(System.lineSeparator());
-    return Arrays.stream(lines)
+    List<String> rawLines = Arrays.asList(manifestContent.split("\\R"));
+    List<String> preprocessed = PythonControllerBase.preprocessRequirementsLines(rawLines);
+    return preprocessed.stream()
         .filter(this::containsIgnorePattern)
         .map(PythonPipProvider::extractDepFull)
</code_context>
<issue_to_address>
**issue (bug_risk):** Preprocessing requirements before `containsIgnorePattern` may break comment-based ignore markers

`getIgnoredDependencies` used to pass raw manifest lines to `containsIgnorePattern`, but now it uses `preprocessRequirementsLines`, which drops comment-only lines and strips parts like ` @ url` and inline options. If ignore markers (e.g., `# trustify:ignore`) or other relevant tokens live in those stripped sections, they’ll no longer be recognized. Consider either running `containsIgnorePattern` on the raw lines before preprocessing, or adjusting `preprocessRequirementsLines` to retain any patterns that `containsIgnorePattern` depends on.
</issue_to_address>
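
The ordering problem described here can be reproduced with a toy example. Everything in the sketch is an assumption for illustration: the marker text is borrowed from the follow-up commit message in this PR, and `containsIgnorePattern` and `stripInlineOptions` are simplified stand-ins rather than the project's real methods.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; marker text and helper behaviour are assumptions for illustration.
final class IgnoreMarkerSketch {
  static final String IGNORE_MARKER = "#trustify-da-ignore";

  static boolean containsIgnorePattern(String line) {
    return line.contains(IGNORE_MARKER);
  }

  /** Cuts the line at the first " --", which also discards any trailing comment. */
  static String stripInlineOptions(String line) {
    int optionIndex = line.indexOf(" --");
    return optionIndex == -1 ? line.trim() : line.substring(0, optionIndex).trim();
  }

  /** Checks the marker on raw lines first, then cleans up the matches. */
  static List<String> ignoredFromRaw(List<String> rawLines) {
    return rawLines.stream()
        .filter(IgnoreMarkerSketch::containsIgnorePattern)
        .map(IgnoreMarkerSketch::stripInlineOptions)
        .collect(Collectors.toList());
  }

  /** Cleans up first, then checks the marker: the marker is already gone. */
  static List<String> ignoredAfterStripping(List<String> rawLines) {
    return rawLines.stream()
        .map(IgnoreMarkerSketch::stripInlineOptions)
        .filter(IgnoreMarkerSketch::containsIgnorePattern)
        .collect(Collectors.toList());
  }
}
```

Given a line like `requests==2.28.0 --hash=sha256:abc123 #trustify-da-ignore`, the raw-first order reports `requests==2.28.0` as ignored, while the strip-first order reports nothing.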

### Comment 3
<location path="src/test/java/io/github/guacsec/trustifyda/utils/PythonControllerBaseTest.java" line_range="2047-2056" />
<code_context>
+    assertEquals(List.of("requests==2.28.0"), result);
+  }
+
+  @Test
+  void preprocessRequirementsLines_strips_direct_references() {
+    List<String> input =
+        List.of(
+            "pip @ https://github.com/pypa/pip/archive/22.0.2.zip",
+            "requests[security] @ https://github.com/psf/requests/archive/main.zip",
+            "flask==2.0.3");
+    List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
+    assertEquals(List.of("pip", "requests[security]", "flask==2.0.3"), result);
+  }
+
+  @Test
+  void preprocessRequirementsLines_handles_trailing_whitespace_after_backslash() {
+    List<String> input =
+        List.of("requests==2.28.0 \\   ", "    --hash=sha256:abc123", "flask==2.0.3");
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for a final line ending with a backslash without a following continuation line

Current tests cover valid line continuations (including trailing whitespace), but not a malformed case where the final line ends with `\` and has no continuation. Please add a test for this (e.g., `List.of("requests==2.28.0 \\")`, optionally with trailing whitespace) to clarify and lock in how the last buffered line should be handled after the loop (dropped, passed through, or otherwise) and to guard against regressions.
</issue_to_address>
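
One possible policy for that edge case is to flush whatever is still buffered once the loop ends. The joiner below is a minimal sketch of that choice, not the PR's implementation; `ContinuationJoiner` and `join` are hypothetical names, and passing the final fragment through (rather than dropping it) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of continuation joining; one possible policy, not the PR's code.
final class ContinuationJoiner {

  static List<String> join(List<String> rawLines) {
    List<String> logicalLines = new ArrayList<>();
    StringBuilder pending = new StringBuilder();
    for (String raw : rawLines) {
      // Trim first so whitespace after the backslash does not hide the continuation.
      String line = raw.trim();
      if (line.endsWith("\\")) {
        // Drop the backslash and keep buffering; the next line continues this one.
        pending.append(line.substring(0, line.length() - 1).trim()).append(' ');
        continue;
      }
      pending.append(line);
      logicalLines.add(pending.toString().trim());
      pending.setLength(0);
    }
    // A final line ending in a backslash leaves text in the buffer: pass it through.
    if (pending.length() > 0) {
      logicalLines.add(pending.toString().trim());
    }
    return logicalLines;
  }
}
```

With this policy, `join(List.of("requests==2.28.0 \\"))` returns a single entry, `requests==2.28.0`. Blank input lines come out as empty strings and are expected to be filtered by a later step.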


- Revert getIgnoredDependencies to use raw lines so ignore markers
  (e.g. #trustify-da-ignore) in inline comments are not stripped
  before containsIgnorePattern runs; keep \R split fix
- Use compiled Pattern with \s-- for inline option stripping to handle
  tabs and multiple spaces, not just single space
- Add Windows drive-letter path filtering (C:\path, c:/path)
- Tighten :// URL check to only inspect the requirement part before
  markers, avoiding false positives from marker strings
- Add tests for Windows paths and final-line backslash edge case

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@a-oren (Contributor, Author) commented May 20, 2026

@sourcery-ai review

@a-oren requested a review from ruromero on May 20, 2026 11:38

@SourceryAI left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In preprocessRequirementsLines, the continuation-joining loop appends stripped text for continued lines but the original (unstripped) line for the final segment, which reintroduces trailing whitespace compared to the previous behavior where lines were trimmed; consider consistently using the stripped variant to avoid subtle whitespace-dependent issues.
  • The getIgnoredDependencies path still processes raw split lines (now via "\R") without using preprocessRequirementsLines, so ignore-pattern extraction may behave differently from dependency resolution for lines with pip options, hashes, or continuations; consider reusing the same preprocessing there for consistency.

## Individual Comments

### Comment 1
<location path="src/test/java/io/github/guacsec/trustifyda/utils/PythonControllerBaseTest.java" line_range="2059-2068" />
<code_context>
             + "Required-by: cycler, gensim, gTTS, python-dateutil, tweepy\n");
   }
+
+  @Test
+  void preprocessRequirementsLines_filters_extra_index_url() {
+    List<String> input =
+        List.of(
+            "--extra-index-url https://pypi.example.com/simple",
+            "requests==2.28.0",
+            "--index-url https://pypi.org/simple",
+            "flask==2.0.3");
+    List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
+    assertEquals(List.of("requests==2.28.0", "flask==2.0.3"), result);
+  }
+
+  @Test
+  void preprocessRequirementsLines_filters_short_pip_options() {
+    List<String> input =
+        List.of("-r other-requirements.txt", "-c constraints.txt", "numpy==1.24.0");
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for pip option lines with leading whitespace

Since `preprocessRequirementsLines` trims each line before checking `line.startsWith("-")`, lines like `"   --extra-index-url ..."` or `"  -c constraints.txt"` are also treated as pip options and filtered out. Please add a test case with leading whitespace before these options to capture this behavior and protect against regressions in the trimming logic.

```suggestion
  @Test
  void preprocessRequirementsLines_filters_short_pip_options() {
    List<String> input =
        List.of("-r other-requirements.txt", "-c constraints.txt", "numpy==1.24.0");
    List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
    assertEquals(List.of("numpy==1.24.0"), result);
  }

  @Test
  void preprocessRequirementsLines_filters_pip_options_with_leading_whitespace() {
    List<String> input =
        List.of(
            "   --extra-index-url https://pypi.example.com/simple",
            "  -c constraints.txt",
            "requests==2.28.0");
    List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
    assertEquals(List.of("requests==2.28.0"), result);
  }

  @Test
  void preprocessRequirementsLines_strips_hashes() {
```
</issue_to_address>


@ruromero (Collaborator) left a comment

LGTM

@a-oren merged commit 2a2083f into guacsec:main on May 21, 2026
45 checks passed

3 participants