fix(python): handle pip options, hashes, and line continuations in requirements.txt#485
Merged
Conversation
…quirements.txt (TC-4527) Add preprocessRequirementsLines() to PythonControllerBase that properly handles all requirements.txt line types: pip options (--extra-index-url, -r, -c, etc.), inline options (--hash, --config-settings), line continuations (\), PEP 508 direct references (name @ url), bare URLs, VCS URLs, and local paths. Applied in getDependenciesImpl, installingRequirementsOneByOne, and getIgnoredDependencies to fix parsing errors when requirements.txt contains non-package lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Reviewer's GuideAdds a new preprocessing pipeline for requirements.txt lines in Python dependency handling, applying it consistently across requirements installation, dependency resolution, and ignored-dependency detection so pip options, hashes, URLs, and line continuations no longer break parsing. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The local path filtering in preprocessRequirementsLines() only handles Unix-style paths (./, ../, /), so Windows-style paths like
C:\foo\barorc:/foo/barwill currently be treated as normal requirements; consider extending the path detection logic to cover drive-letter paths as well. - preprocessRequirementsLines() currently treats any line containing
://as a non-package URL and drops it; if there are valid requirement syntaxes that might include://in markers or extras, it may be safer to tighten this condition (e.g., only after stripping markers or matching known URL/VCS prefixes).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The local path filtering in preprocessRequirementsLines() only handles Unix-style paths (./, ../, /), so Windows-style paths like `C:\foo\bar` or `c:/foo/bar` will currently be treated as normal requirements; consider extending the path detection logic to cover drive-letter paths as well.
- preprocessRequirementsLines() currently treats any line containing `://` as a non-package URL and drops it; if there are valid requirement syntaxes that might include `://` in markers or extras, it may be safer to tighten this condition (e.g., only after stripping markers or matching known URL/VCS prefixes).
## Individual Comments
### Comment 1
<location path="src/main/java/io/github/guacsec/trustifyda/utils/PythonControllerBase.java" line_range="416-422" />
<code_context>
+ if (atIndex != -1) {
+ line = line.substring(0, atIndex).trim();
+ }
+ // Strip inline pip options (--hash=..., --config-settings=..., etc.)
+ int optionIndex = line.indexOf(" --");
+ if (optionIndex != -1) {
+ line = line.substring(0, optionIndex).trim();
+ }
+ // Filter out bare URLs and VCS URLs (any line containing :// is not a package name)
+ if (line.contains("://")) {
+ continue;
+ }
</code_context>
<issue_to_address>
**suggestion:** Inline option stripping based on " --" is brittle; a regex on whitespace+"--" would be more robust
This relies on `line.indexOf(" --")`, which only matches exactly one space before `--` and will miss cases like tabs or multiple spaces. Consider using a regex such as `Pattern.compile("\\s--")` to locate the first whitespace+`--` and cut the string there, making the option stripping robust to different spacing while still avoiding matches inside package names.
Suggested implementation:
```java
// Strip PEP 508 direct references (name @ url -> name) before URL check
int atIndex = line.indexOf(" @ ");
if (atIndex != -1) {
line = line.substring(0, atIndex).trim();
}
// Strip inline pip options (--hash=..., --config-settings=..., etc.) based on whitespace+"--"
Pattern optionPattern = Pattern.compile("\\s--");
Matcher optionMatcher = optionPattern.matcher(line);
if (optionMatcher.find()) {
line = line.substring(0, optionMatcher.start()).trim();
}
// Filter out bare URLs and VCS URLs (any line containing :// is not a package name)
if (line.contains("://")) {
continue;
}
```
1. Add the necessary imports at the top of `PythonControllerBase.java` if they are not already present:
- `import java.util.regex.Pattern;`
- `import java.util.regex.Matcher;`
2. If this logic is used in a tight loop and performance is a concern, consider promoting `optionPattern` to a `private static final Pattern` field on the class and reusing it instead of compiling it on each call.
</issue_to_address>
### Comment 2
<location path="src/main/java/io/github/guacsec/trustifyda/providers/PythonPipProvider.java" line_range="98-100" />
<code_context>
protected Set<PackageURL> getIgnoredDependencies(String manifestContent) {
- String[] lines = manifestContent.split(System.lineSeparator());
- return Arrays.stream(lines)
+ List<String> rawLines = Arrays.asList(manifestContent.split("\\R"));
+ List<String> preprocessed = PythonControllerBase.preprocessRequirementsLines(rawLines);
+ return preprocessed.stream()
.filter(this::containsIgnorePattern)
.map(PythonPipProvider::extractDepFull)
</code_context>
<issue_to_address>
**issue (bug_risk):** Preprocessing requirements before `containsIgnorePattern` may break comment-based ignore markers
`getIgnoredDependencies` used to pass raw manifest lines to `containsIgnorePattern`, but now it uses `preprocessRequirementsLines`, which drops comment-only lines and strips parts like ` @ url` and inline options. If ignore markers (e.g., `# trustify:ignore`) or other relevant tokens live in those stripped sections, they’ll no longer be recognized. Consider either running `containsIgnorePattern` on the raw lines before preprocessing, or adjusting `preprocessRequirementsLines` to retain any patterns that `containsIgnorePattern` depends on.
</issue_to_address>
### Comment 3
<location path="src/test/java/io/github/guacsec/trustifyda/utils/PythonControllerBaseTest.java" line_range="2047-2056" />
<code_context>
+ assertEquals(List.of("requests==2.28.0"), result);
+ }
+
+ @Test
+ void preprocessRequirementsLines_strips_direct_references() {
+ List<String> input =
+ List.of(
+ "pip @ https://github.com/pypa/pip/archive/22.0.2.zip",
+ "requests[security] @ https://github.com/psf/requests/archive/main.zip",
+ "flask==2.0.3");
+ List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
+ assertEquals(List.of("pip", "requests[security]", "flask==2.0.3"), result);
+ }
+
+ @Test
+ void preprocessRequirementsLines_handles_trailing_whitespace_after_backslash() {
+ List<String> input =
+ List.of("requests==2.28.0 \\ ", " --hash=sha256:abc123", "flask==2.0.3");
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for a final line ending with a backslash without a following continuation line
Current tests cover valid line continuations (including trailing whitespace), but not a malformed case where the final line ends with `\` and has no continuation. Please add a test for this (e.g., `List.of("requests==2.28.0 \")`, and optionally with trailing whitespace) to clarify and lock in how we expect that last buffered line to be handled after the loop (dropped, passed through, or otherwise) and to guard against regressions.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Revert getIgnoredDependencies to use raw lines so ignore markers (e.g. #trustify-da-ignore) in inline comments are not stripped before containsIgnorePattern runs; keep \R split fix - Use compiled Pattern with \s-- for inline option stripping to handle tabs and multiple spaces, not just single space - Add Windows drive-letter path filtering (C:\path, c:/path) - Tighten :// URL check to only inspect the requirement part before markers, avoiding false positives from marker strings - Add tests for Windows paths and final-line backslash edge case Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
|
@sourcery-ai review |
SourceryAI
reviewed
May 20, 2026
SourceryAI
left a comment
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In preprocessRequirementsLines, the continuation-joining loop appends stripped text for continued lines but the original (unstripped) line for the final segment, which reintroduces trailing whitespace compared to the previous behavior where lines were trimmed; consider consistently using the stripped variant to avoid subtle whitespace-dependent issues.
- The getIgnoredDependencies path still processes raw split lines (now via "\R") without using preprocessRequirementsLines, so ignore-pattern extraction may behave differently from dependency resolution for lines with pip options, hashes, or continuations; consider reusing the same preprocessing there for consistency.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In preprocessRequirementsLines, the continuation-joining loop appends stripped text for continued lines but the original (unstripped) line for the final segment, which reintroduces trailing whitespace compared to the previous behavior where lines were trimmed; consider consistently using the stripped variant to avoid subtle whitespace-dependent issues.
- The getIgnoredDependencies path still processes raw split lines (now via "\\R") without using preprocessRequirementsLines, so ignore-pattern extraction may behave differently from dependency resolution for lines with pip options, hashes, or continuations; consider reusing the same preprocessing there for consistency.
## Individual Comments
### Comment 1
<location path="src/test/java/io/github/guacsec/trustifyda/utils/PythonControllerBaseTest.java" line_range="2059-2068" />
<code_context>
+ "Required-by: cycler, gensim, gTTS, python-dateutil, tweepy\n");
}
+
+ @Test
+ void preprocessRequirementsLines_filters_extra_index_url() {
+ List<String> input =
+ List.of(
+ "--extra-index-url https://pypi.example.com/simple",
+ "requests==2.28.0",
+ "--index-url https://pypi.org/simple",
+ "flask==2.0.3");
+ List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
+ assertEquals(List.of("requests==2.28.0", "flask==2.0.3"), result);
+ }
+
+ @Test
+ void preprocessRequirementsLines_filters_short_pip_options() {
+ List<String> input =
+ List.of("-r other-requirements.txt", "-c constraints.txt", "numpy==1.24.0");
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for pip option lines with leading whitespace
Since `preprocessRequirementsLines` trims each line before checking `line.startsWith("-")`, lines like `" --extra-index-url ..."` or `" -c constraints.txt"` are also treated as pip options and filtered out. Please add a test case with leading whitespace before these options to capture this behavior and protect against regressions in the trimming logic.
```suggestion
@Test
void preprocessRequirementsLines_filters_short_pip_options() {
List<String> input =
List.of("-r other-requirements.txt", "-c constraints.txt", "numpy==1.24.0");
List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
assertEquals(List.of("numpy==1.24.0"), result);
}
@Test
void preprocessRequirementsLines_filters_pip_options_with_leading_whitespace() {
List<String> input =
List.of(
" --extra-index-url https://pypi.example.com/simple",
" -c constraints.txt",
"requests==2.28.0");
List<String> result = PythonControllerBase.preprocessRequirementsLines(input);
assertEquals(List.of("requests==2.28.0"), result);
}
@Test
void preprocessRequirementsLines_strips_hashes() {
```
</issue_to_address>Hi @a-oren! 👋
Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀
Install the sourcery-ai bot to get automatic code reviews on every pull request ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
preprocessRequirementsLines()toPythonControllerBasethat properly handles all requirements.txt line types before dependency resolution--extra-index-url,--hash, line continuations (\), and other non-package linesname @ url), bare URLs, VCS URLs, and local pathsFixes TC-4527
Related: fabric8-analytics/fabric8-analytics-vscode-extension#843
Test plan
getIgnoredDependencies_strips_environment_markerstest passes--extra-index-urland line continuations🤖 Generated with Claude Code
Summary by Sourcery
Improve Python requirements.txt preprocessing to robustly handle pip options, hashes, line continuations, and non-package lines before dependency resolution.
Bug Fixes:
Enhancements:
Tests: