Open
MuneebUllahKhan222
requested changes
Apr 6, 2026
Contributor
Just need to address a couple of small changes.
pkg/sources/web/web.go
Outdated
        ctx.Logger().Error(err, "Visit failed")
    }
    collector.Wait() // blocks until all requests finish
    close(done)
Contributor
It should be `defer close(done)`, outside the goroutine.
Contributor
Author
Outside the goroutine? Any reason why?
Contributor
It is generally good practice for the owner of a channel to be the one that closes it.
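The ownership convention mentioned above can be sketched as follows. This is a minimal, self-contained illustration and not the code in `web.go`: the function `firstN` and the `values` channel are hypothetical, and `done` is used here as a stop signal that the creating function sends to its own goroutine. Whether this shape fits the PR depends on which side waits on `done`, which the excerpt does not show.

```go
package main

import "fmt"

// firstN is a hypothetical example. It creates done, so it owns done
// and is the only place that closes it.
func firstN(n int) []int {
	done := make(chan struct{})
	defer close(done) // owner closes exactly once, on every return path

	values := make(chan int)
	go func() {
		// The goroutine only receives from done; it never closes it.
		for i := 0; ; i++ {
			select {
			case values <- i:
			case <-done:
				return // owner has returned, stop producing
			}
		}
	}()

	out := make([]int, 0, n)
	for len(out) < n {
		out = append(out, <-values)
	}
	return out
}

func main() {
	fmt.Println(firstN(3)) // prints [0 1 2]
}
```

If `done` instead signals completion from the goroutine back to its caller, the goroutine is the only sender, and closing it there is also a common choice; the point of the convention is that exactly one party closes the channel.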
77a5cc2 to aade78b
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 51abd9d.

Description:
Adds a new `web` source that crawls and scans websites for exposed secrets. The source uses the Colly framework to fetch pages starting from one or more seed URLs, with configurable crawl depth, per-domain request delay, and a per-URL timeout. Link following is opt-in via `--crawl`, robots.txt is respected by default, and linked JavaScript files are enqueued alongside HTML pages since they are a common location for hardcoded credentials. Each scanned page produces a chunk carrying the page title, URL, content type, crawl depth, and a UTC timestamp in the metadata.

Checklist:
- `make test-community`?
- `make lint` (this requires golangci-lint)?

Note
Medium Risk
Introduces a new network-facing crawler source with configurable crawling/robots behavior and several new dependencies, which can affect runtime load, timeouts, and output volume.
Overview
Adds a new `web` scan mode (`trufflehog web`) that fetches one or more seed URLs and optionally crawls in-scope links to produce scan chunks from HTTP responses.

Implements the `web` source using Colly with controls for crawl enablement, max depth, per-domain delay, overall timeout, custom User-Agent, and optional `robots.txt` bypass, and attaches new `Web` metadata (URL/title/content-type/depth/timestamp) to emitted chunks.

Updates protobufs (`sources.proto`, `source_metadata.proto`) and generated code to include `SOURCE_TYPE_WEB` plus corresponding config and metadata messages, and adds Prometheus metric `web_urls_scanned` along with a comprehensive test suite and documentation.

Reviewed by Cursor Bugbot for commit 51abd9d. Bugbot is set up for automated code reviews on this repo.
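The Overview mentions crawling "in-scope" links up to a maximum depth when crawling is enabled. The sketch below shows one way such a check might look using only the standard library. It assumes that "in scope" means "same host as the seed URL" and that a link found at depth `d` is followed only while `d < maxDepth`; neither rule is confirmed by the PR text, `shouldFollow` is a hypothetical helper, and the actual source delegates crawling to Colly.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shouldFollow reports whether a link found on a page at the given depth
// should be enqueued. The scope and depth rules are assumptions.
func shouldFollow(seed, link string, depth, maxDepth int, crawl bool) bool {
	if !crawl || depth >= maxDepth {
		return false // link following is opt-in and bounded by depth
	}
	s, err := url.Parse(seed)
	if err != nil {
		return false
	}
	l, err := url.Parse(link)
	if err != nil {
		return false
	}
	target := s.ResolveReference(l) // resolves relative links against the seed
	if target.Scheme != "http" && target.Scheme != "https" {
		return false // skip mailto:, javascript:, and similar
	}
	return strings.EqualFold(target.Hostname(), s.Hostname())
}

func main() {
	seed := "https://example.com/"
	fmt.Println(shouldFollow(seed, "/static/app.js", 0, 2, true))              // true
	fmt.Println(shouldFollow(seed, "https://other.example.org/x", 0, 2, true)) // false
	fmt.Println(shouldFollow(seed, "/static/app.js", 0, 2, false))             // false
}
```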