fix(config): watch vector config paths for sinks and transforms#25133
fix(config): watch vector config paths for sinks and transforms#25133powerumc wants to merge 3 commits intovectordotdev:masterfrom
Conversation
* Append vector `config_paths` to the list of watched files for transforms and sinks. * Enhance `ComponentConfig::contains` to correctly match modified files against watched directory paths (e.g., when using `--config-dir`).
d3bf255 to
d089638
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d089638b6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| && config_paths | ||
| .iter() | ||
| .filter(|p| Format::from_path(p).is_ok()) | ||
| .any(|p| p.parent() == Some(path.as_path())) |
There was a problem hiding this comment.
Match config-dir updates by ancestry, not immediate parent
In ComponentConfig::contains, directory-based config paths only match when p.parent() == path, so changes under nested files like --config-dir .../transforms/foo/bar.toml are not attributed to sink/transform components. Because load_from_dir supports component subdirectories (and recursive transform subdirectories), a concurrent enrichment-table edit can still produce changed_components containing only enrichment tables, which triggers ReloadEnrichmentTables and skips reloading the changed Vector config.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Before this PR, Vector reloads even when only enrichment table files are modified within subdirectories (including nested subdirectories) of the config directory.
For example, given the following directory structure, modifying only the .csv files still triggers a reload:
test -- example.toml
-- example1.csv
-- example2.csv
-- sink -- example2.toml
$ echo '' >> example1.csv && echo '' >> example2.csv
INFO vector::config::watcher: Configuration file changed.
INFO vector::topology::running: Reloading running topology with new configuration.
INFO vector::topology::running: Running healthchecks.
INFO vector::topology::running: New configuration loaded successfully.
INFO vector: Vector has reloaded. path=[Dir("test")] internal_log_rate_limit=false
I’m not sure whether this is intended behavior in Vector or not.
If this is intended, then changes in subdirectories (including nested ones) may not need to be checked in contains().
There was a problem hiding this comment.
IMO it's worth addressing this finding by doing the following:
- Add a small helper in
src/config/watcher.rsthat checks whether a changed_path either exactly matches a watched config file or is a recognized config file under a watched config dir. - Call it before the "Only enrichment tables have changed branch"
And for testing coverage:
- A test involving a
transforms/foo/bar.tomlwhich is changed in the same window as an enrichment-table file and assert that we get aReloadFromDisksignal. - A test showing enrichment-only changes still use
ReloadEnrichmentTables.
For simplicity, I would defer discussion on whether config-file changes under --config-dir should go straight to ReloadFromDisk instead of piggybacking on component attribution.
There was a problem hiding this comment.
In my opinion, instead of adding a new helper in watcher.rs, it might be sufficient to adjust the condition in the existing ComponentConfig.contains() function as follows:
//.any(|p| p.parent() == Some(path.as_path()))
.any(|p| p.starts_with(path))(I haven’t tested this yet.)
This way, we would only need to add tests for the contains() function.
There was a problem hiding this comment.
Let's take a step back and look at the broader approach. Bear with me please, since this is a complex area due to the number of possible scenarios (especially when a config dir contains multiple configs but let's stick to a single config file for now.). That's why my first hunch was to fix the watcher design.
Bug in current implementation
Walkthrough the provided reproduction steps starting from:
echo '' >> example.toml && echo '' >> example1.csv && echo '' >> example2.csvThis appends a newline to all three files, but example.toml's parsed config is unchanged - zero components changed.
After the PR, the log shows:
Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1"
}, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }]
configuration changed.
But console1 and console2 didn't actually change. This is a bug.
The current master behavior of sending ReloadEnrichmentTables is actually
correct here - only enrichment data files changed, and the config didn't actually change.
Next steps
Change the trigger to:
sed -i '' 's/console2/console3/' test/example.toml && echo '' >> test/example1.csvDoes Vector reload the full config (picking up the codec change) or (bug) it only sends ReloadEnrichmentTables and the sink change is lost?
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d089638b6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Summary
Fixed an issue where concurrent modifications to Vector configuration files and enrichment tables resulted in only enrichment table changes being reloaded.
When
ComponentConfig.component_typeisSinkorTransform,config_pathsis empty. As a result, simultaneous changes to Vector config files and enrichment tables are not fully detected, and only enrichment table changes are picked up.This PR fixes that by adding the Vector configuration file (or directory) to
config_paths, and by checking whether the changed Vector configuration path lies under a directory listed inconfig_paths.Vector configuration
For testing, add a Vector configuration file
example.tomlunder thetestdirectory. This configuration is a simple setup that printsHello, Vector!every 2 seconds.In the same
testdirectory, create two Enrichment Table (.csv) files:example1.csvandexample2.csv.How did you test this PR?
Run Vector:
In another terminal session, from the test directory, change the Vector configuration file and the Enrichment Table (.csv) files at the same time:
You would expect a reload because the Vector configuration changed, but only the enrichment table files reload:
After this PR, the Vector configuration reloads as expected. (When the Vector configuration reloads, enrichment tables are loaded again as well.)
When only the enrichment tables change, only the enrichment tables reload, as expected:
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.