feat: add custom walker support#698
Conversation
Add documentation for selecting a custom walker with the global walk option and defining its command and options under [walker.<name>]. Document that custom walker commands run from the tree root, emit one relative path per line, and do not need to handle path arguments.
Parse [walker.<name>] tables into the configuration model and validate walker names, commands, and walk values. Accept documented camel-case walker names even when Viper normalizes table keys.
Introduce a command-backed walker reader that runs from the tree root, reads one path per stdout line, filters configured subpaths inside treefmt, and converts accepted paths into walk.File values. Route custom walk names through the format command without changing built-in walkers.
Cover selecting a custom walker from treefmt.toml, passing configured walker options, and filtering walker output when treefmt is invoked with a directory path argument.
Use bash -c for custom walker test commands instead of temporary scripts with /usr/bin/env shebangs, so the tests also run inside the Nix build sandbox.
Make the custom walker own its stdout and stderr pipes instead of combining StdoutPipe with a concurrent Wait call. This avoids surfacing a spurious file-closed read error at process exit and preserves command failures for Close().
Refactor custom walker configuration lookup to satisfy nesting limits, remove an unused gosec suppression, and adjust test spacing to satisfy golangci-lint in the Nix check derivation.
jfly
left a comment
There was a problem hiding this comment.
Thanks! This looks like a solid approach.
Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to to think of an abstraction. Did you think about that at all?
Accept documented camel-case walker names even when Viper normalizes table keys
What is camel-case? That's kebab-case. This is camelCase (or CamelCase).
Fix custom walker pipe handling
I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF").
| When you pass directory paths to `treefmt`, the walker command still runs for the tree root. | ||
| `treefmt` filters the command output to the requested directories. | ||
| The walker command doesn't need to implement path argument handling. |
There was a problem hiding this comment.
While I appreciate the simplicity of this decision, I'm not sure it's the right decision. I imagine people are most inclined to use the "format a directory" feature in very large repos, where the time spent for some VCS to determine which files are in the entire repo might be a lot slower than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.
How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.
There was a problem hiding this comment.
I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.
For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.
Maybe we'll add a toggle so the user can switch between both methods?
| # You can also set this to the name of a configured custom walker | ||
| # Env $TREEFMT_WALK | ||
| # walk = "filesystem" | ||
| # walk = "myWalker" |
There was a problem hiding this comment.
I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.
| for name, walkerCfg := range cfg.WalkerConfigs { | ||
| if strings.EqualFold(name, cfg.Walk) { | ||
| cfg.WalkerConfigs[cfg.Walk] = walkerCfg | ||
| ok = true | ||
|
|
||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
Is this really worth it to be case insensitive? Are we case insensitive for the builtin walkers?
| if cfg.Walk == "" { | ||
| cfg.Walk = walk.Auto.String() | ||
| } |
There was a problem hiding this comment.
Why do we need this? Don't we default to auto here?
| return nil, false, nil | ||
| } | ||
|
|
||
| relPath, err := c.relativePath(line) |
There was a problem hiding this comment.
I believe the docs you wrote said that the custom walker must emit relative paths. Reading this code, it looks like it's OK if it emits absolute paths (as we'll convert them to relative here). Am I reading this correctly? Want to update the docs accordingly?
| reader, err = NewReader(Git, root, path, db, statz) | ||
| if err != nil { | ||
| reader, err = NewReader(Jujutsu, root, path, db, statz) | ||
| if selector.Custom != nil { |
There was a problem hiding this comment.
This diff would be a lot simpler, and the if selector.Custom == nil && selector.Type == Stdin change below wouldn't have had to happen if we introduced a new selector.Type == Custom. Did you consider that?
| ) | ||
|
|
||
| treefmt(t, | ||
| withArgs("-c", "go"), |
There was a problem hiding this comment.
Please use the long form in tests:
withArgs("--clear-cache", "go"),
|
Thank you for the review. I hope to be able to address the individual comments in the next few days.
I didn't really consider that because I wanted to keep the new path somewhat isolated.
I'll have to look into that. |
This sounds reasonable to me |
As the title says.
Example: Format a git repo + a specific submodule:
Example: Format a pijul repo: