Skip to content

feat: add custom walker support#698

Open
tobim wants to merge 7 commits into
numtide:mainfrom
tobim:push-spzzmzrpkzzq
Open

feat: add custom walker support#698
tobim wants to merge 7 commits into
numtide:mainfrom
tobim:push-spzzmzrpkzzq

Conversation

@tobim
Copy link
Copy Markdown

@tobim tobim commented May 15, 2026

As the title says.

Example: Format a git repo + a specific submodule:

walk = "repoPlusSubmodule"

[walker.repoPlusSubmodule]
command = "bash"
options = [
  "-c",
  '''
    submodule="path/to/submodule"
    {
      git ls-files --cached --others --exclude-standard --full-name
      git -C "$submodule" ls-files --cached --others --exclude-standard --full-name \
        | sed "s#^#$submodule/#"
    }
  ''',
]

Example: Format a pijul repo:

walk = "pijul"

[walker.pijul]
command = "pijul"
options = ["ls"]

tobim added 5 commits May 15, 2026 08:15
Add documentation for selecting a custom walker with the global
walk option and defining its command and options under [walker.<name>].
Document that custom walker commands run from the tree root, emit one
relative path per line, and do not need to handle path arguments.
Parse [walker.<name>] tables into the configuration model and validate
walker names, commands, and walk values. Accept documented camel-case
walker names even when Viper normalizes table keys.
Introduce a command-backed walker reader that runs from the tree root,
reads one path per stdout line, filters configured subpaths inside
treefmt, and converts accepted paths into walk.File values. Route custom
walk names through the format command without changing built-in walkers.
Cover selecting a custom walker from treefmt.toml, passing configured
walker options, and filtering walker output when treefmt is invoked with
a directory path argument.
Use bash -c for custom walker test commands instead of temporary scripts
with /usr/bin/env shebangs, so the tests also run inside the Nix build
sandbox.
tobim added 2 commits May 15, 2026 08:56
Make the custom walker own its stdout and stderr pipes instead of
combining StdoutPipe with a concurrent Wait call. This avoids surfacing
a spurious file-closed read error at process exit and preserves command
failures for Close().
Refactor custom walker configuration lookup to satisfy nesting limits,
remove an unused gosec suppression, and adjust test spacing to satisfy
golangci-lint in the Nix check derivation.
Copy link
Copy Markdown
Collaborator

@jfly jfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobim, thanks for the contribution! I am supportive of this change, but haven't had a chance to read the code yet.

I wanted to warn you that #694 touches some of the same code you've touched here. It may land before this PR, apologies if the conflicts are annoying to deal with.

Copy link
Copy Markdown
Collaborator

@jfly jfly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks like a solid approach.

Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to to think of an abstraction. Did you think about that at all?

Accept documented camel-case walker names even when Viper normalizes table keys

What is camel-case? That's kebab-case. This is camelCase (or CamelCase).

Fix custom walker pipe handling

I did not have time to grok this commit during my review. I'll need some more handholding before I can approve it. Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF").

Comment on lines +466 to +468
When you pass directory paths to `treefmt`, the walker command still runs for the tree root.
`treefmt` filters the command output to the requested directories.
The walker command doesn't need to implement path argument handling.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I appreciate the simplicity of this decision, I'm not sure it's the right decision. I imagine people are most inclined to use the "format a directory" feature in very large repos, where the time spent for some VCS to determine which files are in the entire repo might be a lot slower than discovering just the files in a specific directory. #694 is an example of someone reporting just how slow it is for git to report all files in a repo.

How crazy would it be for the walk command to have to implement path argument handling? IIUC, it would be straightforward for git.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that I didn't pay too much attention to optimizing for subtrees. It's no concern for my own use cases.

For a regular git walker this might not be a problem, but as soon as you want a more custom file list - like in the first example from the PR description - implementing the filtering yourself becomes error-prone rather quickly.

Maybe we'll add a toggle so the user can switch between both methods?

Comment thread cmd/init/init.toml
# You can also set this to the name of a configured custom walker
# Env $TREEFMT_WALK
# walk = "filesystem"
# walk = "myWalker"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should instead do walker.myWalker (or perhaps custom-walker.myWalker) here. Then we wouldn't have to worry about (or detect) conflicts with the builtin walkers.

Comment thread config/config.go Outdated
Comment on lines +274 to +281
for name, walkerCfg := range cfg.WalkerConfigs {
if strings.EqualFold(name, cfg.Walk) {
cfg.WalkerConfigs[cfg.Walk] = walkerCfg
ok = true

break
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really worth it to be case insensitive? Are we case insensitive for the builtin walkers?

Comment thread config/config.go
Comment on lines +211 to +213
if cfg.Walk == "" {
cfg.Walk = walk.Auto.String()
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Don't we default to auto here?

Comment thread walk/custom.go
return nil, false, nil
}

relPath, err := c.relativePath(line)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the docs you wrote said that the custom walker must emit relative paths. Reading this code, it looks like it's OK if it emits absolute paths (as we'll convert them to relative here). Am I reading this correctly? Want to update the docs accordingly?

Comment thread walk/walk.go
reader, err = NewReader(Git, root, path, db, statz)
if err != nil {
reader, err = NewReader(Jujutsu, root, path, db, statz)
if selector.Custom != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff would be a lot simpler, and the if selector.Custom == nil && selector.Type == Stdin change below wouldn't have had to happen if we introduced a new selector.Type == Custom. Did you consider that?

Comment thread cmd/root_test.go
)

treefmt(t,
withArgs("-c", "go"),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the long form in tests:

		withArgs("--clear-cache", "go"),

@tobim
Copy link
Copy Markdown
Author

tobim commented May 27, 2026

Thank you for the review. I hope to be able to address the individual comments in the next few days.

Not a blocker, but I'm not in love with a 3rd copy of the walk code. I didn't read it particularly closely. I think 2 copies (jj and git) is OK. 3 is when it probably makes sense to to think of an abstraction. Did you think about that at all?

I didn't really consider that because I wanted to keep the new path somewhat isolated.
We could also replace git and jj with built in custom walkers now that the machinery is there.

Does whatever bug you're fixing here apply to the git and jj walkers as well? Is it related to this bug @Mic92 found in #694 (commit labeled "walk: unblock producers when Close() is called before EOF").

I'll have to look into that.

@jfly
Copy link
Copy Markdown
Collaborator

jfly commented May 27, 2026

I didn't really consider that because I wanted to keep the new path somewhat isolated.
We could also replace git and jj with built in custom walkers now that the machinery is there.

This sounds reasonable to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants