
Fix a task fingerprinting bug #2740

Open
mkeeler wants to merge 1 commit into go-task:main from mkeeler:watch-rebuild-single

Conversation


@mkeeler mkeeler commented Mar 12, 2026

Task Fingerprinting Bug

The first commit in this PR fixes a bug where two task invocations (such as in a for loop) inadvertently wrote their checksum or timestamp files to the same location, even though the tasks were executed with different arguments and therefore had different sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"

1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task copy
   • This will run the copy:single task once for each *.in file.
3. Run: echo 2.2 > 2.in
   • This will run the copy:single task twice again, with neither showing as up to date.

Because only 2.in was changed, I was expecting the task to show one copy:single task as up to date and then re-copy 2.in to 2.out.

Fix

Instead of writing the checksum/timestamp to a single file within the respective directory, the task is first fingerprinted. So instead of the copy:single task here recording its checksum/timestamp in a single copy-single file, Task takes a hash of the normalized task name, the working directory of the task, and the declared sources/generates, and stores the checksum in copy-single-&lt;hash&gt;. This allows each distinct invocation of the sub-task with different arguments to independently track whether it is up to date.

Previously this PR also contained another fix, but that has since been rolled into #2743.

Task Watch Cancellation Bug

The pre-existing task watching code had a bug where, once an event occurred, it would
spawn goroutines to process all tasks in the background and continue the loop. If
another event occurred, it would cancel the context used to run those previous
goroutines and restart everything. In some scenarios this works fine, such as when the
generated files do not reside within the directory being watched. When the generated
files do reside in the same directory, the first task generating its output causes an
fsnotify event to be triggered, which then cancels the context. This is racy, but if
the tasks are longer running it can eventually cancel the task, resulting in other
sub-tasks not being executed. This doesn't result in an infinite loop because, prior
to executing a task, its fingerprint is checked and updated to prevent subsequent
runs.

The root cause of all of this bad behavior of not running the tasks to completion is
that the context is cancelled when it shouldn't be (when an fsnotify event comes in
for something that is not one of the sources).

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
      # this is the main difference from the first bug's reproduction yaml
      # the sleep here ensures that tasks are "long running" allowing
      # time for the context cancellation to happen and prevent running
      # all the tasks
      - sleep 3

1. Run: echo 1 >1.in && echo 2 >2.in
2. Run: task -w copy
   • This will run copy:single only once. It never gets around to executing
     the copy for the 2.in file.
3. In another terminal, run: echo 2.2 > 2.in
   • This will run copy:single only once again.

I would have expected step 2 to run copy:single twice, but it doesn't due to the
context being cancelled while the first copy:single invocation's sleep command
is executing.

I would also have expected step 3 to cause copying to take place again. With the fix
for the fingerprinting bug included, the first invocation should show as up to date
and the second one would then run.

Fix

The fix was to move the logic that checks the event against the sources out of the
goroutines spawned to execute the tasks and into where the event handling first
starts. Because we check the event's file against the list of sources before the
context is cancelled, we can toss out irrelevant events and keep the processing of
the tasks going.
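A minimal sketch of that filtering step, under stated assumptions: matchesSources is an illustrative name, and while Task supports doublestar-style "**" globs, this stand-in uses stdlib filepath.Match against the event file's base name only.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesSources illustrates the fix: check the event's file against the
// declared source patterns *before* cancelling the in-flight run's context,
// so events for generated files (or anything else irrelevant) are dropped
// instead of aborting the tasks mid-execution.
func matchesSources(eventPath string, sources []string) bool {
	for _, pattern := range sources {
		// filepath.Match has no "**" support, so compare the base name;
		// the real watcher matches full doublestar glob patterns.
		ok, err := filepath.Match(pattern, filepath.Base(eventPath))
		if err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	sources := []string{"*.in"}
	// A change to a source file should trigger a re-run...
	fmt.Println(matchesSources("repro/2.in", sources)) // true
	// ...but an event for a generated file is tossed out rather than
	// cancelling the context of the tasks still running.
	fmt.Println(matchesSources("repro/2.out", sources)) // false
}
```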

@trulede trulede mentioned this pull request Mar 15, 2026
@butuzov butuzov mentioned this pull request Mar 15, 2026
Contributor

butuzov commented Mar 15, 2026

This looks way better than my simple solution, plus it adds test coverage.

@mkeeler mkeeler force-pushed the watch-rebuild-single branch from c5d3920 to d19a73f Compare March 16, 2026 18:14
@mkeeler mkeeler changed the title Fix a task watching cancellation bug and a task fingerprinting bug Fix a task fingerprinting bug Mar 16, 2026
Author

mkeeler commented Mar 16, 2026

@trulede I have refactored this PR to only have the task fingerprinting fix.

Contributor

@trulede trulede left a comment


If you take a look at this PR there is another way to get an identity.
https://github.com/go-task/task/pull/2287/changes#diff-05ed798798e201a0fe6355799c9adb03b42b72710770a86ddecc4def33773c51R137

It's a bit more involved, but what I'm wondering is whether the json.Marshal + xxh3 hash has more (or less) impact on performance in comparison. The approach there mostly avoids the JSON encoding and directly hashes the necessary fields.

The other thing is the globbing of sources and generates. This is also an expensive operation. Can we do this fingerprinting after the task is compiled and use the already globbed task.sources/generates, OR use this globbing result on the task to avoid a subsequent (and duplicate) globbing?

Note: there is another PR #2671 which introduces dynamic sources/generates. That probably also impacts the checksum operation (i.e. it would need to happen after the task is compiled).

Checks:

  • Can checksums be done after the Task is compiled (to support dynamic sources/generates)?
  • Can the sources/generates object on the task be used to avoid recalculation?
  • Performance of identity generation.

Author

mkeeler commented May 1, 2026

Regarding the performance, I went ahead and tested a version which uses just hashstructure:

func taskFingerprintKeyHashStructure(t *ast.Task) string {
	name := taskIdentityName(t)
	identity := fingerprintIdentity{
		Task:      name,
		Dir:       t.Dir,
		Sources:   globPatterns(t.Sources),
		Generates: globPatterns(t.Generates),
	}

	hash, err := hashstructure.Hash(identity, hashstructure.FormatV2, nil)
	if err != nil {
		return normalizeFilename(name)
	}

	return normalizeFilename(fmt.Sprintf("%s-%d", name, hash))
}

It's pretty much the exact same function, with the hashing swapped from JSON encode + xxh3 to hashstructure.

In addition to just hashstructure I also tried out manually writing to an fnv hasher with the individual fields. The thinking here was that maybe avoiding reflection could be a big performance win.

func taskFingerprintKeyFnv(t *ast.Task) string {
	name := taskIdentityName(t)

	fnvHash := fnv.New64a()
	_, err := fnvHash.Write([]byte(name))
	if err != nil {
		return normalizeFilename(name)
	}

	_, err = fnvHash.Write([]byte(t.Dir))
	if err != nil {
		return normalizeFilename(name)
	}

	if err := hashGlobs(t.Sources, fnvHash); err != nil {
		return normalizeFilename(name)
	}

	if err := hashGlobs(t.Generates, fnvHash); err != nil {
		return normalizeFilename(name)
	}

	return normalizeFilename(fmt.Sprintf("%s-%x", name, fnvHash.Sum64()))
}

func hashGlobs(globs []*ast.Glob, h io.Writer) error {
	if len(globs) == 0 {
		return nil
	}

	for _, glob := range globs {
		if glob == nil {
			continue
		}

		if glob.Negate {
			if _, err := h.Write([]byte("!")); err != nil {
				return err
			}
		}
		if _, err := h.Write([]byte(glob.Glob)); err != nil {
			return err
		}
	}
	return nil
}

Then I have this benchmark test:

func BenchmarkTaskFingerprinting(b *testing.B) {
	task := &ast.Task{
		Task:     "namespace:copy/single",
		FullName: "namespace:copy/single",
		Dir:      "repro",
		Sources: []*ast.Glob{
			{Glob: "**/*.in"},
			{Glob: "vendor/**", Negate: true},
			{Glob: "**/*.tmpl"},
		},
		Generates: []*ast.Glob{
			{Glob: "**/*.out"},
			{Glob: "reports/**/*.json"},
		},
	}

	prefix := normalizeFilename(task.Task) + "-"

	type benchmarkCase struct {
		name string
		fn   func(task *ast.Task) string
	}

	cases := []benchmarkCase{
		{name: "json+xxh3", fn: taskFingerprintKey},
		{name: "hashstructure", fn: taskFingerprintKeyHashStructure},
		{name: "fnv", fn: taskFingerprintKeyFnv},
	}

	for _, bc := range cases {
		b.Run(bc.name, func(b *testing.B) {
			// Report allocations for each sub-benchmark; calling ReportAllocs
			// on the parent b does not carry over to the b passed in by b.Run.
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				benchmarkTaskFingerprintKeySink = bc.fn(task)
				if !strings.HasPrefix(benchmarkTaskFingerprintKeySink, prefix) {
					b.FailNow()
				}
			}
		})
	}
}

The results show hashstructure taking ~2x the CPU time of the json+xxh3 that's committed in this PR. The manual fnv hashing did improve on json+xxh3 for speed (a reduction of about 165 ns per operation).

➜  go test -benchtime=5s -bench .
goos: darwin
goarch: arm64
pkg: github.com/go-task/task/v3/internal/fingerprint
cpu: Apple M3 Pro
BenchmarkTaskFingerprinting/json+xxh3-12                 5948370               982.8 ns/op
BenchmarkTaskFingerprinting/hashstructure-12             3277699              1845 ns/op
BenchmarkTaskFingerprinting/fnv-12                       7306032               817.2 ns/op
PASS
ok      github.com/go-task/task/v3/internal/fingerprint 21.941s

Obviously with more sources/generates the time to create the fingerprint will increase but I wouldn't expect it to exceed low single digit milliseconds.
