Skip to content

flake: TestTools data race on shared err variable (SendTaskInput + GetTaskLogs) #1475

@johnstcn

Description

@johnstcn

🤖

CI Run Link: https://github.com/coder/coder/actions/runs/24588118104/job/71902734009?pr=24502
Workflow Job: test-go-race-pg
Commit: fb530033d5cacc2e24d2071e4362b66dcda3e7a5 (PR #24502 merge commit on top of a53f52ee38e3b5537048e3c227f7a1d9dda3a5ed)
When: 2026-04-17 21:56 UTC

What failed

  • Package: github.com/coder/coder/v2/codersdk/toolsdk
  • Tests (all marked race detected during execution of test):
    • TestTools/SendTaskInput (and subtests ByUUID, ByIdentifier, NoID, NoInput, NoTaskByID, NoTaskByWorkspaceIdentifier, ExistsButNotATask)
    • TestTools/GetTaskLogs (and subtests ByUUID, ByIdentifier, NoID, NoTaskByID, NoTaskByWorkspaceIdentifier)
    • TestTools/DeleteTask (collateral; parent TestTools is failed by the race)
    • TestTools/WorkspaceListApps (collateral)
    • TestTools/WorkspacePortForward (collateral)

Evidence from logs

==================
WARNING: DATA RACE
Write at 0x00c000b36660 by goroutine 4317:
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func30()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1468 +0x964
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func30()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1468 +0x947
  github.com/coder/coder/v2/coderd/database/dbfake.WorkspaceBuildBuilder.doInTX()
      /home/runner/work/coder/coder/coderd/database/dbfake/dbfake.go:463 +0x2d7a
  ...
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func30()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1448 +0x304

Previous write at 0x00c000b36660 by goroutine 4318:
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func31()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1608 +0xb64
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func31()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1608 +0xb47
  github.com/coder/coder/v2/coderd/database/dbfake.WorkspaceBuildBuilder.doInTX()
      /home/runner/work/coder/coder/coderd/database/dbfake/dbfake.go:463 +0x2d7a
  ...
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools.func31()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1588 +0x4f7

Goroutine 4317 (running) created at:
  testing.(*T).Run()
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1410 +0x213e  // t.Run("SendTaskInput", ...)

Goroutine 4318 (running) created at:
  testing.(*T).Run()
  github.com/coder/coder/v2/codersdk/toolsdk_test.TestTools()
      /home/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1547 +0x2370  // t.Run("GetTaskLogs", ...)

Classification

  • Type: Data race (true Go concurrency bug, not timing flake)
  • Not a matrix cancellation artifact
  • Deterministic given enough parallel interleaving — go test -race -count=N should reproduce
  • No panic/OOM signatures

Root cause analysis

The race is on the outer-scope err variable declared in TestTools at codersdk/toolsdk/toolsdk_test.go:155:

ws, err := client.Workspace(setupCtx, r.Workspace.ID)

Both TestTools/SendTaskInput (line 1409, starts with t.Parallel()) and TestTools/GetTaskLogs (line 1546, also t.Parallel()) assign to that outer err without re-declaring it:

  • toolsdk_test.go:1468err = store.UpdateWorkspaceAppHealthByID(...) (inside SendTaskInput subtest)
  • toolsdk_test.go:1608err = store.UpdateWorkspaceAppHealthByID(...) (inside GetTaskLogs subtest)

Because they use = rather than :=, both subtests write to the single outer err from parallel goroutines. The race detector captures the call stack at write time, which is why the trace passes through dbfake.WorkspaceBuildBuilder.doInTX at dbfake.go:463 (the call stack leading up to the err = ... assignment goes through the store.UpdateWorkspaceAppHealthByID → doInTX path); the actual race target is err, not any field inside WorkspaceBuildBuilder.

This is a test-code bug, not a dbfake bug.

Precise assignment analysis (blame)

git blame on both writing lines points to the same commit:

  • codersdk/toolsdk/toolsdk_test.go:1468db76541522 (Mathias Fredriksson, 2025-11-07)
  • codersdk/toolsdk/toolsdk_test.go:1608db76541522 (same commit)

Commit: db76541522"test(codersdk/toolsdk): fix task status race for send/logs (#20682)", which itself fixed #1111. That fix added the err = store.UpdateWorkspaceAppHealthByID(...) calls with = instead of :=.

Assigning to: @mafredri (introduced both writes; recognized toolsdk/TestTools owner per prior flakes #1111, #1103).

Proposed fix

In both SendTaskInput and GetTaskLogs subtests, change:

err = store.UpdateWorkspaceAppHealthByID(...)

to:

err := store.UpdateWorkspaceAppHealthByID(...)

so each subtest gets its own local err. Same treatment for any other err = ... assignments in parallel subtests that reference the outer scope. Worth an errcheck / go vet -copylocks pass on the file to catch other instances.

Reproduction hints

cd codersdk/toolsdk
go test -race -run TestTools -count=20 -v

Parallel scheduling of SendTaskInput and GetTaskLogs against the shared outer err should trigger the race reliably with enough iterations.

Related issues

Quality checklist

  • Full race trace captured from CI logs
  • Write sites identified on both goroutines
  • Blame points to single commit authoring both lines
  • Root cause traced to outer-scope err shadowing in parallel subtests, not dbfake
  • Searched coder/internal (open/closed) for duplicates: 0x00c000b36660, WorkspaceBuildBuilder race, TestTools data race, dbfake race — none found
  • Assignment based on blame, not CI run author

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions