🐛Fix process Stop timeout error handling #3523

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from mattsu2020:fix_3522 on Jun 17, 2026

Conversation

@mattsu2020

Contributor

What changed

  • Preserve the Stop timeout error when SIGKILL fails after StopTimeout expires.
  • Include the SIGKILL failure as supplementary error context with errors.Join.
  • Add a deterministic regression test for the timeout-plus-kill-failure path.

Why

On Darwin, SIGKILL can fail with EPERM if the process is already terminating when the timeout expires. That failure previously replaced the primary timeout error, so callers and tests saw "unable to kill process ..." instead of "timeout waiting for process ... to stop".

Fixes #3522.

Validation

go test ./pkg/internal/testing/process/
go test ./pkg/internal/testing/process/ -run TestStopReturnsTimeoutWhenKillAfterTimeoutFails -count=100

@k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label on May 21, 2026
@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) labels on May 21, 2026
@k8s-ci-robot

Contributor

Hi @mattsu2020. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) label on May 21, 2026
@mattsu2020 marked this pull request as ready for review on May 21, 2026 23:01
@k8s-ci-robot removed the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label on May 21, 2026
@k8s-ci-robot requested a review from vincepri on May 21, 2026 23:01
@mattsu2020 changed the title from "Fix process Stop timeout error handling" to "🐛Fix process Stop timeout error handling" on May 29, 2026
@mattsu2020

Contributor Author

Hi @joelanford @troy0820 @vincepri,

This PR is ready for review. The PR title verifier is now passing, but golangci-lint is still waiting for maintainer approval because the PR has the needs-ok-to-test label.

Could one of you please take a look and run /ok-to-test if the patch looks reasonable?

Thanks!

@sbueringer

Member

/ok-to-test

@k8s-ci-robot added the ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.) label and removed the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label on Jun 13, 2026
@sbueringer

Member

Needs rebase after #3519 is merged

@mattsu2020

Contributor Author

> Needs rebase after #3519 is merged

I have addressed it.

}

originalSignalProcess := signalProcess
defer func() {

Member

Defers are not reliable in tests because the goroutine gets terminated on Fatal(). Please use t.Cleanup instead.

Contributor Author

Thanks, fixed.

Replace deferred restoration of signalProcess with t.Cleanup to ensure cleanup runs even if the test panics or fails.
@alvaroaleman added the tide/merge-method-squash (Denotes a PR that should be squashed by tide when it merges.) label on Jun 17, 2026
@k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Jun 17, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, mattsu2020

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label on Jun 17, 2026
@k8s-ci-robot

Contributor

LGTM label has been added.

Git tree hash: d09e679c7d5d220f7a49c503feb0a043daaf8d22

@k8s-ci-robot merged commit 2a830d5 into kubernetes-sigs:main on Jun 17, 2026
11 checks passed
@mattsu2020

Contributor Author

thanks

@mattsu2020 deleted the fix_3522 branch on June 17, 2026 22:08

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.

Development

Successfully merging this pull request may close these issues.

Stop() timeout error swallowed by SIGKILL EPERM on Darwin

4 participants