fix(deployment): simplify probes command to reduce false positives#135

Open
radicand wants to merge 11 commits into docker-mailserver:master from radicand:fix/probes

@radicand radicand commented Sep 7, 2024

Member

In rare instances (one is documented in #134), the deployment liveness/readiness probes may time out while running. Investigation suggests that the supervisorctl status command returned the right information, but the command could hang somewhere in the double grep chain.

The good news is there is no need for the double grepping. supervisorctl status will return an exit code of 0 when all services provided as arguments are in the RUNNING state, which is what we're intending to check anyway. If any services are not running, supervisorctl status returns a non-zero exit code which k8s will interpret as a probe failure.
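To illustrate the exit-code approach, here is a minimal shell sketch. The `probe` function name and the service names in the usage comment are placeholders for illustration, not the chart's actual probe definition.

```shell
#!/bin/sh
# Hypothetical helper: the probe result is simply the exit status of
# `supervisorctl status`, with no output parsing.
probe() {
  # Per the description above: 0 when every named service is RUNNING,
  # non-zero otherwise. Output is discarded because only the status matters.
  supervisorctl status "$@" >/dev/null 2>&1
}

# Example usage (service names are placeholders):
#   probe postfix dovecot
```

Kubernetes treats a zero exit status from an exec probe as success and any other status as failure, so the status can be passed through unchanged.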

This cleaner method is also more reliable: in the previous implementation, the probe would still pass if one of the key services was not running, as long as any one of the specified services was.
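The false positive can be reproduced with a small shell sketch. The status lines below are canned, illustrative sample output, and `grep_check` is only written in the style of a chained-grep probe; it is not the chart's exact previous command.

```shell
#!/bin/sh
# Canned, hypothetical status output: dovecot is up, postfix is down.
sample_status() {
  printf '%s\n' \
    'dovecot    RUNNING   pid 101, uptime 1:00:00' \
    'postfix    STOPPED   Not started'
}

# Chained-grep check: select the services of interest, then look for RUNNING.
# The second grep succeeds as soon as ANY selected line contains RUNNING.
grep_check() {
  sample_status | grep -E 'dovecot|postfix' | grep -q RUNNING
}

if grep_check; then
  echo "grep check: pass"
else
  echo "grep check: fail"
fi
# prints "grep check: pass" even though postfix is STOPPED
```

Because the pipeline's exit status only reflects whether at least one line matched, a single healthy service is enough to mask a stopped one.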

Finally, note that in this pull request I only included the baseline services that all installations use - there is an opportunity to conditionally include more services in the check based on what the user has chosen to enable in their values.yaml.
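One possible shape for such a conditional check is sketched below in shell. It assumes the enabled features are visible inside the container as `ENABLE_*` style environment variables and that the service names match the toggles; both are assumptions for illustration, and the chart could equally build the list at template time from values.yaml.

```shell
#!/bin/sh
# Hypothetical: the baseline names and the ENABLE_* toggles are placeholders.
service_list() {
  services="postfix dovecot"
  if [ "${ENABLE_CLAMAV:-0}" = "1" ]; then
    services="$services clamav"
  fi
  if [ "${ENABLE_AMAVIS:-0}" = "1" ]; then
    services="$services amavis"
  fi
  echo "$services"
}

# Example usage (word splitting of the list is intentional here):
#   supervisorctl status $(service_list)
```

Keeping the baseline list fixed and only appending optional services means a disabled feature never causes a probe failure.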

Comment thread charts/docker-mailserver/templates/deployment.yaml Outdated
radicand and others added 3 commits March 8, 2025 08:58
The double-grep behavior led to the command unintentionally hanging in some cases, prompting k8s to continually restart the pod without valid reason.
note: there is optional work remaining to conditionally add more services to the liveness probe based on what the user has chosen to enable (or things that may perhaps always be running).
fix: update test snapshots
@radicand

Member Author

Note that the PR has now been adapted for the dms-healthcheck script, but until that script lands the chart-testing steps will fail. I will revisit once a new release is cut to target.

cfis commented Jun 22, 2026

Collaborator

This looks good to me. We can merge once a new version of docker-mailserver is released. @polarathene any thoughts on that?

@polarathene

Member

> @polarathene any thoughts on that?

Release is blocked on me finding time to review the Debian 13 + Dovecot 2.4 upgrade PR 😓

Unfortunately DMS is low on maintainers, and PRs like that one aren't as breezy to review confidently 😅 I'll try to fit some time towards resuming my review this week if I can.


3 participants