cert-manager infinite retries without backoff #112

### What happened?

When `stackit-cert-manager-webhook` is misconfigured, for example with expired credentials, cert-manager retries the `Challenge` object indefinitely, without any exponential backoff.
This happens in every error case where the returned error contains details that change between attempts (e.g., a timestamp or request ID). The error returned by the webhook is persisted in the `Challenge` object (`status.reason`), and cert-manager reconciles the `Challenge` again whenever it detects any change to the object, including changes anywhere in `.status`.

This behavior is also described in a comment [here](https://github.com/cert-manager/cert-manager/blob/31bc9ae292377a78ebcc6dc1daae12c1bdb8eba7/pkg/controller/acmechallenges/sync.go#L220) inside the `acmechallenges` `Sync` function. Sadly, it isn't properly documented anywhere else 😔
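The loop described above can be illustrated with a toy model. This is only a sketch: `challengeStatus` and `countSyncs` are hypothetical names, not cert-manager types or functions, and the model ignores everything else the real controller does (including any rate-limited retry). It only shows that a status which differs on every attempt never lets the "object changed, sync again" cycle settle, while a stable message does.

```go
package main

import "fmt"

// challengeStatus is a hypothetical, heavily simplified stand-in for the
// status of a Challenge; only the persisted reason matters here.
type challengeStatus struct {
	Reason string
}

// countSyncs simulates a controller that syncs again whenever persisting
// the webhook error changes the object. errFor returns the error message
// for a given attempt; maxSyncs caps the otherwise endless loop.
func countSyncs(errFor func(attempt int) string, maxSyncs int) int {
	status := challengeStatus{}
	syncs := 0
	for syncs < maxSyncs {
		syncs++
		updated := challengeStatus{Reason: errFor(syncs)}
		if updated == status {
			break // nothing changed, so no further update event is produced
		}
		status = updated
	}
	return syncs
}

func main() {
	// Error text that embeds a changing detail, such as a request ID.
	changing := func(attempt int) string {
		return fmt.Sprintf("unauthorized (request ID req-%d)", attempt)
	}
	// Error text that is identical on every attempt.
	stable := func(int) string { return "failed fetching zone" }

	fmt.Println(countSyncs(changing, 100)) // runs until the cap is reached
	fmt.Println(countSyncs(stable, 100))   // settles after the second sync
}
```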

### How can we reproduce this?

We noticed this when someone tried to use a deleted service account key, so:

1. Create a new project
2. Create a new service account (no need to actually add it to a project)
3. Create a new service account key, save it for later use, and then delete the key again
4. Deploy cert-manager
5. Deploy `stackit-cert-manager-webhook` (`helm install stackit-cert-manager-webhook -n cert-manager stackit-cert-manager-webhook/stackit-cert-manager-webhook --set stackitSaAuthentication.enabled=true`) and create the `cert-manager/stackit-sa-authentication` secret from the saved key
6. Create an issuer and a certificate

Observe the issue:
1. Check the events of the `Challenge` resource
2. `kubectl get challenges.acme.cert-manager.io -w` (see ~4 changes per second)
3. Check the cert-manager logs
4. Check the `stackit-cert-manager-webhook` logs

### Additional context

To fix this properly, we must sanitize every error whose content we don't control. The original error can still be logged, so the returned error should only state in general terms what failed and optionally point the user to the `stackit-cert-manager-webhook` logs (e.g., "failed fetching zone. See the stackit-cert-manager-webhook logs for more details.").
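A minimal sketch of what such sanitizing could look like in Go. The names `sanitizedError`, `sanitize`, `logHint`, and `fetchZone` are hypothetical and not taken from the webhook's code base; `fetchZone` merely stands in for an API call whose error text changes on every request. The idea is to log the original error and return a message that stays constant per operation.

```go
package main

import (
	"fmt"
	"log"
)

// logHint is appended to every sanitized message.
const logHint = "See the stackit-cert-manager-webhook logs for more details."

// sanitizedError exposes a stable message while keeping the original
// error reachable through Unwrap for errors.Is / errors.As.
type sanitizedError struct {
	msg   string
	cause error
}

func (e *sanitizedError) Error() string { return e.msg }
func (e *sanitizedError) Unwrap() error { return e.cause }

// sanitize logs the original error and returns an error whose message
// depends only on the operation name, not on the details of the cause.
func sanitize(operation string, cause error) error {
	if cause == nil {
		return nil
	}
	log.Printf("%s: %v", operation, cause)
	return &sanitizedError{
		msg:   fmt.Sprintf("%s. %s", operation, logHint),
		cause: cause,
	}
}

// fetchZone is a hypothetical stand-in for an API call that fails with
// an error containing a changing request ID.
func fetchZone(requestID string) error {
	return fmt.Errorf("401 Unauthorized (request ID %s)", requestID)
}

func main() {
	first := sanitize("failed fetching zone", fetchZone("req-1"))
	second := sanitize("failed fetching zone", fetchZone("req-2"))

	// Both attempts yield the same message, so persisting it in
	// status.reason would not change the object between attempts.
	fmt.Println(first.Error() == second.Error())
	fmt.Println(first)
}
```

Keeping the cause behind `Unwrap` means callers inside the webhook can still inspect the original error, while only the stable message would end up in the `Challenge` status.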

### Search

- [x] I did search for other open and closed issues before opening this.

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
