
feat(telemetry): log and report lock-acquisition contention at info #152

Open
swarit-stepsecurity wants to merge 1 commit into step-security:main from swarit-stepsecurity:swarit/feat/wt/lock-acquire-logging

@swarit-stepsecurity

No description provided.

…evel

Surface lock-acquisition failures (another instance already running) at info
level instead of Debug so the contention is visible in agent.log, and report
the failed run immediately at the failure site so the backend records that a
second invocation contended for the lock. reportFailedOnce is idempotent, so
the deferred handler firing on the error return is a no-op.

Copilot AI left a comment

Pull request overview

This PR aims to make lock-acquisition contention visible in normal (info-level) logs and ensure the backend records a failed run immediately when the agent can’t acquire its instance lock.

Changes:

  • Promotes lock-acquisition failure logging from debug to info (log.Progress) so it appears in agent.log.
  • Triggers an immediate reportFailedOnce(...) on lock-acquisition failure (in addition to the deferred error handler, which becomes a no-op due to idempotency).

Comment on lines +366 to +371
// Another instance already holds the lock. Surface at info level (not
// Debug) so the contention is visible in agent.log, and report the
// failed run right here — don't wait for the deferred handler — so the
// backend records that this invocation contended for the lock while one
// was already in flight. reportFailedOnce is idempotent, so the deferred
// handler that also fires on the error return is a no-op.
log.Progress("Lock acquisition failed (PID %d): %v — another instance is already running, exiting", os.Getpid(), err)