Adjust logging to avoid expensive CosmosException.getMessage() diagnostics json serialization #49321

Open
jaxpod70 wants to merge 2 commits into
Azure:mainfrom
jaxpod70:jmck/logging-0

Conversation

@jaxpod70

Description

Fixes excessive log line sizes in the ChangeFeedProcessor's partition processing loop caused by CosmosException.getMessage() serializing the full CosmosDiagnostics payload (request timelines, region contacts, retry history, metadata) into JSON when logged at WARN level.

Under heavy DB connection load, this produces multi-MB log lines (30+ MB observed) on every transient error (429 throttles, timeouts, connectivity issues) across all leased partitions concurrently. This causes log ingestion pipeline throttling/rejection, increased storage costs, and memory pressure from repeated serialization in a hot loop.

Fix: Split the existing log statement into two tiers in both epkversion.PartitionProcessorImpl and pkversion.PartitionProcessorImpl:

  • WARN level: Uses CosmosException.getShortMessage() via parameterized logging — lightweight, no diagnostics payload
  • DEBUG level: Logs the full exception (with diagnostics + stack trace) for deep troubleshooting when needed

This preserves observability for operators at default log levels while eliminating the excessive payload.
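A minimal sketch of the two-tier pattern described above, written against java.util.logging so it runs without the SDK on the classpath. `DiagnosticsHeavyException` is a hypothetical stand-in for CosmosException (only `getShortMessage()` and `getMessage()` are modeled), and `Level.WARNING` / `Level.FINE` stand in for WARN / DEBUG. The actual change uses the SDK's logger with `{}` placeholders, as shown in the diff.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for CosmosException: getMessage() appends a large
// diagnostics payload, getShortMessage() returns only the summary.
class DiagnosticsHeavyException extends RuntimeException {
    private final String shortMessage;
    private final String diagnosticsJson;

    DiagnosticsHeavyException(String shortMessage, String diagnosticsJson) {
        super(shortMessage);
        this.shortMessage = shortMessage;
        this.diagnosticsJson = diagnosticsJson;
    }

    String getShortMessage() {
        return shortMessage;
    }

    @Override
    public String getMessage() {
        return shortMessage + " " + diagnosticsJson;
    }
}

final class TwoTierLogging {
    private TwoTierLogging() {
    }

    static void logPartitionFailure(Logger logger, String leaseToken, String owner,
                                    DiagnosticsHeavyException e) {
        // Tier 1 (WARN equivalent): parameterized message carrying only the short message.
        logger.log(Level.WARNING,
            "CosmosException: partition {0} from thread {1} with owner {2} {3}",
            new Object[] {leaseToken, Thread.currentThread().getName(), owner, e.getShortMessage()});

        // Tier 2 (DEBUG equivalent): the guard avoids building the string, and the
        // full exception with its diagnostics is attached only when enabled.
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE,
                "CosmosException: partition " + leaseToken + " from thread "
                    + Thread.currentThread().getName() + " with owner " + owner,
                e);
        }
    }
}
```

With the logger at its default level only the short WARN-style line is emitted; lowering the level to the debug equivalent adds a second record that carries the exception itself.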

Related Issue

#49320

All SDK Contribution checklist:

General Guidelines and Best Practices (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#developer-guide)

Testing Guidelines (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#building-and-unit-testing)

  • Pull request includes test coverage for the included changes.

@github-actions github-actions Bot added the Community Contribution, Cosmos, and customer-reported labels May 29, 2026
@github-actions
Contributor

Thank you for your contribution @jaxpod70! We will review the pull request and get back to you soon.

@jaxpod70
Author

@microsoft-github-policy-service agree company="Microsoft"

@jaxpod70 jaxpod70 marked this pull request as ready for review June 1, 2026 13:39
@jaxpod70 jaxpod70 requested review from a team and kirankumarkolli as code owners June 1, 2026 13:39
Copilot AI review requested due to automatic review settings June 1, 2026 13:39
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates change feed partition processor logging when handling CosmosException, aiming to reduce WARN log verbosity while still allowing full exception details at DEBUG.

Changes:

  • Switched WARN logs to parameterized logging and appended clientException.getShortMessage().
  • Added guarded DEBUG logs that include the full exception/stack trace.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/pkversion/PartitionProcessorImpl.java | Refines CosmosException logging in pk-version processor (WARN summary + DEBUG stack trace). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/epkversion/PartitionProcessorImpl.java | Refines CosmosException logging in epk-version processor (WARN summary + DEBUG stack trace). |

Comment on lines +211 to +216
logger.warn(
"CosmosException: partition {} from thread {} with owner {} {}",
this.lease.getLeaseToken(),
Thread.currentThread().getId(),
this.lease.getOwner(),
clientException.getShortMessage());
Comment on lines +218 to +221
logger.debug(
"CosmosException: partition " + this.lease.getLeaseToken()
+ " from thread " + Thread.currentThread().getId() + " with owner " + this.lease.getOwner(),
clientException);
logger.warn(
"CosmosException: partition {} from thread {} with owner {} {}",
Comment on lines +201 to +205
"Lease with token {}: CosmosException was thrown from thread {} for lease with owner {} {}",
this.lease.getLeaseToken(),
Thread.currentThread().getId(),
this.lease.getOwner(),
clientException.getShortMessage());
Comment on lines +207 to +210
logger.debug(
"Lease with token " + this.lease.getLeaseToken() + ": CosmosException was thrown from thread " +
Thread.currentThread().getId() + " for lease with owner " + this.lease.getOwner(),
clientException);

CosmosException clientException = (CosmosException) throwable;
logger.warn(
"Lease with token " + this.lease.getLeaseToken() + ": CosmosException was thrown from thread " +
Member

Using getShortMessage() is fine for possibly noisy logs (see Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs for an example), but for unexpected errors the full diagnostics are crucial for debugging. Please use Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs to decide whether to log the full or the short message.
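One way the requested gating could look, as a rough sketch only. The signature and criteria of Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs are not shown in this thread, so `isExpectedNoisy` below is a hypothetical placeholder keyed on status codes 429 and 408 (picked from the throttle and timeout examples in the PR description), and `StatusCodeException` is a hypothetical stand-in for CosmosException.

```java
// Hypothetical stand-in for CosmosException; only the pieces used below are modeled.
class StatusCodeException extends RuntimeException {
    private final int statusCode;
    private final String shortMessage;
    private final String diagnosticsJson;

    StatusCodeException(int statusCode, String shortMessage, String diagnosticsJson) {
        super(shortMessage);
        this.statusCode = statusCode;
        this.shortMessage = shortMessage;
        this.diagnosticsJson = diagnosticsJson;
    }

    int getStatusCode() {
        return statusCode;
    }

    String getShortMessage() {
        return shortMessage;
    }

    @Override
    public String getMessage() {
        // Models the expensive path: summary plus serialized diagnostics.
        return shortMessage + " " + diagnosticsJson;
    }
}

final class NoisyLogGate {
    private NoisyLogGate() {
    }

    // Assumption: placeholder for the SDK's own predicate; the real criteria may differ.
    static boolean isExpectedNoisy(StatusCodeException e) {
        return e.getStatusCode() == 429 || e.getStatusCode() == 408;
    }

    // Short message for expected, noisy failures; full message (with diagnostics)
    // for anything unexpected, where the diagnostics are needed for debugging.
    static String warnMessage(StatusCodeException e) {
        return isExpectedNoisy(e) ? e.getShortMessage() : e.getMessage();
    }
}
```

The WARN statement would then log `warnMessage(clientException)` instead of always choosing one form, so throttling storms stay compact while unexpected failures keep their diagnostics.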

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Please honor Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs

3 participants