Adjust logging to avoid expensive CosmosException.getMessage() diagnostics json serialization #49321
jaxpod70 wants to merge 2 commits
Conversation
Thank you for your contribution @jaxpod70! We will review the pull request and get back to you soon.

@microsoft-github-policy-service agree company="Microsoft"
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates change feed partition processor logging when handling CosmosException, aiming to reduce WARN log verbosity while still allowing full exception details at DEBUG.
Changes:
- Switched WARN logs to parameterized logging and appended clientException.getShortMessage().
- Added guarded DEBUG logs that include the full exception/stack trace.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/pkversion/PartitionProcessorImpl.java | Refines CosmosException logging in pk-version processor (WARN summary + DEBUG stack trace). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/epkversion/PartitionProcessorImpl.java | Refines CosmosException logging in epk-version processor (WARN summary + DEBUG stack trace). |

    logger.warn(
        "CosmosException: partition {} from thread {} with owner {} {}",
        this.lease.getLeaseToken(),
        Thread.currentThread().getId(),
        this.lease.getOwner(),
        clientException.getShortMessage());

    logger.debug(
        "CosmosException: partition" + this.lease.getLeaseToken()
        + "from thread " + Thread.currentThread().getId() + " with owner " + this.lease.getOwner(),
        clientException);

        + " from thread " + Thread.currentThread().getId() + " with owner " + this.lease.getOwner(),
        clientException);
    logger.warn(
        "CosmosException: partition {} from thread {} with owner {} {}",

        "Lease with token {}: CosmosException was thrown from thread {} for lease with owner {} {}",
        this.lease.getLeaseToken(),
        Thread.currentThread().getId(),
        this.lease.getOwner(),
        clientException.getShortMessage());

    logger.debug(
        "Lease with token " + this.lease.getLeaseToken() + ": CosmosException was thrown from thread " +
        Thread.currentThread().getId() + " for lease with owner " + this.lease.getOwner(),
        clientException);

    CosmosException clientException = (CosmosException) throwable;
    logger.warn(
        "Lease with token " + this.lease.getLeaseToken() + ": CosmosException was thrown from thread " +

Using ShortMessage is fine for possibly noisy logs (see Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs for an example), but for unexpected errors the full diagnostics are crucial for debugging. Please use Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs to decide whether to log the full or the short message.
FabianMeiswinkel left a comment
Please honor Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs
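One way to read this request, as a minimal self-contained sketch rather than the SDK's actual code: pick the short message only when the exception is classified as commonly expected, and keep the full message otherwise. `SketchCosmosException` and `IS_COMMONLY_EXPECTED` are hypothetical stand-ins for `CosmosException` and `Exceptions.isCommonlyExpectedExceptionPossiblyCausingNoisyLogs`; treating only status code 429 as "expected" is an assumption for illustration, since this thread does not say which exceptions the real helper covers.

```java
import java.util.function.Predicate;

final class NoisyLogMessageSketch {

    /** Hypothetical stand-in for CosmosException; models only what this sketch needs. */
    static final class SketchCosmosException extends RuntimeException {
        private final int statusCode;
        private final String shortMessage;

        SketchCosmosException(int statusCode, String shortMessage, String fullMessage) {
            super(fullMessage); // getMessage() plays the role of the full diagnostics text
            this.statusCode = statusCode;
            this.shortMessage = shortMessage;
        }

        int getStatusCode() {
            return statusCode;
        }

        String getShortMessage() {
            return shortMessage;
        }
    }

    /**
     * Hypothetical stand-in for the real classification helper.
     * Assumption: only 429 (throttling) counts as commonly expected here.
     */
    static final Predicate<SketchCosmosException> IS_COMMONLY_EXPECTED =
        e -> e.getStatusCode() == 429;

    /** Short message for expected, noisy errors; full message for everything else. */
    static String messageForWarnLog(
        SketchCosmosException e, Predicate<SketchCosmosException> isExpectedNoisy) {

        return isExpectedNoisy.test(e) ? e.getShortMessage() : e.getMessage();
    }

    private NoisyLogMessageSketch() {
    }
}
```

With this shape, the WARN statement would receive `messageForWarnLog(clientException, ...)` as its last argument, so unexpected errors keep their full diagnostics at the default log level.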
Description
Fixes excessive log line sizes in the ChangeFeedProcessor's partition processing loop caused by CosmosException.getMessage() serializing the full CosmosDiagnostics payload (request timelines, region contacts, retry history, metadata) into JSON when logged at WARN level.
Under heavy DB connection load, this produces multi-MB log lines (30+ MB observed) on every transient error (429 throttles, timeouts, connectivity issues) across all leased partitions concurrently. This causes log ingestion pipeline throttling/rejection, increased storage costs, and memory pressure from repeated serialization in a hot loop.
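Fix: Split the existing log statement into two tiers in both epkversion.PartitionProcessorImpl and pkversion.PartitionProcessorImpl:
- WARN: a parameterized summary (lease token, thread ID, owner) with clientException.getShortMessage() appended.
- DEBUG: a guarded log statement that includes the full exception and stack trace.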
This preserves observability for operators at default log levels while eliminating the excessive payload.
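The two-tier idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in this PR: it uses `java.util.logging` so that it runs without extra dependencies, with `Level.WARNING` and `Level.FINE` standing in for the WARN and DEBUG levels of the SDK's logger, and `TwoTierLoggingSketch`, `logFailure`, and its parameters are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class TwoTierLoggingSketch {

    /**
     * Logs a compact summary at WARNING and the full exception only at FINE.
     * shortMessage stands in for what clientException.getShortMessage() would return.
     */
    static void logFailure(
        Logger logger, String leaseToken, String owner, String shortMessage, Throwable error) {

        // Tier 1: parameterized summary; the expensive full message is never built here.
        logger.log(
            Level.WARNING,
            "CosmosException: partition {0} from thread {1} with owner {2} {3}",
            new Object[] {
                leaseToken,
                String.valueOf(Thread.currentThread().getId()),
                owner,
                shortMessage
            });

        // Tier 2: guarded, so the concatenation and the throwable are only
        // handed to the logger when the detailed level is enabled.
        if (logger.isLoggable(Level.FINE)) {
            logger.log(
                Level.FINE,
                "CosmosException: partition " + leaseToken
                    + " from thread " + Thread.currentThread().getId()
                    + " with owner " + owner,
                error);
        }
    }

    private TwoTierLoggingSketch() {
    }
}
```

At the default level only the summary record is produced; once the detailed level is enabled, a second record carrying the throwable (and therefore the stack trace) is emitted as well.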
Related Issue
#49320
All SDK Contribution checklist:
General Guidelines and Best Practices (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#developer-guide)
Testing Guidelines (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#building-and-unit-testing)