Describe the bug
I observed this with metrics, specifically.
When running a force flush alongside the periodic exporter, I get this error:
- Level: DEBUG
- Class: io.opentelemetry.sdk.metrics.export.PeriodicMetricReader
- Message: Exporter busy. Dropping metrics.
Steps to reproduce
I am running force flush every 3 seconds, which increases the probability of a collision with the periodic export.
What did you expect to see?
I expected the force flush to wait until the exporter is available, possibly by acquiring a lock with a timeout.
Or, if it's desirable to let it fail, some error type needs to be exposed through the force flush API so that callers can handle it.
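To illustrate both options, here is a minimal sketch that uses only the JDK. It is not SDK code: `WaitingFlushSketch`, `FlushOutcome`, and `flush` are hypothetical names, and a `Semaphore` stands in for whatever guard the reader would actually use. The flush waits up to a timeout for the exporter, and reports a distinct outcome if it gives up.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: wait for the exporter with a timeout and report why a flush failed. */
final class WaitingFlushSketch {
  /** Hypothetical outcome type a caller could branch on. */
  enum FlushOutcome { EXPORTED, EXPORTER_BUSY, INTERRUPTED }

  // One permit: at most one export may be in flight at a time.
  private final Semaphore exporterPermit = new Semaphore(1);

  /** Runs the export once the permit is free, or gives up after the timeout. */
  FlushOutcome flush(Runnable export, long timeout, TimeUnit unit) {
    boolean acquired;
    try {
      acquired = exporterPermit.tryAcquire(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return FlushOutcome.INTERRUPTED;
    }
    if (!acquired) {
      return FlushOutcome.EXPORTER_BUSY; // visible to the caller, not just a debug log
    }
    try {
      export.run();
      return FlushOutcome.EXPORTED;
    } finally {
      exporterPermit.release();
    }
  }

  public static void main(String[] args) {
    WaitingFlushSketch sketch = new WaitingFlushSketch();
    FlushOutcome[] nested = new FlushOutcome[1];
    // The nested flush runs while the outer export still holds the permit,
    // so it times out and reports EXPORTER_BUSY.
    FlushOutcome outer = sketch.flush(
        () -> nested[0] = sketch.flush(() -> {}, 10, TimeUnit.MILLISECONDS),
        1, TimeUnit.SECONDS);
    System.out.println("outer: " + outer + ", nested: " + nested[0]);
    // prints "outer: EXPORTED, nested: EXPORTER_BUSY"
  }
}
```

With a timeout longer than a typical export, a force flush that collides with the periodic run would simply wait its turn, and a caller that still gets `EXPORTER_BUSY` can decide whether to retry.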
What did you see instead?
The force flush fails without a reason or an exception. Debug logging is required to see the problem.
What version and what artifacts are you using?
Artifacts: opentelemetry-api, opentelemetry-sdk-extension-autoconfigure, opentelemetry-logback-appender-1.0, opentelemetry-exporter-otlp, opentelemetry-exporter-sender-jdk, opentelemetry-runtime-telemetry, opentelemetry-kafka-clients-2.6
Version: v1.62.0
How did you reference these artifacts? (excerpt from your build.gradle, pom.xml, etc)
build.gradle:
// snip
dependencies {
    // snip
    implementation libs.otel.autoconfigure
    implementation libs.otel.logback
    api libs.otel.api
    implementation(libs.otel.exporter) {
        exclude group: 'io.opentelemetry', module: 'opentelemetry-exporter-sender-okhttp'
    }
    implementation libs.otel.exporter.sender
    implementation libs.otel.instrumentation.runtime
    implementation libs.otel.instrumentation.kafka.client
    testImplementation libs.otel.testing
    // snip
}
// snip
libs.versions.toml:
[versions]
# snip
otel = "1.62.+"
otel-instrumentation = "[2.27.0-alpha,2.28.0-alpha)"
# snip
[libraries]
# snip
otel-logback = { module = "io.opentelemetry.instrumentation:opentelemetry-logback-appender-1.0", version.ref = "otel-instrumentation" }
otel-autoconfigure = { module = "io.opentelemetry:opentelemetry-sdk-extension-autoconfigure", version.ref = "otel" }
otel-api = { module = "io.opentelemetry:opentelemetry-api", version.ref = "otel" }
otel-exporter = { module = "io.opentelemetry:opentelemetry-exporter-otlp", version.ref = "otel" }
otel-exporter-sender = { module = "io.opentelemetry:opentelemetry-exporter-sender-jdk", version.ref = "otel" }
otel-instrumentation-runtime = { module = "io.opentelemetry.instrumentation:opentelemetry-runtime-telemetry", version.ref = "otel-instrumentation" }
otel-instrumentation-http-client = { module = "io.opentelemetry.instrumentation:opentelemetry-java-http-client", version.ref = "otel-instrumentation" }
otel-instrumentation-aws-sdk = { module = "io.opentelemetry.instrumentation:opentelemetry-aws-sdk-2.2", version.ref = "otel-instrumentation" }
otel-instrumentation-kafka-client = { module = "io.opentelemetry.instrumentation:opentelemetry-kafka-clients-2.6", version.ref = "otel-instrumentation" }
otel-testing = { module = "io.opentelemetry:opentelemetry-sdk-testing", version.ref = "otel" }
# snip
Environment
Compiler: gradle:9.3-jdk21-corretto
OS: gradle:9.3-jdk21-corretto
Runtime: AWS Lambda Java 21
Additional context
The lambda is just executing a health check, so it finishes very quickly.
I noticed in the code that the reader does a one-time availability check: if the flush collides even once with the periodic run, it fails. This seems like it can happen at any time, even when not force flushing very often. That being said, with a health check every 12 seconds or so, it currently happens 0.216% of the time (measured over 1 day).
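The one-time check I am describing can be modeled roughly like this. This is a simplified stand-in written against the JDK only, not the SDK source; `OneShotGuardModel`, `tryStartExport`, and `finishExport` are hypothetical names. The point is that the second caller is rejected immediately instead of waiting.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Simplified model of a one-shot "is the exporter free?" check. Not the SDK source. */
final class OneShotGuardModel {
  // true while no export is in flight
  private final AtomicBoolean exportAvailable = new AtomicBoolean(true);

  /** Tries to start an export; returns false right away if one is already in flight. */
  boolean tryStartExport() {
    return exportAvailable.compareAndSet(true, false);
  }

  /** Marks the in-flight export as finished so the next caller can start. */
  void finishExport() {
    exportAvailable.set(true);
  }

  public static void main(String[] args) {
    OneShotGuardModel guard = new OneShotGuardModel();
    boolean periodic = guard.tryStartExport(); // the periodic run claims the exporter
    boolean forced = guard.tryStartExport();   // a force flush arrives mid-export
    System.out.println("periodic started: " + periodic);  // prints "periodic started: true"
    System.out.println("force flush started: " + forced); // prints "force flush started: false"
    guard.finishExport();
    System.out.println("retry started: " + guard.tryStartExport()); // prints "retry started: true"
  }
}
```

Because the check never retries or waits, any overlap between a force flush and a periodic export is enough to drop that flush, however short the overlap is.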