
Force flush conflicts with periodic background exporting #8433

@kropptrevor

Describe the bug
I observed this with metrics specifically.
When a force flush runs at the same time as the periodic export, the flush fails and this message is logged:

  • Level: DEBUG
  • Class: io.opentelemetry.sdk.metrics.export.PeriodicMetricReader
  • Message: Exporter busy. Dropping metrics.

Steps to reproduce
I am running force flush every 3 seconds, which increases the probability of a collision with the periodic export.

What did you expect to see?
I expected the force flush to wait until the exporter is available, possibly by acquiring a lock with a timeout.
Alternatively, if letting it fail is desirable, the force flush API should expose some error type so that callers can handle the failure.
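
To illustrate the kind of behavior I have in mind, here is a minimal, self-contained sketch. It is not the SDK's code: `WaitingFlushGate`, `FlushOutcome`, and `demo()` are hypothetical names I made up, and a `Semaphore` stands in for whatever lock the reader would actually use. The flush waits for the exporter up to a timeout, and if it gives up, the caller gets an explicit reason instead of a silent failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a flush that waits for the exporter instead of giving up at once. */
final class WaitingFlushGate {
    /** Hypothetical outcome type so that callers can tell why a flush did not export. */
    enum FlushOutcome { EXPORTED, EXPORTER_BUSY_TIMEOUT, INTERRUPTED }

    // A single permit models "only one export may be in flight".
    private final Semaphore exporterPermit = new Semaphore(1);

    /** Waits up to the given timeout for the exporter, then runs the export. */
    FlushOutcome flush(Runnable export, long timeout, TimeUnit unit) {
        try {
            if (!exporterPermit.tryAcquire(timeout, unit)) {
                return FlushOutcome.EXPORTER_BUSY_TIMEOUT;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return FlushOutcome.INTERRUPTED;
        }
        try {
            export.run();
            return FlushOutcome.EXPORTED;
        } finally {
            exporterPermit.release();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs a slow "periodic" export, then two flushes: one impatient, one patient. */
    static FlushOutcome[] demo() {
        WaitingFlushGate gate = new WaitingFlushGate();
        CountDownLatch exportStarted = new CountDownLatch(1);
        Thread periodic = new Thread(() -> gate.flush(() -> {
            exportStarted.countDown(); // signal that the exporter is now held
            sleepQuietly(300);         // simulate a slow export
        }, 1, TimeUnit.SECONDS));
        periodic.start();
        try {
            exportStarted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Timeout shorter than the in-flight export: reported as busy.
        FlushOutcome impatient = gate.flush(() -> { }, 10, TimeUnit.MILLISECONDS);
        // Timeout longer than the in-flight export: waits, then exports.
        FlushOutcome patient = gate.flush(() -> { }, 5, TimeUnit.SECONDS);
        return new FlushOutcome[] {impatient, patient};
    }

    public static void main(String[] args) {
        for (FlushOutcome outcome : demo()) {
            System.out.println(outcome);
        }
    }
}
```

Either branch would work for my use case: with a generous timeout the flush simply waits, and with a short one I at least get a value I can log or retry on.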

What did you see instead?
The force flush fails without a reason or an exception. Debug logging must be enabled to see the cause.

What version and what artifacts are you using?
Artifacts: opentelemetry-api, opentelemetry-sdk-extension-autoconfigure, opentelemetry-logback-appender-1.0, opentelemetry-exporter-otlp, opentelemetry-exporter-sender-jdk, opentelemetry-runtime-telemetry, opentelemetry-kafka-clients-2.6
Version: v1.62.0

How did you reference these artifacts? (excerpt from your build.gradle, pom.xml, etc)
build.gradle:

// snip
dependencies {
// snip
	implementation libs.otel.autoconfigure
	implementation libs.otel.logback
	api libs.otel.api
	implementation(libs.otel.exporter) {
		exclude group: 'io.opentelemetry', module: 'opentelemetry-exporter-sender-okhttp'
	}
	implementation libs.otel.exporter.sender
	implementation libs.otel.instrumentation.runtime
	implementation libs.otel.instrumentation.kafka.client
	testImplementation libs.otel.testing
// snip
}
// snip

libs.versions.toml:

[versions]
# snip
otel = "1.62.+"
otel-instrumentation = "[2.27.0-alpha,2.28.0-alpha)"
# snip
[libraries]
# snip
otel-logback = { module = "io.opentelemetry.instrumentation:opentelemetry-logback-appender-1.0", version.ref = "otel-instrumentation" }
otel-autoconfigure = { module = "io.opentelemetry:opentelemetry-sdk-extension-autoconfigure", version.ref = "otel" }
otel-api = { module = "io.opentelemetry:opentelemetry-api", version.ref = "otel" }
otel-exporter = { module = "io.opentelemetry:opentelemetry-exporter-otlp", version.ref = "otel" }
otel-exporter-sender = { module = "io.opentelemetry:opentelemetry-exporter-sender-jdk", version.ref = "otel" }
otel-instrumentation-runtime = { module = "io.opentelemetry.instrumentation:opentelemetry-runtime-telemetry", version.ref = "otel-instrumentation" }
otel-instrumentation-http-client = { module = "io.opentelemetry.instrumentation:opentelemetry-java-http-client", version.ref = "otel-instrumentation" }
otel-instrumentation-aws-sdk = { module = "io.opentelemetry.instrumentation:opentelemetry-aws-sdk-2.2", version.ref = "otel-instrumentation" }
otel-instrumentation-kafka-client = { module = "io.opentelemetry.instrumentation:opentelemetry-kafka-clients-2.6", version.ref = "otel-instrumentation" }
otel-testing = { module = "io.opentelemetry:opentelemetry-sdk-testing", version.ref = "otel" }
# snip

Environment
Compiler: gradle:9.3-jdk21-corretto
OS: gradle:9.3-jdk21-corretto
Runtime: AWS Lambda Java 21

Additional context
The lambda is just executing a health check, so it finishes very quickly.

I noticed in the code that the availability check is done only once: if the flush collides with the periodic run even a single time, it fails. This seems like it can happen at any time, even when not force flushing very often. That said, with a health check every 12 seconds or so, it currently happens 0.216% of the time (measured over 1 day).
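
For anyone who wants to see the collision in isolation, this is a simplified sketch of the one-time check as I understand it. It is a stand-in written for this issue, not a copy of `PeriodicMetricReader`: the class `OneShotGuardDemo` and its method names are hypothetical, and I am assuming the guard behaves like a single compare-and-set on a flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a reader that guards its exporter with a one-time check. */
final class OneShotGuardDemo {
    // true while no export is in flight
    private final AtomicBoolean exportAvailable = new AtomicBoolean(true);

    /** Tries exactly once to claim the exporter; false means the metrics would be dropped. */
    boolean tryBeginExport() {
        return exportAvailable.compareAndSet(true, false);
    }

    /** Marks the in-flight export as finished. */
    void endExport() {
        exportAvailable.set(true);
    }

    public static void main(String[] args) {
        OneShotGuardDemo reader = new OneShotGuardDemo();
        // The periodic run claims the exporter first.
        System.out.println("periodic started: " + reader.tryBeginExport());
        // A force flush arriving while that export is in flight is rejected immediately.
        System.out.println("force flush started: " + reader.tryBeginExport());
        // Once the periodic export finishes, a later flush succeeds.
        reader.endExport();
        System.out.println("later flush started: " + reader.tryBeginExport());
    }
}
```

Because there is no wait or retry, the outcome depends purely on whether the two calls overlap, which would explain why the failures are rare but never go away.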
