Fix gRPC telemetry exporters#6558
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 973e089ed3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ) -> anyhow::Result<(OpenTelemetryRecorder, SdkMeterProvider)> { | ||
| let metrics_protocol = otlp_config.metrics_protocol()?; | ||
| let metric_exporter = metrics_protocol.metric_exporter()?; | ||
| let metric_reader = PeriodicReader::builder(metric_exporter, runtime::Tokio).build(); |
There was a problem hiding this comment.
Keep HTTP OTLP exports off the Tokio runtime
When OTEL_EXPORTER_OTLP_PROTOCOL or a signal-specific protocol is set to http/protobuf or http/json (both are accepted in config.rs), this crate still uses opentelemetry-otlp's default blocking reqwest HTTP client because quickwit/Cargo.toml:192 does not enable reqwest-client or hyper-client. Moving this reader (and the analogous log/span processors) onto runtime::Tokio therefore runs blocking HTTP exports on Quickwit's main Tokio workers instead of the SDK dedicated thread; a slow or retrying collector can tie up runtime threads and degrade serving/indexing. Please keep the async-runtime path to gRPC or enable an async HTTP client for HTTP protocols.
Useful? React with 👍 / 👎.
| let traces_protocol = otlp_config.traces_protocol()?; | ||
| let span_exporter = traces_protocol.span_exporter()?; | ||
| let span_processor = trace::BatchSpanProcessor::builder(span_exporter) | ||
| let span_processor = BatchSpanProcessor::builder(span_exporter, runtime::Tokio) |
There was a problem hiding this comment.
Avoid deadlocking shutdown on one-worker runtimes
With the processor spawned onto runtime::Tokio, provider shutdown now has to send a shutdown message to a Tokio task and synchronously wait for the reply. Quickwit calls telemetry_handle.shutdown() from inside main_impl, and QW_TOKIO_RUNTIME_NUM_THREADS (or small CPU limits) can make the multi-thread runtime use a single worker; in that case this blocking shutdown occupies the only worker, so the telemetry task cannot run to flush and acknowledge shutdown. The previous dedicated-thread processor did not depend on the application runtime making progress during shutdown, so please either avoid the async processor for single-worker runtimes or run shutdown outside the Tokio worker.
Useful? React with 👍 / 👎.
Default gRPC processors are spawned in a dedicated OS thread without handle on the tokio runtime. Move to experimental async processors. Tested locally.