V8 heap metrics: Track instance memory usage for procedure workers too #5122
Merged
Prior to this commit, the `V8HeapMetrics` were tracked only for the "main" instance of a database, i.e. the reducer worker. This meant that we had little to no visibility into memory usage by procedures. In this commit, we add a label `instance_id` to all of those metrics: a `u64` ID that uniquely identifies the V8 instance within its database. The ID is set to 0 for the main worker, and drawn from an `AtomicU64` counter starting at 1 for the procedure workers.
As part of this change, I've made it so that `V8HeapMetrics::drop` does `remove_label_values`, and `V8HeapMetrics::observe` uses `set` rather than inc/dec by a delta. The previous code appears to have been in an odd middle ground, where the metrics were used only for the main worker, but were treated in `observe` and `drop` as if they might be a concurrent aggregation of multiple workers' values.
Relatedly, `remove_database_gauges` (in `crates/core/src/host/host_controller.rs`) no longer needs to clean up the V8 heap gauges. It couldn't if it wanted to, either, because it won't know the set of instance IDs to remove.
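The ID scheme described in this commit message can be sketched roughly as follows. This is a minimal illustration using only the standard library; `MAIN_INSTANCE_ID`, `InstanceIdAllocator`, and `next_procedure_id` are hypothetical names chosen for the example and are not taken from the PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical constant: the ID reserved for the main (reducer) worker.
pub const MAIN_INSTANCE_ID: u64 = 0;

/// Hypothetical per-database allocator for procedure-worker instance IDs.
pub struct InstanceIdAllocator {
    next: AtomicU64,
}

impl InstanceIdAllocator {
    pub fn new() -> Self {
        // Start at 1 so that 0 stays reserved for the main worker.
        Self { next: AtomicU64::new(1) }
    }

    /// Hands out a fresh ID. `fetch_add` returns the value held before the
    /// increment, so concurrent callers never receive the same ID.
    pub fn next_procedure_id(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

`Ordering::Relaxed` is enough here under the assumption that the counter is only used to produce distinct values and does not synchronize any other memory.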
Review flagged cardinality of these metrics as a concern, as even when we properly clean up unused entries with `remove_label_values`, all values remain live in the remote Prometheus database. As such, this commit reverts part of the PR's previous changes, so that all of the procedure workers for a given database share a single set of label values, and incrementally update the measurements in that single set of metric entries.
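To illustrate what "incrementally update" means for a shared set of label values, here is a minimal sketch. It uses an `AtomicI64` as a stand-in for a single Prometheus gauge entry; `SharedGauge` and `WorkerHeapMetric` are hypothetical types invented for the example, and withdrawing the contribution on `Drop` is an assumption about how the sum is kept accurate, not something confirmed by the PR.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Stand-in for one shared gauge entry (e.g. heap size summed over all
/// procedure workers of a database). Hypothetical type for illustration.
#[derive(Default)]
pub struct SharedGauge(AtomicI64);

impl SharedGauge {
    pub fn add(&self, delta: i64) {
        self.0.fetch_add(delta, Ordering::Relaxed);
    }

    pub fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Per-worker handle that remembers what it last contributed to the gauge.
pub struct WorkerHeapMetric<'a> {
    gauge: &'a SharedGauge,
    last: i64,
}

impl<'a> WorkerHeapMetric<'a> {
    pub fn new(gauge: &'a SharedGauge) -> Self {
        Self { gauge, last: 0 }
    }

    /// Records a new measurement by applying only the change since the
    /// previous one, so other workers' contributions are left untouched.
    pub fn observe(&mut self, current: i64) {
        self.gauge.add(current - self.last);
        self.last = current;
    }
}

impl Drop for WorkerHeapMetric<'_> {
    fn drop(&mut self) {
        // Assumption: a worker withdraws its contribution when it goes away,
        // so the shared value only covers live workers.
        self.gauge.add(-self.last);
    }
}
```

Using `set` on the shared entry would instead overwrite the other workers' contributions, which is why a delta is needed once several workers report into the same label values.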
The phrase "a database's tracked JS worker kind" no longer makes sense, as all possible JS worker kinds are tracked.
Updated PR description and title: I'll make the changes for Wasmtime modules in a separate PR.
joshua-spacetime
approved these changes
May 26, 2026
Description of Changes
Prior to this PR, the `V8HeapMetrics` were tracked only for the "main" instance of a database, i.e. the reducer worker. This meant that we had little to no visibility into memory usage by procedures.
In this PR, we also track values for the procedure workers. We considered tracking each instance's usage separately with a unique integer `instance_id` label, but were concerned about cardinality (see discussion), so decided instead to track only two sets of label values per database: `JsWorkerKind::Main` and `JsWorkerKind::Procedure`. The entries for `JsWorkerKind::Procedure` store the sum of the values for all procedure workers for that database.
I also moved the logic for calling `remove_label_values` into an associated function on `V8HeapMetrics`, rather than listing them all in `remove_database_gauges`. This hides the fact that we have label values for both `JsWorkerKind` variants.
API and ABI breaking changes
We don't use any of these metrics for billing, and otherwise do not consider our metrics a stable API.
Expected complexity level and risk
2: it would be unfortunate if we reported incorrect values for these metrics, though (as mentioned above) they are not used for billing, only diagnostics.
Testing
I do not know how to test metrics.