Skip to content

V8 heap metrics: Track instance memory usage for procedure workers too#5122

Merged
gefjon merged 5 commits into
masterfrom
phoebe/metrics-procedure-heap-usage
May 27, 2026
Merged

V8 heap metrics: Track instance memory usage for procedure workers too#5122
gefjon merged 5 commits into
masterfrom
phoebe/metrics-procedure-heap-usage

Conversation

@gefjon
Copy link
Copy Markdown
Contributor

@gefjon gefjon commented May 26, 2026

Description of Changes

Prior to this PR, the V8HeapMetrics were tracked only for the "main" instance of a database, i.e. the reducer worker. This meant that we had little to no visibility into memory usage by procedures.

In this PR, we also track values for the procedure workers. We considered tracking each instance's usage separately with a unique integer instance_id label, but were concerned about cardinality (see discussion), so decided instead to track only two sets of label values per database: JsWorkerKind::Main and JsWorkerKind::Procedure. The entries for JsWorkerKind::Procedure store the sum of the values for all procedure workers for that database.

I also moved the logic for calling remove_label_values into an associated function on V8HeapMetrics, rather than listing them all in remove_database_gauges. This hides the fact that we have label values for both JsWorkerKind variants.

API and ABI breaking changes

We don't use any of these metrics for billing, and otherwise do not consider our metrics a stable API.

Expected complexity level and risk

2: it would be unfortunate if we reported incorrect values for these metrics, though (as mentioned above) they are not used for billing, only diagnostics.

Testing

I do not know how to test metrics.

gefjon added 2 commits May 26, 2026 13:08
Prior to this commit, the `V8HeapMetrics` were tracked
only for the "main" instance of a database, i.e. the reducer worker.
This meant that we had little to no visibility into memory usage by procedures.

In this commit, we add a label `instance_id` to all of those metrics,
a `u64` ID unique (scoped to the database) to the V8 instance.
The ID is set to 0 for the main worker,
and drawn from an `AtomicU64` counter starting at 1 for the procedure workers.

As part of this change, I've made it so that `V8HeapMetrics::drop` does `remove_label_values`,
and `V8HeapMetrics::observe` uses `set` rather than inc/dec by a delta.
The previous code appears to have been in an odd middle ground,
where the metrics were used only for the main worker,
but were treated in `observe` and `drop`
as if they might be a concurrent aggregation of multiple workers' values.
Relatedly, `remove_database_gauges` (in `crates/core/src/host/host_controller.rs`)
no longer needs to clean up the database gauges.
It couldn't if it wanted to, either, 'cause it won't know the set of instance IDs to remove.
Explicitly ignore `Result` from `remove_label_values`,
delete some dead code.
Comment thread crates/core/src/host/v8/mod.rs
gefjon added 2 commits May 26, 2026 18:01
Review flagged cardinality of these metrics as a concern,
as even when we properly clean up unused entries with `remove_label_values`,
all values remain live in the remote Prometheus database.
As such, this commit reverts part of the PR's previous changes,
so that all of the procedure workers for a given database
share a single set of label values,
and incrementally update the measurements in that single set of metric entries.
The phrase "a database's tracked JS worker kind" no longer makes sense,
as all possible JS worker kinds are tracked.
@gefjon
Copy link
Copy Markdown
Contributor Author

gefjon commented May 26, 2026

Updated PR description and title: I'll make the changes for Wasmtime modules in a separate PR.

@gefjon gefjon marked this pull request as ready for review May 26, 2026 22:14
@gefjon gefjon requested a review from joshua-spacetime May 26, 2026 22:14
@gefjon gefjon changed the title Track instance memory usage metrics per-instance, not just for the "main" instance V8 heap metrics: Track instance memory usage for procedure workers too May 26, 2026
@gefjon gefjon enabled auto-merge May 26, 2026 22:16
@gefjon gefjon added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 26, 2026
@gefjon gefjon added this pull request to the merge queue May 27, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks May 27, 2026
@gefjon gefjon added this pull request to the merge queue May 27, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 27, 2026
@gefjon gefjon added this pull request to the merge queue May 27, 2026
Merged via the queue into master with commit dd4ac18 May 27, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants