Skip to content

Paths analytics query OOMs on multi-day windows for high-volume projects (uses 14-18 GiB of ClickHouse memory) #382

@parikshit223933

Description

@parikshit223933

Summary

The Paths analytics query (multi-CTE WITH session_paths AS (WITH paths_deduped_cte AS (WITH ordered_events AS ...)) aggregation) consumes 14–18 GiB of ClickHouse memory for an 8-day window on a project with ~1.5 billion screen_view events. With typical per-query memory limits, the query fails with Code 241 MEMORY_LIMIT_EXCEEDED and the dashboard surfaces it to users as Failed to execute 'json' on 'Response': Unexpected end of JSON input.

Setup

  • Self-hosted OpenPanel via the self-hosting/ directory (running lindesvard/openpanel-{api,dashboard,worker}:2)
  • ClickHouse 25.10.2.65
  • ~1.46 billion total events; ~1.5 billion screen_view events on the affected project
  • Resource limits in our deployment:
    • max_memory_usage = 16 GB (per query)
    • max_server_memory_usage_to_ram_ratio = 0.85 (24 GiB cgroup → 20.40 GiB server cap)
    • max_concurrent_select_queries = 80

The query (captured from system.query_log exception_code = 241)

WITH session_paths AS (
  WITH paths_deduped_cte AS (
    WITH ordered_events AS (
      SELECT session_id, concat(origin, path) as path, created_at
      FROM events
      WHERE project_id = '<redacted>'
        AND name = 'screen_view'
        AND path != '' AND path IS NOT NULL
        AND created_at BETWEEN toDateTime('2026-05-25 00:00:00')
                           AND toDateTime('2026-06-02 00:00:00')
      ORDER BY session_id ASC, created_at ASC
    )
    SELECT session_id,
      arraySlice(
        arrayFilter(
          (x, i) -> i = 1 OR x != paths_raw[i - 1],
          groupArray(path) AS paths_raw,
          arrayEnumerate(paths_raw)
        ),
        1, 5
      ) AS paths_deduped
    FROM ordered_events
    GROUP BY session_id
  )
  SELECT session_id, ..., paths[1] AS entry_page
  FROM paths_deduped_cte HAVING length(paths) >= 2
)
SELECT entry_page, count() AS count
FROM session_paths
GROUP BY entry_page
ORDER BY count DESC LIMIT 3 FORMAT JSON

Observed memory usage

From system.query_log ExceptionWhileProcessing events on 2026-06-01 with our 16 GB per-query cap (= 14.90 GiB binary):

event_time mem_used "would use"
13:15:03 13.68 GiB 17.46 GiB
13:14:55 14.16 GiB 16.47 GiB
13:14:48 14.46 GiB 18.46 GiB
13:14:32 14.69 GiB 18.47 GiB

The query needs ~17–18 GiB to complete on an 8-day window for our event volume.

Why this is hard to mitigate from the operator side

The query uses groupArray(path) to materialize all events per session into an array, then arrayFilter to dedupe consecutive duplicates. Both operators hold the per-session aggregation state in memory. With ~300M screen_view events in an 8-day window distributed across many sessions, the per-thread aggregation state grows large.

arrayFilter and groupArray don't spill to disk via max_bytes_before_external_group_by — that setting only applies to direct GROUP BY materialization. So even with disk-spill enabled (we have it at 11 GB), this query stays entirely in RAM.

This means there's no purely operator-side fix; raising the per-query cap past ~20 GiB starts to risk OOM against the cgroup, and raising the cgroup means giving CH a larger share of host RAM at the expense of other containers.

Suggestions (in increasing order of effort)

  1. Document the memory cost in self-hosting/ docs. Operators of large self-hosted instances should know Paths analytics is memory-heavy.
  2. UI date-range cap or warning on the Paths dashboard — suggest shorter ranges for high-volume projects, or warn before submitting if a heuristic projects high cost.
  3. Pre-aggregated session-paths materialized view at write time (similar to the existing cohort_events_mv, dau_mv, event_property_values_mv). The dashboard could read a small summary rather than recomputing from raw events on every load.
  4. Stream-aware rewrite — replace groupArray(path) + arrayFilter with a window function approach (CH 22+ has solid window function support) that can stream rather than buffer all paths per session.

Workaround we're using

  • Raised max_memory_usage to 16 GB so the median Paths query succeeds
  • Communicated to users that the Paths dashboard for high-volume projects should use 3–4 day ranges for now

Happy to provide additional data (full query text, EXPLAIN, system.query_log excerpts, EXPLAIN ESTIMATE) if useful.

Versions

  • lindesvard/openpanel-api:2, lindesvard/openpanel-dashboard:2, lindesvard/openpanel-worker:2 (Docker Hub :2 tag)
  • ClickHouse: clickhouse/clickhouse-server:25.10.2.65
  • Self-hosting layout: this repo's self-hosting/

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions