Summary
The Paths analytics query (multi-CTE WITH session_paths AS (WITH paths_deduped_cte AS (WITH ordered_events AS ...)) aggregation) consumes 14–18 GiB of ClickHouse memory for an 8-day window on a project with ~1.5 billion screen_view events. With typical per-query memory limits, the query fails with Code 241 MEMORY_LIMIT_EXCEEDED and the dashboard surfaces it to users as Failed to execute 'json' on 'Response': Unexpected end of JSON input.
Setup
- Self-hosted OpenPanel via the
self-hosting/ directory (running lindesvard/openpanel-{api,dashboard,worker}:2)
- ClickHouse
25.10.2.65
- ~1.46 billion total events; ~1.5 billion
screen_view events on the affected project
- Resource limits in our deployment:
max_memory_usage = 16 GB (per query)
max_server_memory_usage_to_ram_ratio = 0.85 (24 GiB cgroup → 20.40 GiB server cap)
max_concurrent_select_queries = 80
The query (captured from system.query_log exception_code = 241)
WITH session_paths AS (
WITH paths_deduped_cte AS (
WITH ordered_events AS (
SELECT session_id, concat(origin, path) as path, created_at
FROM events
WHERE project_id = '<redacted>'
AND name = 'screen_view'
AND path != '' AND path IS NOT NULL
AND created_at BETWEEN toDateTime('2026-05-25 00:00:00')
AND toDateTime('2026-06-02 00:00:00')
ORDER BY session_id ASC, created_at ASC
)
SELECT session_id,
arraySlice(
arrayFilter(
(x, i) -> i = 1 OR x != paths_raw[i - 1],
groupArray(path) AS paths_raw,
arrayEnumerate(paths_raw)
),
1, 5
) AS paths_deduped
FROM ordered_events
GROUP BY session_id
)
SELECT session_id, ..., paths[1] AS entry_page
FROM paths_deduped_cte HAVING length(paths) >= 2
)
SELECT entry_page, count() AS count
FROM session_paths
GROUP BY entry_page
ORDER BY count DESC LIMIT 3 FORMAT JSON
Observed memory usage
From system.query_log ExceptionWhileProcessing events on 2026-06-01 with our 16 GB per-query cap (= 14.90 GiB binary):
| event_time |
mem_used |
"would use" |
| 13:15:03 |
13.68 GiB |
17.46 GiB |
| 13:14:55 |
14.16 GiB |
16.47 GiB |
| 13:14:48 |
14.46 GiB |
18.46 GiB |
| 13:14:32 |
14.69 GiB |
18.47 GiB |
The query needs ~17–18 GiB to complete on an 8-day window for our event volume.
Why this is hard to mitigate from the operator side
The query uses groupArray(path) to materialize all events per session into an array, then arrayFilter to dedupe consecutive duplicates. Both operators hold the per-session aggregation state in memory. With ~300M screen_view events in an 8-day window distributed across many sessions, the per-thread aggregation state grows large.
arrayFilter and groupArray don't spill to disk via max_bytes_before_external_group_by — that setting only applies to direct GROUP BY materialization. So even with disk-spill enabled (we have it at 11 GB), this query stays entirely in RAM.
This means there's no purely operator-side fix; raising the per-query cap past ~20 GiB starts to risk OOM against the cgroup, and raising the cgroup means giving CH a larger share of host RAM at the expense of other containers.
Suggestions (in increasing order of effort)
- Document the memory cost in
self-hosting/ docs. Operators of large self-hosted instances should know Paths analytics is memory-heavy.
- UI date-range cap or warning on the Paths dashboard — suggest shorter ranges for high-volume projects, or warn before submitting if a heuristic projects high cost.
- Pre-aggregated session-paths materialized view at write time (similar to the existing
cohort_events_mv, dau_mv, event_property_values_mv). The dashboard could read a small summary rather than recomputing from raw events on every load.
- Stream-aware rewrite — replace
groupArray(path) + arrayFilter with a window function approach (CH 22+ has solid window function support) that can stream rather than buffer all paths per session.
Workaround we're using
- Raised
max_memory_usage to 16 GB so the median Paths query succeeds
- Communicated to users that the Paths dashboard for high-volume projects should use 3–4 day ranges for now
Happy to provide additional data (full query text, EXPLAIN, system.query_log excerpts, EXPLAIN ESTIMATE) if useful.
Versions
lindesvard/openpanel-api:2, lindesvard/openpanel-dashboard:2, lindesvard/openpanel-worker:2 (Docker Hub :2 tag)
- ClickHouse:
clickhouse/clickhouse-server:25.10.2.65
- Self-hosting layout: this repo's
self-hosting/
Summary
The Paths analytics query (multi-CTE
WITH session_paths AS (WITH paths_deduped_cte AS (WITH ordered_events AS ...))aggregation) consumes 14–18 GiB of ClickHouse memory for an 8-day window on a project with ~1.5 billionscreen_viewevents. With typical per-query memory limits, the query fails withCode 241 MEMORY_LIMIT_EXCEEDEDand the dashboard surfaces it to users asFailed to execute 'json' on 'Response': Unexpected end of JSON input.Setup
self-hosting/directory (runninglindesvard/openpanel-{api,dashboard,worker}:2)25.10.2.65screen_viewevents on the affected projectmax_memory_usage = 16 GB(per query)max_server_memory_usage_to_ram_ratio = 0.85(24 GiB cgroup → 20.40 GiB server cap)max_concurrent_select_queries = 80The query (captured from
system.query_logexception_code = 241)Observed memory usage
From
system.query_logExceptionWhileProcessingevents on 2026-06-01 with our 16 GB per-query cap (= 14.90 GiB binary):The query needs ~17–18 GiB to complete on an 8-day window for our event volume.
Why this is hard to mitigate from the operator side
The query uses
groupArray(path)to materialize all events per session into an array, thenarrayFilterto dedupe consecutive duplicates. Both operators hold the per-session aggregation state in memory. With ~300Mscreen_viewevents in an 8-day window distributed across many sessions, the per-thread aggregation state grows large.arrayFilterandgroupArraydon't spill to disk viamax_bytes_before_external_group_by— that setting only applies to directGROUP BYmaterialization. So even with disk-spill enabled (we have it at 11 GB), this query stays entirely in RAM.This means there's no purely operator-side fix; raising the per-query cap past ~20 GiB starts to risk OOM against the cgroup, and raising the cgroup means giving CH a larger share of host RAM at the expense of other containers.
Suggestions (in increasing order of effort)
self-hosting/docs. Operators of large self-hosted instances should know Paths analytics is memory-heavy.cohort_events_mv,dau_mv,event_property_values_mv). The dashboard could read a small summary rather than recomputing from raw events on every load.groupArray(path)+arrayFilterwith a window function approach (CH 22+ has solid window function support) that can stream rather than buffer all paths per session.Workaround we're using
max_memory_usageto 16 GB so the median Paths query succeedsHappy to provide additional data (full query text,
EXPLAIN,system.query_logexcerpts,EXPLAIN ESTIMATE) if useful.Versions
lindesvard/openpanel-api:2,lindesvard/openpanel-dashboard:2,lindesvard/openpanel-worker:2(Docker Hub:2tag)clickhouse/clickhouse-server:25.10.2.65self-hosting/