What happened
At 2026-02-06T16:31:54Z visor logged:

- `child_low_memory: true`
- `memory_usage: 0.9566818114790238`

Then it restarted hl-node. After the restart, the node initialized from local ABCI state at block 885820000, and output started again at 885820001 in the same hourly file. Our consumer was already at 885820213, so it saw a rewind and hard-stopped.
Why this is a problem
For `--batch-by-block` outputs, we need a clean monotonic stream unless there is an explicit restart boundary. Right now a restart causes:

- expected: `885820214`
- got: `885820001`
- same path pattern (`node_*_by_block/hourly/20260206/16`)

So all by-block streams (trades/fills/book_diffs/order_statuses) can suddenly rewind.
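To make the failure mode concrete, here is a minimal sketch of the kind of continuity check our consumer runs; the function name and return values are illustrative, not part of hl-node:

```python
# Hypothetical sketch of a by-block consumer's continuity check.
# Names are ours for illustration, not hl-node identifiers.
def check_continuity(last_seen: int, incoming: int) -> str:
    """Classify the transition from the last consumed block to the next one."""
    expected = last_seen + 1
    if incoming == expected:
        return "ok"
    if incoming < expected:
        # e.g. expected=885820214, got=885820001 after the visor restart
        return "rewind"
    return "gap"

assert check_continuity(885820213, 885820214) == "ok"
assert check_continuity(885820213, 885820001) == "rewind"
```

With no restart marker in the stream, a consumer cannot distinguish this post-restart rewind from corruption, so hard-stopping is the only safe reaction.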
Logs (key lines)
- Visor restart trigger: `visor child in bad state, restarting ... child_low_memory: true ... memory_usage: 0.9566818114790238`
- Node restart: `initializing with local abci_state ... height: 885820000`
- Downstream rewind: `Block gap detected expected=885820214 got=885820001 direction="rewind"`
Host memory context (not OOM-killed)
- Host RAM: ~64GB (`MemTotal: 64776240 kB`)
- hl-node RSS around the incident: ~30GB
- cgroup: `memory.max = max`, `memory.high = max`, `oom=0`, `oom_kill=0`

So this doesn't look like a kernel OOM kill; it looks like visor policy.
Questions
- What exactly is `memory_usage` measuring for `child_low_memory`?
- What threshold triggers this restart?
- Can we tune/disable this threshold via config/flag/env?
- What are the recommended settings for stable prod use with `--batch-by-block` writers?
- Can you emit an explicit restart epoch/marker for by-block output so consumers can safely re-sync?
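To clarify the last question, here is a sketch of the restart-epoch idea; this is purely a proposal, not an existing hl-node feature, and all names are hypothetical:

```python
# Proposal sketch: if each by-block record carried an epoch that the node
# bumps on every restart, consumers could accept a height rewind at an
# epoch boundary instead of hard-stopping. Nothing here exists in hl-node.
from dataclasses import dataclass

@dataclass
class BlockRecord:
    epoch: int   # proposed: incremented by the node on each restart
    height: int  # block height

def accept(last: BlockRecord, incoming: BlockRecord) -> bool:
    if incoming.epoch > last.epoch:
        # Explicit restart boundary: re-sync from here instead of aborting.
        return True
    return incoming.height == last.height + 1

assert accept(BlockRecord(0, 885820213), BlockRecord(0, 885820214))
assert not accept(BlockRecord(0, 885820213), BlockRecord(0, 885820001))
assert accept(BlockRecord(0, 885820213), BlockRecord(1, 885820001))
```

Any equivalent explicit marker (a sentinel line in the output file, a sidecar file per restart) would serve the same purpose.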