What happened
At 2026-02-06T16:31:54Z visor logged:

- `child_low_memory: true`
- `memory_usage: 0.9566818114790238`

Then it restarted hl-node. After the restart, the node initialized from local ABCI state at block 885820000, and output started again at 885820001 in the same hourly file. Our consumer was already at 885820213, so it saw a rewind and hard-stopped.
Why this is a problem
For `--batch-by-block` outputs, we need a clean monotonic stream unless there is an explicit restart boundary. Right now a restart causes:

- expected: `885820214`
- got: `885820001`
- same path pattern (`node_*_by_block/hourly/20260206/16`)

So all by-block streams (trades/fills/book_diffs/order_statuses) can suddenly rewind.
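To make the failure mode concrete, here is a minimal sketch of the kind of continuity check our consumer runs; the function name and return values are illustrative, not part of hl-node:

```python
# Hypothetical sketch of a by-block consumer's continuity check.
# Names are ours for illustration, not hl-node identifiers.
def check_continuity(last_seen: int, incoming: int) -> str:
    """Classify the transition from the last consumed block to the next one."""
    expected = last_seen + 1
    if incoming == expected:
        return "ok"
    if incoming < expected:
        # e.g. expected=885820214, got=885820001 after the visor restart
        return "rewind"
    return "gap"

assert check_continuity(885820213, 885820214) == "ok"
assert check_continuity(885820213, 885820001) == "rewind"
```

With no restart marker in the stream, a consumer cannot distinguish this post-restart rewind from corruption, so hard-stopping is the only safe reaction.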
Logs (key lines)
- Visor restart trigger: `visor child in bad state, restarting ... child_low_memory: true ... memory_usage: 0.9566818114790238`
- Node restart: `initializing with local abci_state ... height: 885820000`
- Downstream rewind: `Block gap detected expected=885820214 got=885820001 direction="rewind"`
Host memory context (not OOM-killed)
- Host RAM: ~64GB (`MemTotal: 64776240 kB`)
- hl-node RSS around the incident: ~30GB
- cgroup: `memory.max = max`, `memory.high = max`, `oom=0`, `oom_kill=0`

So this doesn't look like a kernel OOM kill; it looks like visor policy.
Questions
- What exactly is `memory_usage` measuring for `child_low_memory`?
- What threshold triggers this restart?
- Can we tune/disable this threshold via config/flag/env?
- What are the recommended settings for stable prod use with `--batch-by-block` writers?
- Can you emit an explicit restart epoch/marker for by-block output so consumers can safely re-sync?
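To clarify the last question, here is a sketch of the restart-epoch idea; this is purely a proposal, not an existing hl-node feature, and all names are hypothetical:

```python
# Proposal sketch: if each by-block record carried an epoch that the node
# bumps on every restart, consumers could accept a height rewind at an
# epoch boundary instead of hard-stopping. Nothing here exists in hl-node.
from dataclasses import dataclass

@dataclass
class BlockRecord:
    epoch: int   # proposed: incremented by the node on each restart
    height: int  # block height

def accept(last: BlockRecord, incoming: BlockRecord) -> bool:
    if incoming.epoch > last.epoch:
        # Explicit restart boundary: re-sync from here instead of aborting.
        return True
    return incoming.height == last.height + 1

assert accept(BlockRecord(0, 885820213), BlockRecord(0, 885820214))
assert not accept(BlockRecord(0, 885820213), BlockRecord(0, 885820001))
assert accept(BlockRecord(0, 885820213), BlockRecord(1, 885820001))
```

Any equivalent explicit marker (a sentinel line in the output file, a sidecar file per restart) would serve the same purpose.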