Skip to content

feat: add hourly dune large-transfer monitor#212

Open
ctmotox2 wants to merge 2 commits into
yearn:mainfrom
ctmotox2:dune-hourly-transfer-monitor
Open

feat: add hourly dune large-transfer monitor#212
ctmotox2 wants to merge 2 commits into
yearn:mainfrom
ctmotox2:dune-hourly-transfer-monitor

Conversation

@ctmotox2
Copy link
Copy Markdown
Contributor

hourly query to Dune for USDai, iUSD, cUSD

@ctmotox2 ctmotox2 marked this pull request as ready for review May 5, 2026 12:06
Copy link
Copy Markdown
Collaborator

@spalen0 spalen0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Nice routing pattern — flagged a few issues below that I think need to be addressed before merge. The cache-corruption one is the main blocker since it would affect every other hourly monitor.

🔴 Blocking

1. Cache row-key contains :, corrupting the shared cache file

_row_key returns f"{tx_hash}|{block_time}|{contract}". Dune block_time columns are timestamps (e.g. 2026-05-12 14:30:00.000 UTC) — they contain colons. The cache file format in utils/cache.py:55 is parsed with key, value = line.strip().split(":") (no maxsplit). The first time this monitor writes a value, every subsequent read of cache-id.txt — across all hourly monitors sharing this file — will raise ValueError: too many values to unpack.

Suggested fix: drop block_time from the key (tx_hash is usually enough to dedup a transfer; add log_index if you need to disambiguate multiple transfers in one tx), or strip : from the stored value. Don't change utils/cache.py itself — other monitors rely on the current format.

2. Dedup only compares the top row — older rows get re-alerted

if newest_key == last_key:
    return

If the top row changes but rows #2#10 are unchanged from the previous run, all of them get re-alerted. Walk alert_rows until you hit last_key and only alert on the new prefix.

3. Ordering assumption is implicit

alert_rows[0] is assumed to be the newest, but nothing in the Python code sorts, and the Dune query's ORDER BY isn't pinned anywhere code-side. If someone edits the query and drops the ordering, dedup silently breaks. Either sort defensively by block_time desc, or at minimum add a comment in the module docstring stating the query MUST ORDER BY block_time DESC.

🟡 Design / robustness

  • Threshold fallback is unit-mixed. _is_large_transfer uses amount_usd when > 0 else falls back to raw amount. A 5M-token transfer of a token where amount_usd is missing or 0 (e.g. price feed failure Dune-side) would pass the $5M threshold even if actual USD value is $50. Safer: skip the row when amount_usd is missing/0 rather than falling back to a unitless amount.
  • DUNE_LARGE_TRANSFER_THRESHOLD is read but undocumented in .env.example. Add a commented-out line so operators know the knob exists.
  • stables protocol fallback. Unrouted (chain, addr) pairs alert to protocol \"stables\", which requires TELEGRAM_BOT_TOKEN_STABLES / TELEGRAM_CHAT_ID_STABLES — neither documented nor configured today. If the Dune query only ever returns the three routed tokens, the fallback is dead code; otherwise it's a silent send-failure path. Either document the env vars or drop the fallback and skip unrouted rows.
  • MAX_ROWS_IN_ALERT = 10 is per-protocol-group, not global. With 3 tokens in scope, you can get 3 separate alerts of up to 10 rows each. Probably fine — just a naming clarification.

🟢 Style / minor

  • LOGGER (uppercase) is inconsistent with the rest of the codebase (logger). See stables/main.py:13 and utils/logging.py convention.
  • No unit tests added for _route_for_row / _row_key / _is_large_transfer. These are pure functions and would be cheap to cover in tests/.

✅ Nice

  • dune-client already pinned in pyproject.toml.
  • DUNE_API_KEY correctly wired as a secret, query ID as a var.
  • Per-protocol Telegram routing matches the existing stables/main.py pattern.

Copy link
Copy Markdown
Collaborator

@spalen0 spalen0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after 34eb9f9 — all 9 items from the prior review are resolved (cache-key colon, top-row-only dedup, implicit ordering, unit-mixed threshold, undocumented env var, stables fallback, naming, LOGGER casing, no tests). Fix is minimal and surgical, and the new test file pins the cache-key fix nicely.

A few things still worth a look before merge — none are blockers on their own, but the first one is a silent failure mode:

🟡 Cap-then-cache silently drops rows on bursty hours

At stables/dune_large_transfers.py:197 the cache is written as _row_key(alert_rows[0]) (newest across all protocols), but each protocol group is truncated to MAX_ROWS_PER_PROTOCOL_ALERT = 10 at line 187. If a single protocol gets, say, 15 routed transfers above threshold in one hour, rows #11–15 are dropped from the alert AND the cache marker is advanced past them, so they're never alerted on the next run. Low probability, silent failure. Two safe fixes:

  • (a) cache the oldest-included row's key per protocol instead of the global newest, or
  • (b) append a +N more not shown — see Dune query directly tail line when truncating, so an operator knows to look.

(b) is a one-liner and probably sufficient.

🟡 _row_key won't actually use log_index for the current Dune query

The expected-columns list in the module docstring (lines 4–13) doesn't include log_index, so in practice the key is just tx_hash|contract. Two Transfer events of the same token in the same tx (router splits, multi-leg flows) would collide — only one alerts. The test test_row_key_uses_log_index_when_present exercises a branch the real query never hits. Either add log_index to the Dune SQL (preferred — it's the canonical disambiguator) or extend the docstring to require it.

🟢 Defensive sort uses lexicographic string compare

_sort_rows_newest_first sorts by _as_str(row.get(\"block_time\")). Works for the 2026-05-12 14:30:00.000 UTC format Dune currently emits, but silently breaks if the column ever switches to epoch int or ISO-with-T. Worth a one-line comment documenting the expected format.

🟢 from / to listed in docstring but never used

The docstring lists from and to as expected columns, but _build_row_line never references them. They're the most informative fields in a large-transfer alert — consider including them in the message (or drop from the docstring).

Approving in spirit pending the cap-then-cache fix; everything else is nits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants